Article

Enhancing Security of Online Interfaces: Adversarial Handwritten Arabic CAPTCHA Generation

by Ghady Alrasheed * and Suliman A. Alsuhibany
Department of Computer Science, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(6), 2972; https://doi.org/10.3390/app15062972
Submission received: 4 February 2025 / Revised: 3 March 2025 / Accepted: 4 March 2025 / Published: 10 March 2025

Abstract: With the increasing online activity of Arabic speakers, the development of effective CAPTCHAs (Completely Automated Public Turing Tests to Tell Computers and Humans Apart) tailored for Arabic users has become crucial. Traditional CAPTCHAs, however, are increasingly vulnerable to machine learning-based attacks. To address this challenge, we introduce a method for generating adversarial handwritten Arabic CAPTCHAs that remain user-friendly yet difficult for machines to solve. Our approach involves synthesizing handwritten Arabic words using a simulation technique, followed by the application of five adversarial perturbation techniques: Expectation Over Transformation (EOT), Scaled Gaussian Translation with Channel Shifts (SGTCS), Jacobian-based Saliency Map Attack (JSMA), Immutable Adversarial Noise (IAN), and Connectionist Temporal Classification (CTC). Evaluation results demonstrate that JSMA provides the highest level of security, with 30% of meaningless word CAPTCHAs remaining completely unrecognized by automated systems, falling to 6.66% for meaningful words. From a usability perspective, JSMA also achieves the highest accuracy rates, with 75.6% for meaningless words and 90.6% for meaningful words. Our work presents an effective strategy for enhancing the security of Arabic websites and online interfaces against bot attacks, contributing to the advancement of CAPTCHA systems.

1. Introduction

CAPTCHAs (Completely Automated Public Turing Tests to Tell Computers and Humans Apart) are employed on websites to distinguish between human users and automated bots. Initially, standard text-based CAPTCHAs used random alphanumeric strings that were easy for humans to read but difficult for computers to interpret [1]. However, with advancements in machine learning and computer vision, many text-based CAPTCHAs can now be solved by algorithms [2]. To enhance security, more recent CAPTCHA systems incorporate computer vision tasks, such as image recognition [3].
Since handwriting recognition remains a challenge for machines [4,5], handwritten CAPTCHAs have been proposed to improve resistance against automated attacks [6]. Arabic handwriting, in particular, poses additional complexities due to its intricate character shapes, connected letters, and orientation dependence. These factors make modeling Arabic handwriting especially challenging [6]. Unfortunately, most CAPTCHAs in use today are limited to the Latin alphabet and are not well-suited for non-Latin scripts, such as Arabic [7].
Applying adversarial examples to CAPTCHAs is one approach aimed at improving the security of CAPTCHA schemes. Although adversarial CAPTCHAs have been proposed previously in [8], the study does not specifically address Arabic CAPTCHA schemes.
To bridge this gap, this paper proposes an approach for generating adversarial handwritten Arabic CAPTCHAs using deep learning techniques by simulating the complex character shapes, letter connections, and orientation dependencies characteristic of Arabic handwriting. Specifically, the focus is on creating CAPTCHA challenges that remain simple for humans to solve but pose significant difficulty for advanced machine learning models, even those with sophisticated recognition capabilities.
In our approach, five adversarial techniques are applied to perturb the generated CAPTCHAs, namely, Expectation Over Transformation (EOT), Scaled Gaussian Translation with Channel Shifts (SGTCS), Jacobian-based Saliency Map Attack (JSMA), Immutable Adversarial Noise (IAN), and Connectionist Temporal Classification (CTC).
The results demonstrated that JSMA provided the highest security level, with 30% of meaningless word CAPTCHAs remaining completely unrecognized, falling to 6.66% for meaningful words. From a usability standpoint, JSMA achieved the best accuracy, with 75.6% for meaningless words and 90.6% for meaningful words. Overall, our approach offers an effective strategy for enhancing the security of Arabic websites and online interfaces against bot attacks.
The remainder of the paper is organized as follows: the literature review is presented in Section 2; Section 3 explains the methodology; the results are presented in Section 4, and Section 5 discusses them. Finally, Section 6 concludes the paper and outlines future work.

2. Literature Review

The literature review offers a comprehensive overview of past research on CAPTCHA systems, focusing particularly on studies of handwritten CAPTCHAs in both Latin and Arabic scripts, and then on adversarial CAPTCHAs. It aims to highlight potential areas for future research and improvement by evaluating the advantages and limitations of existing approaches.

2.1. CAPTCHA Types

Online users have relied on traditional CAPTCHAs to distinguish between humans and bots for years. However, recent advancements in machine learning have introduced new security vulnerabilities, allowing bots to bypass these challenges more effectively.
According to Bursztein et al. [9], machine learning models can be used to solve English text-based and numerical CAPTCHAs; vulnerabilities were found in standard text CAPTCHAs against adversarial machine learning attacks, with the best model achieving 88% accuracy. While computer vision tasks are more secure, they can still be attacked, as shown by Wang et al. [10]. Their study collected a large image dataset to train deep neural networks for segmenting and recognizing visual CAPTCHA images; their model achieved accuracy ranging from 79.2% to 98.6%, defeating visual CAPTCHAs. Similarly, Panda [11] employed Convolutional Neural Networks (CNNs) to recognize CAPTCHAs with high efficiency and accuracy.
Therefore, the rapid advancements in visual recognition technologies challenge the effectiveness of traditional CAPTCHA systems. This has prompted researchers to explore new techniques aimed at enhancing CAPTCHA security [12]. In general, visual reasoning CAPTCHAs have been proposed as a more advanced alternative, requiring users to identify specific items in images based on text prompts. Meanwhile, audio CAPTCHAs have shown promise in the early stages as a potential method for preventing attacks [13].
CAPTCHAs that rely on images are a promising substitute for traditional text-based CAPTCHAs. Systems like CAPTCHaStar utilize human abilities in recognizing shapes and provide better user experience and strength against automated attacks [14]. Interactive CAPTCHAs, such as iCAPTCHA, employ multi-step user interactions to thwart third-party human attacks by generating time discrepancies between genuine users and human solvers [15]. Game-based CAPTCHAs, where users must solve AI problems using drag-and-drop or click methods, are viewed as very secure but could be at risk from image processing methods and relay attacks [16]. Video CAPTCHAs require users to track moving objects among decoys, making it difficult for automated and relay attacks to succeed [17].
While CAPTCHAs have been successful in preventing automated attacks, the rise of human-assisted CAPTCHA solving poses a notable danger. As pointed out in [9], this method involves recruiting external parties to manually solve CAPTCHAs, undermining their core function of distinguishing between humans and machines. Although various approaches offer initial security benefits, the fact that existing solutions remain vulnerable highlights the ongoing need for stronger defenses. Thus, the proposed approach helps to enhance the defense level.
In this section, we have reviewed several common CAPTCHA schemes explored in prior literature. Table 1 summarizes the key CAPTCHA types that were discussed.

2.2. Latin Handwritten CAPTCHAs

Latin handwritten CAPTCHAs have received a great deal of attention in recent times [19,20]. That is, handwritten CAPTCHAs have emerged as an alternative to text-based versions, which are increasingly susceptible to automated attacks [21]. These systems leverage the human ability to recognize handwriting, a task that remains challenging for machines [22]. Previous research has integrated various language scripts into multilingual CAPTCHAs, including English, Arabic, Spanish, and French [6]. Additionally, a system was developed in [21] to synthesize English handwriting and assess its security against automated solvers, with subsequent studies, like [23], expanding this approach to include multiple languages.
Rusu and Govindaraju [19] were the first to investigate the use of handwritten CAPTCHAs, highlighting the difference in abilities between humans and machines in reading handwritten words. They also examined the difficulties presented by handwritten text images for computers and their real-world users [24]. The researchers also suggested a visual CAPTCHA with handwritten image evaluation [25] and an interactive human proof system employing handwriting identification [26].
Generating and evaluating handwritten CAPTCHAs incorporating Latin characters has seen limited exploration in previous research. Thomas et al. [22] investigated features of Latin script that could impact recognition. Achint and Venu [27] suggested a technique for creating synthetic handwritten CAPTCHAs and assessed their effectiveness. Rao and Singh [28] developed a random handwritten CAPTCHA method for enhancing web security.
In addition to their earlier research, Rusu and Govindaraju [20] studied the creation and application of handwritten CAPTCHAs, while Rusu et al. [29] examined the use of cognitive factors in enhancing web security with CAPTCHAs. Rusu and Docimo [30] explored leveraging human perception and visual object interpretation for the same purpose.
Govindaraju [21] aimed to build upon past efforts by developing a novel Latin handwritten CAPTCHA system. Techniques included synthesizing representations of connected Latin script resembling natural handwriting variability. Both automated and human evaluations were employed to assess the security and usability.

2.3. Arabic Handwritten CAPTCHAs

Arabic handwritten CAPTCHA has received considerable attention in the CAPTCHA scheme literature over the past five years [31]. For example, Lajmi et al. [32] introduced a novel approach using handwritten Arabic calligraphy to enhance security, demonstrating that stylized scripts significantly reduce automated recognition while maintaining human readability.
Recent research aims to create and evaluate Arabic handwritten CAPTCHAs to improve security and usability. A visual cryptography method was developed by Alsuhibany and Alquraishi [33] for Arabic CAPTCHAs that are handwritten and printed. They encrypted images into two parts that needed to be stacked correctly for decryption. The evaluation showed that the approach provided acceptable usability and security results for Arabic users.
A novel Arabic CAPTCHA scheme based on handwritten text segmentation was proposed by Parvez and Alsuhibany [34]. They used a technique that included generating synthetic images of cursive Arabic words and asking users to validate the CAPTCHAs by finding the segmentation points between letters. According to experimental results, this method maintained security and usability while providing challenges to users.
The technique suggested by Alsuhibany and Parvez [35] includes the distortion of pre-written Arabic word images with OCR processes to generate secure Arabic handwritten CAPTCHAs. With an accuracy rate of 88–90% in usability and a less than 0.5% risk in security, this method demonstrates a successful balance between usability and security.
Abadalla et al. [36] developed a robust system capable of producing realistic Arabic handwriting datasets from ASCII text—a crucial resource for various applications including CAPTCHA development—by carefully selecting a diverse set of simple Arabic words and refining the Gaussian Mixture Model process. This was an improvement in the datasets that were available for training and evaluating Arabic CAPTCHA models.
Alsuhibany et al. [37] proposed a synthetic Arabic handwritten CAPTCHA scheme. They developed generative models using machine learning techniques to synthesize authentic-looking handwritten words. Their evaluation showed the capacity to generate authentic word pictures, achieving a human accuracy rate of over 80% in word identification.
Generative Adversarial Networks (GANs), as explored by Alkhodidi et al. [38], are an approach for generating handwritten Arabic characters. GAN models can generate more challenging and reliable CAPTCHAs by creating high-quality, diverse character samples. This reduces the risk of automated attacks while also improving the security of online systems.
Improvements in Arabic handwritten CAPTCHAs are useful in a variety of fields, including education. For example, Parvez et al. [39] developed a gamified Arabic CAPTCHA system that improves security and promotes language acquisition. It combines CAPTCHA features with game elements such as word completion, ordering, and sentence creation in Arabic. This approach involves engaging users while preventing bot attacks. The system’s ability to improve users’ Arabic vocabulary and grammar demonstrates its educational potential.
Khan et al. [40] pioneered the use of Arabic CAPTCHAs for cybersecurity, highlighting challenges such as the script’s complexity, cultural relevance, and accessibility barriers. Their work emphasized the necessity of adapting CAPTCHA designs to Arabic’s cursive nature and contextual letterforms while addressing usability concerns for users with disabilities. Building on this foundation, recent advancements in machine learning and gamification offer opportunities to enhance both security and user-centric design, underscoring the need for tailored solutions that balance linguistic specificity with robust protection against automated attacks.

2.4. Adversarial CAPTCHAs

Adversarial examples, first formalized by Goodfellow et al. [41,42] in their seminal work, are inputs intentionally perturbed to mislead machine learning models while remaining indistinguishable to humans. These perturbations exploit the high-dimensional linearity of neural networks, where small, carefully crafted changes in input space can cause significant shifts in model predictions. For CAPTCHAs, adversarial techniques introduce noise or distortions that degrade automated recognition systems’ accuracy while preserving human readability. For instance, gradient-based attacks like the Fast Gradient Sign Method (FGSM) generate perturbations by maximizing the model’s loss function, as shown in:
δ = ε · sign(∇ₓL(x, y))
where δ is the perturbation, ε controls its magnitude, and ∇ₓL is the gradient of the loss with respect to the input. Such methods have been pivotal in exposing model vulnerabilities and informing defenses like adversarial training. In CAPTCHA design, adversarial perturbations exploit these vulnerabilities to create “human-readable, but machine-resistant” challenges, as demonstrated in this work.
Following this groundbreaking work, researchers have explored various techniques for generating adversarial examples and evaluating their impact on different machine learning applications, including image recognition, natural language processing, and security-critical systems like CAPTCHAs [43,44,45,46]. These studies highlight the importance of developing robust machine learning models that can withstand adversarial attacks, which is a key focus of the current research.
Terad et al. [43] presented an adversarial CAPTCHA that stays visible but blocks machine learning classifiers. Osadchy et al. [44] created Deep CAPTCHA, which utilizes adversarial noise (noise intentionally added to a system, such as a CAPTCHA, to increase its resistance to machine learning attacks while maintaining human solvability) to improve security. Zhang et al. [45] employed a combination of techniques to create more secure character-based CAPTCHAs: adversarial perturbations (small changes to images that confuse machine learning models), a multi-target attack that generates multiple diverse examples to fool models, ensemble adversarial training on multiple adversarial examples to improve robustness, and image preprocessing to make images more resistant to attacks. By combining these techniques, they exploited the differences in how humans and algorithms respond to visual distortions, ultimately creating more secure CAPTCHAs. Dinh et al. [46] combined adversarial examples and neural style transfer to enhance CAPTCHA security.
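To make the FGSM formulation above concrete, the following is a minimal Python/TensorFlow sketch; the model interface, its softmax output, the label encoding, and the ε value are assumptions for illustration, not the exact implementation used in this work.

Listing: FGSM sketch (Python/TensorFlow)
import tensorflow as tf

def fgsm_perturbation(model, x, y, epsilon=0.05):
    """FGSM: delta = epsilon * sign(grad_x L(x, y)); returns x + delta."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        # assumes model(x) returns softmax probabilities and y is an integer label
        loss = tf.keras.losses.sparse_categorical_crossentropy(y, model(x))
    grad = tape.gradient(loss, x)                 # grad_x L(x, y)
    x_adv = x + epsilon * tf.sign(grad)           # add the signed perturbation
    return tf.clip_by_value(x_adv, 0.0, 1.0)      # keep pixels in a valid range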
Shi et al. [8] explored both text-based and image-based adversarial CAPTCHA generation systems. In the domain of audio CAPTCHAs, Wang et al. [47] developed Generative-Adversarial Network (GAN) technology for synthetically generating noise and adversarial examples, which substantially decreased the recognition accuracy achievable by automated models. Their approach exemplifies ongoing research into more robust adversarial techniques across different CAPTCHA modalities.
Two models, D-GAN and C-GAN, were proposed for generating CAPTCHA images by incorporating distance values or combining multiple source images to enhance protection against CAPTCHA solvers [48]. Advances in creating adversarial CAPTCHAs enhance security defenses and user experience, contributing to the ongoing effort to protect online systems from malicious attacks.
Prior studies on adversarial CAPTCHAs in non-Arabic scripts provide critical context:
  • Latin Script: Shi et al. [8] developed adversarial text CAPTCHAs using gradient-based perturbations, achieving 85% resistance against OCR models. Osadchy et al. [44] combined adversarial noise with style transfer to enhance Latin CAPTCHA robustness.
  • Chinese Script: Zhang et al. [45] employed multi-target adversarial attacks on Chinese character CAPTCHAs, reducing solver accuracy to 12% while maintaining 88% human accuracy.
  • Cross-Script Analysis: Unlike Arabic, Latin and Chinese CAPTCHAs benefit from extensive datasets and standardized fonts. Arabic’s cursive nature, contextual letter forms (e.g., isolated, initial, medial, final), and diacritics introduce unique challenges, making adversarial perturbations more effective against segmentation and recognition models. This work bridges a critical gap by tailoring adversarial techniques to Arabic’s script-specific complexities.

3. Methodology

The aim of this research is to develop Arabic handwriting CAPTCHAs with enhanced security against machine learning by integrating adversarial perturbation techniques. This section describes the methodology of our study, as shown in Figure 1.
The methodology has three main phases:
  • Generation of Arabic handwritten words:
    • We have used the Alsuhibany and Alquraishi [33] model in order to generate meaningless Arabic handwritten words.
    • We have used the same model [33] to generate meaningful Arabic handwritten words.
  • Applying adversarial perturbation:
    • We have applied five adversarial models on the generated samples. These techniques are: EOT, SGTCS, JSMA, IAN, and CTC. The generated CAPTCHAs use single perturbations (e.g., JSMA-only, EOT-only). The selection of these approaches was based on their empirical robustness against perturbation removal attacks demonstrated by Alsuhibany [49]. Using five approaches provided sufficient data while keeping the scope manageable for the study.
  • Evaluation
    • We have evaluated the security level of the generated samples via the Google vision API [50].
    • We have evaluated the usability level of the generated samples through an experimental study.
In the following sections, we will explain the main phases of the methodology in detail.

3.1. Arabic Handwritten CAPTCHA

Alsuhibany and Alquraishi’s [33] model has been improved by adding handwritten meaningful words alongside the meaningless words generated by their model.
The details of this generation process are provided below.

Generation of Arabic Handwritten Words

The handwritten word generation model produces text in two categories. The first category focuses on generating meaningless character strings that do not correspond to real Arabic words. The second type creates meaningful words by randomly selecting real Arabic terms from a dictionary.
  • Generation of Arabic handwritten meaningless words:
    Alsuhibany and Alquraishi’s [33] model involves programmatically generating handwritten Arabic words and assembling them into CAPTCHA images. This is done through a series of Hypertext Preprocessor (PHP) scripts. PHP is a general-purpose scripting language that can be utilized to create intuitive and dynamic scripts, automating the entire process.
    PHP first selects a random word by combining characters from predefined Arabic letter arrays. The word is converted to a Unicode representation, and individual letter images are retrieved from the Arabic letters MySQL database, which contains over 250 images of letters in various positions and styles.
    The letters are arranged on a blank canvas using coordinates calculated from baseline-positioning functions. Random wrinkles are applied and letters are overlapped to make the handwriting appear natural; joining points between letters are highlighted with colored ellipses, and baseline differences are accumulated to adjust letter placements vertically (see the code sketch following this list).
    An example of generated CAPTCHA images of meaningless words is shown in Figure 2.
  • Generation of Arabic handwritten meaningful words:
    To enhance realism and usability, the word generation process is enhanced. Instead of combining random letters, meaningful Arabic words are selectively constructed. An external dictionary file containing over 10,000 common words is incorporated into this process. The script is modified to retrieve validated words randomly from this dictionary, instead of the predefined character arrays. Additional validation is added to filter out words containing rare characters unsupported by the letter image database. Punctuation and diacritics are also stripped for simplicity. This improves the linguistic validity and readability of the generated CAPTCHAs for human users. As real words, they are more intuitive to decipher compared to random strings. An example of generated CAPTCHA images of meaningful words is shown in Figure 3.
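To illustrate the letter-compositing step described in the list above, the following is a minimal Python/PIL sketch. The production system uses PHP scripts with a MySQL letter-image database, so the file layout, canvas dimensions, overlap, and jitter ranges here are hypothetical.

Listing: letter-compositing sketch (Python/PIL)
import random
from PIL import Image

# Hypothetical lookup: in the original system, letter images come from a
# MySQL database of 250+ handwritten Arabic letter forms in varied positions.
def load_letter_image(letter, position):
    return Image.open(f"letters/{letter}_{position}.png").convert("RGBA")

def compose_word(letters, canvas_size=(400, 120), baseline=90):
    """Paste per-letter glyphs right-to-left with slight overlap and baseline jitter."""
    canvas = Image.new("RGB", canvas_size, "white")
    x = canvas_size[0] - 10                 # Arabic runs right to left
    for i, letter in enumerate(letters):
        pos = "initial" if i == 0 else ("final" if i == len(letters) - 1 else "medial")
        glyph = load_letter_image(letter, pos)
        x -= glyph.width - random.randint(2, 6)     # overlap joins adjacent letters
        jitter = random.randint(-3, 3)              # vertical jitter mimics handwriting
        canvas.paste(glyph, (x, baseline - glyph.height + jitter), glyph)
    return canvas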

3.2. Adversarial Techniques Used

Adversarial CAPTCHAs were generated using five approaches:
  • Expectation Over Transformation (EOT).
  • Scaled Gaussian Translation with Channel Shifts (SGTCS).
  • Jacobian-based Saliency Map Attack (JSMA).
  • Immutable Adversarial Noise (IAN).
  • Connectionist Temporal Classification (CTC).
Table 2 shows some examples of Arabic handwritten meaningless word CAPTCHA images generated using the five approaches, while Table 3 shows some examples of Arabic handwritten meaningful word CAPTCHA images generated using the five approaches.
1. Expectation Over Transformation (EOT):
EOT aims to generate small input perturbations that lead a neural network to produce incorrect predictions. With EOT, adversarial perturbations are formulated as the expectation of infinitesimal transformations of the original input, as opposed to other attack methods that directly optimize the network loss or logits. The following are the critical EOT steps:
(a) Identify a distribution p(τ) of potential small input transformations; Gaussian noise or patch rotations and translations of the input image are common choices.
(b) Linearize the neural network function around the original input x to obtain a Jacobian J that demonstrates how network activations change with small input variations.
(c) Determine the expectation of the network predictions E_τ[f(x + τ)] over the transformation distribution; this can be approximated as f(x) + J · E_τ[τ].
(d) Change the original input so as to increase the expectation for an alternative target class y′ while decreasing the expectation for the true class y. This yields an adversarial perturbation δ.
(e) Add the perturbation to the original input to produce the adversarial example x′ = x + δ. The network should then confidently and incorrectly predict y′.
We show the design of EOT in Algorithm 1.
  • Rationale for Selection. EOT was chosen for three reasons: (a) robustness to preprocessing: CAPTCHAs often apply distortions to thwart attacks, and EOT’s perturbations account for these transformations; (b) real-world relevance: unlike attacks assuming static inputs, EOT mimics adversarial examples surviving sensor noise or rendering artifacts; (c) transferability: by optimizing over a distribution of τ, EOT-generated attacks generalize better to black-box models.
  • Contribution to CAPTCHA Security. EOT exposes vulnerabilities in CAPTCHAs that rely on deterministic preprocessing (e.g., fixed noise patterns). By showing that adversarial examples can persist through randomized transformations, our work argues for non-differentiable CAPTCHA augmentation (e.g., non-grid warping) to break gradient-based attacks.
  • Experimental Parameters. For reproducibility, we configured EOT with the following:
    Transformation distribution: Gaussian noise (σ = 0.1), rotations (±15°), and translations (±5%).
    Perturbation budget: L∞-norm ≤ 0.05 (normalized pixel range).
    Optimization: 200 iterations with step size α = 0.01.
    Expectation approximation: 50 samples per iteration.
Algorithm 1 EOT Adversarial Generation.
Input: original image x; transformation distribution p(τ); target class y′; number of samples n
1: Identify p(τ): specify a distribution over small transformations
2: Linearize F around x: compute the Jacobian J expressing the change in F w.r.t. the input
3: Determine the expectation: E_τ[f(x + τ)] ≈ f(x) + J · E_τ[τ]
4: for i = 1 to n do
5:    Sample noise τᵢ ∼ p(τ)
6:    Perturb the image: xᵢ = x + τᵢ
7:    Optimize the expectation for y′ over y
8: end for
9: Return the adversarial image: x′ = x + perturbation
Output: adversarial image x′
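To complement Algorithm 1, the following is a minimal Python/TensorFlow sketch of the EOT loop under the parameters listed above (σ = 0.1, α = 0.01, 200 iterations, 50 samples, L∞ ≤ 0.05). Only the Gaussian-noise component of the transformation distribution is shown (rotations and translations are omitted), and the model interface and label encoding are assumptions.

Listing: EOT sketch (Python/TensorFlow)
import tensorflow as tf

def eot_attack(model, x, y_target, epsilon=0.05, alpha=0.01,
               iters=200, samples=50, sigma=0.1):
    """Monte Carlo EOT: push E_tau[f(x + delta + tau)] toward the target class
    over a distribution of small Gaussian input transformations."""
    delta = tf.Variable(tf.zeros_like(x))
    for _ in range(iters):
        with tf.GradientTape() as tape:
            loss = 0.0
            for _ in range(samples):  # approximate the expectation with n samples
                tau = tf.random.normal(tf.shape(x), stddev=sigma)
                probs = model(x + delta + tau)  # assumes softmax outputs
                loss += tf.reduce_mean(
                    tf.keras.losses.sparse_categorical_crossentropy(y_target, probs))
            loss /= samples
        grad = tape.gradient(loss, delta)
        # descend the target-class loss; keep the perturbation within the L-inf budget
        delta.assign(tf.clip_by_value(delta - alpha * tf.sign(grad), -epsilon, epsilon))
    return tf.clip_by_value(x + delta, 0.0, 1.0)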
2. Scaled Gaussian Translation with Channel Shifts (SGTCS):
SGTCS is a targeted adversarial perturbation technique that aims to incorrectly classify an input image as belonging to a particular target class. To create an imperceptible disturbance, it applies scaled Gaussian noise to each channel of the image.
The attack takes in an original correctly classified image, a target class label, and a pretrained convolutional neural network (CNN) model. It first generates randomly sampled Gaussian noise with a zero mean and unit variance for each channel of the image.
A small hyperparameter (ε = 0.1) controlling the perturbation’s magnitude then scales the noise values, and the scaled noise is added element-wise to each channel of the original image. As a result, each color channel of the disturbed image carries nearly imperceptible noise.
The perturbed image is then run through the pretrained CNN model and the predictions are observed. The attack succeeds, yielding an adversarial example, if the model incorrectly labels the image as belonging to the target class. Otherwise, the noise scaling factor is increased and the process is repeated until misclassification occurs or a maximum number of iterations is reached, allowing stronger perturbations until the model’s decision boundary is crossed.
The SGTCS method targets all three dimensions of translation invariance: spatial, color, and instance translations. By adding scaled Gaussian noise individually to each color channel, it breaks the color translation invariance of CNNs. This forces the model to rely more on spatial and instance-based cues for classification.
SGTCS generates a targeted adversarial perturbation by adding scaled Gaussian noise to each color channel of the input image. Mathematically, let x be the original input image. SGTCS computes the perturbation δ = ε · N, where N is a tensor of Gaussian noise with zero mean and unit variance, and ε is a small hyperparameter controlling the perturbation magnitude. The perturbed image is then created as x′ = x + δ, where the noise is added element-wise to each color channel of x. The goal is to find the minimum ε that causes the model to misclassify the perturbed image x′ as the target class.
The randomness of the noise additionally helps attack spatial invariance. The aim is to create a minimal perturbation that induces misclassification into a selected target class while being difficult for the model to detect; this is what the SGTCS method attempts to accomplish by simultaneously perturbing all translation dimensions. We show the design of SGTCS in Algorithm 2.
  • Rationale for Selection. SGTCS was chosen for three reasons: (a) CAPTCHA-specific weaknesses: many CAPTCHAs rely on color/texture invariance for automated solving, and SGTCS tests whether minor channel shifts can break this assumption; (b) controlled perturbations: unlike untargeted noise attacks, SGTCS iteratively adjusts ε to find the minimal perceptible perturbation, mimicking real-world adversarial constraints; (c) diagnostic value: by isolating failures in color/spatial invariance, SGTCS reveals which CAPTCHA features (e.g., hue consistency, edge alignment) are over-relied on by models.
  • Contribution to CAPTCHA Security. SGTCS demonstrates that CAPTCHAs using color-based obfuscation (e.g., overlapping hues, gradient backgrounds) are vulnerable to channel-specific perturbations. Our results advocate for non-invariant CAPTCHA features, such as the following:
    Non-grid-aligned characters (breaking spatial invariance).
    High-contrast, non-RGB color spaces (e.g., CMYK patterns, disrupting channel-wise attacks).
  • Experimental Parameters.
    Initial perturbation: ε = 0.1, scaled by +0.05 per iteration.
    Noise distribution: N(0, 1), sampled independently per channel.
    Termination conditions: maximum of 50 iterations; success threshold ε ≤ 0.3 (normalized pixel range).
    Model constraints: inputs clipped to [0, 1] post-perturbation to maintain valid pixel ranges.
Algorithm 2 SGTCS Adversarial Generation.
Input: original image x; epsilon ε; step size α; number of iterations
1: Initialize perturbation: δ ← 0
2: for i = 1 to number of iterations do
3:    Compute gradient: ∇L ← ∂L/∂x
4:    Update perturbation: δ ← δ + α · sgn(∇L)
5:    Clip perturbation: δ ← clip(δ, −ε, ε)
6:    Generate adversarial image: x′ ← x + δ
7:    Clip image values: x′ ← clip(x′, 0, 1)
8:    Predict on adversarial image: p ← argmax S(F(x′))
9: end for
Output: adversarial image x′
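As a complement to Algorithm 2, the following is a minimal Python/TensorFlow sketch of the iterative noise-scaling procedure described in the prose above (Algorithm 2 itself presents a gradient-based variant); the model interface and batch-of-one handling are assumptions.

Listing: SGTCS sketch (Python/TensorFlow)
import tensorflow as tf

def sgtcs_attack(model, x, target_class, eps=0.1, eps_step=0.05,
                 eps_max=0.3, max_iters=50):
    """Add per-channel scaled Gaussian noise, growing the scale until the
    model predicts the target class or the budget is exhausted."""
    x_adv = x
    for _ in range(max_iters):
        noise = tf.random.normal(tf.shape(x))           # N(0, 1), i.i.d. per channel
        x_adv = tf.clip_by_value(x + eps * noise, 0.0, 1.0)
        pred = tf.argmax(model(x_adv), axis=-1)
        if int(pred[0]) == target_class:                # decision boundary crossed
            return x_adv
        eps = min(eps + eps_step, eps_max)              # strengthen the perturbation
    return x_adv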
3. Jacobian-based Saliency Map Attack (JSMA):
JSMA is an iterative, greedy algorithm for creating adversarial examples. It operates by calculating the Jacobian matrix of an image with respect to the model’s logits; this matrix shows how sensitive the model’s predictions are to changes in each pixel. A saliency map is then derived from the Jacobian to determine which pixels have the greatest influence over the model’s predictions.
First, the algorithm chooses a target class t into which the image should be misclassified, and the adversarial image is initialized to the original clean image. At each iteration, the algorithm calculates the Jacobian of the current adversarial image with respect to the logits and, from it, computes a saliency map that assigns each pixel a value indicating how much modifying that pixel would raise the predicted probability of class t relative to the original predicted class. It then greedily modifies the two pixels with the highest absolute saliency values, iteratively recomputing the Jacobian and saliency map. Pixel modification continues until the model classifies the image as class t or a predetermined modification budget is exhausted. With relatively few pixel modifications, JSMA thus generates targeted adversarial perturbations by iteratively identifying the most influential pixels under the model’s current predictions.
Mathematically, let x be the original input image and y the true class label. The goal of JSMA is to find a small perturbation δ that can be added to x to create an adversarial example x′ = x + δ such that the model incorrectly classifies x′ as a target class t. The algorithm calculates the Jacobian matrix J of the model’s output with respect to the input x, where J[i, j] = ∂F(x)[i]/∂x[j] and F(x) denotes the model’s output logits. From the Jacobian matrix, a saliency map S is computed, where
S[i] = max(0, ∂F(x)[t]/∂x[i] − ∂F(x)[y]/∂x[i])
and t is the target class. This saliency map identifies the pixels that have the greatest influence on increasing the probability of the target class t while decreasing the probability of the true class y.
The JSMA algorithm then iteratively modifies the two pixels with the highest saliency values, updating the Jacobian and saliency map at each step, until the model classifies the image as the target class or a predetermined perturbation budget is exhausted. We show the design of JSMA in Algorithm 3.
  • Rationale for Selection. JSMA was chosen for three reasons: (a) targeted attack capability: unlike untargeted attacks, JSMA forces misclassification to a specific class, mimicking real-world CAPTCHA bypass scenarios; (b) minimal perturbations: by modifying only critical pixels, it tests whether CAPTCHAs are vulnerable to imperceptible alterations; (c) white-box relevance: as CAPTCHA defenses often rely on obfuscation, JSMA’s reliance on model gradients highlights weaknesses in systems assuming attackers lack model knowledge.
  • Contribution to CAPTCHA Security. JSMA’s saliency maps reveal which CAPTCHA features (e.g., character spacing, noise patterns) are most vulnerable to adversarial manipulation. By quantifying how few pixel changes are needed to deceive models, our work underscores the need for non-differentiable CAPTCHA designs (e.g., randomized distortions) that resist gradient-based attacks.
  • Experimental Parameters.
    Perturbation budget: maximum L₀-norm of 15% of total pixels.
    Iterations: 100 steps or until misclassification.
    Target class selection: least-likely class (untargeted) or predefined labels (targeted).
    Pixel constraints: modifications limited to ±10 intensity values per step to mimic subtle adversarial alterations.
Algorithm 3 JSMA Adversarial Generation.
Input: original image x; target class t; epsilon budget ε; mask φ
1: while F(x′) ≠ t do
2:    x_f ← 𝓕(x) {Fourier transform}
3:    g ← ∇L(x_f, t) {compute gradient in the Fourier domain}
4:    Calculate the Jacobian J of x_f w.r.t. the model logits
5:    S ← SaliencyMap(g) {calculate saliency map}
6:    S ← S ⊙ φ {apply mask}
7:    p ← argmax S {greedily select the pixels p in S with the highest values}
8:    x_f ← x_f + ε · Perturb(p) {perturb selected pixels and neighbors}
9:    x′ ← 𝓕⁻¹(x_f) {inverse Fourier transform}
10:   Update x′
11: end while
Output: adversarial image x′
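The following is a minimal Python/TensorFlow sketch of a single JSMA step using the spatial-domain saliency formulation above (the Fourier-domain masking of Algorithm 3 is omitted); the model interface, the batch of one, and the ±10/255 step derived from the stated pixel constraint are assumptions.

Listing: JSMA step sketch (Python/TensorFlow)
import tensorflow as tf

def jsma_step(model, x, target, true_label, pixels_per_step=2, theta=10.0 / 255.0):
    """One JSMA iteration: build the saliency map
    S[i] = max(0, dF_t/dx_i - dF_y/dx_i) and nudge the most salient pixels."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(x)
        logits = model(x)[0]          # assumes a batch of one image
        f_t = logits[target]          # target-class logit
        f_y = logits[true_label]      # true-class logit
    grad_t = tape.gradient(f_t, x)
    grad_y = tape.gradient(f_y, x)
    saliency = tf.nn.relu(grad_t - grad_y)              # max(0, ...)
    flat = tf.reshape(saliency, [-1])
    idx = tf.math.top_k(flat, k=pixels_per_step).indices
    bump = tf.scatter_nd(tf.expand_dims(idx, 1),        # raise the chosen pixels
                         tf.fill([pixels_per_step], theta),
                         tf.shape(flat))
    x_adv = tf.reshape(tf.reshape(x, [-1]) + bump, tf.shape(x))
    return tf.clip_by_value(x_adv, 0.0, 1.0)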
4. Immutable Adversarial Noise (IAN):
IAN is a training technique based on adding imperceptible perturbations (known as adversarial noise) to inputs in order to fool deep learning models while preserving how humans perceive the input; training on such examples helps make the models more robust against adversarial attacks. The fundamental goal of IAN is to identify small perturbations that can be added to inputs to alter model predictions while maintaining the inputs’ perceived meaning to humans. There are two main steps to achieving this:
(a) Perturbation computation: IAN first feeds a clean input through the model to obtain the original prediction. It then computes an adversarial perturbation, based on the gradient of the loss with respect to the input, that changes the prediction when added to the input. This perturbation is computed so that it is imperceptible to humans.
(b) Adversarial training: the computed perturbation is combined with the clean input to generate an adversarial example. This perturbed input is fed into the model, which produces a different prediction from the clean input; the original prediction is treated as the “correct” or “target” output. This setup provides an adversarial training example that can be used to fine-tune the model weights.
IAN aims to find a small perturbation δ that can be added to the input x to alter the model’s prediction while maintaining the original meaning of the input as perceived by humans. Mathematically, IAN first computes the gradient ∇ₓL(x, y) of the loss L with respect to the input x, where y is the true label. The perturbation δ is then defined as δ = ε · sign(∇ₓL(x, y)), as in Section 2.4, where ε is a hyperparameter controlling the perturbation magnitude. The perturbed input x′ = x + δ is then used as an adversarial training example to fine-tune the model, with the goal of making the model robust to the computed perturbation.
IAN aims to train models that are robust to imperceptible perturbations by repeating the above process on multiple inputs from the training dataset. Using perceptual loss functions, the perturbations are limited to remain small and imperceptible. This enables IAN to produce adversarial examples that do not change significantly how humans interpret inputs. We show the design of IAN in Algorithm 4.
  • Rationale for Selection. IAN was chosen for two reasons: (a) human-readable perturbations: unlike standard adversarial training, IAN’s ε-bounded noise ensures CAPTCHA legibility post-perturbation; (b) defense-aware evaluation: by training models with IAN, we test whether CAPTCHA recognition systems can resist perturbations without degrading human usability.
  • Contribution to CAPTCHA Security. IAN reveals that CAPTCHA models trained without adversarial robustness fail catastrophically when perturbed, even with imperceptible noise.
  • Experimental Parameters.
    Perturbation budget: ε = 0.03 (normalized pixel range), incrementally increased to ε = 0.1.
Algorithm 4 IAN Adversarial Generation.
Input: original image x; epsilon ε
1: Compute perturbation: sample noise N
2: perturbation ← N · ε
3: Create adversarial example: x′ = x + perturbation
Output: adversarial image x′
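The following is a minimal Python/TensorFlow sketch of one IAN round, pairing an ε-bounded gradient-sign perturbation (as in the textual formulation above; Algorithm 4 samples random noise instead) with a fine-tuning step. The model, optimizer, softmax outputs, and loss are assumptions for illustration.

Listing: IAN round sketch (Python/TensorFlow)
import tensorflow as tf

def ian_round(model, optimizer, x, y, epsilon=0.03):
    """One IAN round: craft an epsilon-bounded perturbation, then fine-tune
    the model on the perturbed input against the clean label."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(y, model(x)))
    delta = epsilon * tf.sign(tape.gradient(loss, x))   # imperceptible noise
    x_adv = tf.clip_by_value(x + delta, 0.0, 1.0)
    with tf.GradientTape() as tape:
        adv_loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(y, model(x_adv)))
    grads = tape.gradient(adv_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return x_adv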
5. Connectionist Temporal Classification (CTC):
In this approach, adversarial perturbations are crafted against a CTC-based recognition model. CTC is a type of neural network architecture widely used in sequence labeling tasks such as speech recognition and handwriting recognition, where the input and output sequences are of different lengths.
The main components of CTC are the following:
(a) Recurrent Neural Network (RNN): CTC employs an RNN as the primary component for encoding the input sequence. The RNN can take several forms, including a simple RNN, an LSTM network, or a GRU network. The RNN processes the input sequence in order, maintaining a hidden state that retains information from previous time steps.
(b) Softmax output layer: the RNN’s output layer is a softmax layer that provides a probability for each possible label at each step, computed from the input sequence and the RNN’s current hidden state.
(c) CTC loss function: the CTC loss function is used to compute the loss between the actual label sequence and the predicted probability distribution. It is defined as
L = −log(P(y | x))
where L is the loss and P(y | x) is the probability of the true label sequence y given the input sequence x. The CTC loss is computed using dynamic programming, which allows it to be calculated efficiently even for long input sequences.
Keras CTC was used to train a benchmark sequence recognition model. The model architecture comprised a stack of LSTM layers, a dense layer, and a CTC loss layer, and it was trained on a sequence dataset until its performance was considered acceptable.
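For illustration, the following is a minimal tf.keras sketch of such a benchmark model, matching the architecture stated in the experimental parameters below (3 bidirectional LSTM layers of 256 units, learning rate 0.001). The choice of the Adam optimizer and the label-padding convention are assumptions; tf.keras.backend.ctc_batch_cost is the standard TensorFlow 2.x helper for this loss.

Listing: Keras CTC benchmark model sketch (Python/TensorFlow)
import tensorflow as tf

def ctc_loss(y_true, y_pred):
    """CTC loss wrapper: y_pred is (batch, time, vocab+1) softmax output,
    y_true is a (batch, max_label_len) label matrix (unpadded labels assumed)."""
    batch = tf.shape(y_pred)[0]
    input_len = tf.fill([batch, 1], tf.shape(y_pred)[1])   # full time length per sample
    label_len = tf.fill([batch, 1], tf.shape(y_true)[1])
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_len, label_len)

def build_ctc_model(time_steps, features, vocab_size):
    inputs = tf.keras.Input(shape=(time_steps, features))
    x = inputs
    for _ in range(3):  # 3 bidirectional LSTM layers, 256 units each
        x = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(256, return_sequences=True))(x)
    outputs = tf.keras.layers.Dense(vocab_size + 1, activation="softmax")(x)  # +1 blank
    model = tf.keras.Model(inputs, outputs)
    # lr = 0.001 per the stated parameters; Adam is an assumption
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss=ctc_loss)
    return model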
To generate adversarial examples, the Fast Gradient Sign Method (FGSM) can be applied to the CTC loss, computing the perturbation δ = ε · sign(∇ₓL(x, y)). FGSM is a simple and computationally inexpensive method for determining perturbations that increase a model’s loss function: given a test example, it computes a small perturbation intended to change the model’s predictions.
For each test example, the loss gradient was calculated with respect to the input, and a step was taken in the direction of the gradient’s sign. The step size is a hyperparameter that determines the magnitude of the perturbation; larger step sizes are more likely to affect model behavior, but they may introduce detectable noise.
The adversarial accuracy of the model was assessed by feeding perturbed examples into the CTC model and monitoring whether the predictions changed. Additionally, the qualitative impact of the perturbations, in terms of discernible noise or distortions in the inputs, was assessed. We show the design of CTC-based adversarial generation in Algorithm 5.
  • Rationale for Selection. CTC was chosen for two reasons: (a) temporal CAPTCHA relevance: many CAPTCHAs use animated/text-scrolling designs, and CTC’s sequence alignment mirrors automated solvers’ temporal reasoning; (b) alignment vulnerabilities: CTC’s reliance on input-output alignment makes it susceptible to perturbations that shift temporal features (e.g., character onset/offset).
  • Contribution to CAPTCHA Security. Our CTC-based attacks reveal that temporal CAPTCHAs are vulnerable to gradient-aligned perturbations:
    Key insight: automated solvers over-rely on consistent character timing and spacing, which adversarial noise can disrupt.
    Per-frame noise: adding dynamic, non-differentiable noise to individual frames can block gradient-based attacks.
  • Experimental Parameters.
    Model architecture: 3 bidirectional LSTM layers (256 units each); optimizer learning rate = 0.001.
    Adversarial settings: ε = 0.05 (normalized pixel range); sequence perturbation budget of L∞ ≤ 0.1 per frame.
Algorithm 5 CTC Adversarial Generation.
Input: input sequences X; step size α; true label sequence y; label sequences π
1: for all x ∈ X do
2:    π ← predicted probability distribution from the CTC model
3:    L ← −log(P(y | π)) {compute CTC loss}
4:    ∇L ← ∂L/∂x {compute loss gradient w.r.t. x}
5:    r ← α · sign(∇L) {generate perturbation}
6:    x′ ← x + r {create adversarial sequence}
7: end for
Output: adversarial image x′
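To complement Algorithm 5, the following is a minimal Python/TensorFlow sketch of one FGSM step on the CTC loss, reusing the ctc_loss wrapper from the model sketch earlier in this subsection; ε follows the stated adversarial setting, and the model interface is an assumption.

Listing: FGSM on CTC loss sketch (Python/TensorFlow)
import tensorflow as tf

def ctc_fgsm(model, x, y_true, epsilon=0.05):
    """One FGSM step on the CTC loss (Algorithm 5): r = epsilon * sign(grad L)."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.reduce_mean(ctc_loss(y_true, model(x)))  # L = -log P(y | x)
    grad = tape.gradient(loss, x)
    x_adv = x + epsilon * tf.sign(grad)                    # perturb along the gradient sign
    return tf.clip_by_value(x_adv, 0.0, 1.0)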

3.3. Evaluation

This section describes the methodology used to evaluate the security and usability of the proposed approach for generating adversarial Arabic handwritten CAPTCHAs.

3.3.1. Security Evaluation

The goal of the security evaluation was to assess the effectiveness of the perturbation techniques at preventing automated recognition of the generated CAPTCHAs. The recognition results were analyzed to determine the recognition rates achieved against each adversarial perturbation technique; lower recognition rates indicated better security against bots.
The following outlines the experimental set-up and procedure used to test the CAPTCHA system’s security:
  • Experimental Set-Up:
    This section describes the experimental set-up for the security evaluation.
    System:
    To evaluate the security of the adversarial CAPTCHAs, the Google Vision API [50] was used to recognize the perturbed images. Google Vision is a machine learning-based image recognition service whose capabilities can be considered representative of modern bots. The Cloud Vision API provided an important benchmark for assessment, given its prevalence and high accuracy.
    • API Version: the testing was done using the latest stable version of the Google Cloud Vision API, which is currently version v1.
    • Input Data Format: the Vision API can process a variety of input data formats, including image files (JPEG, PNG, GIF, BMP) and base64-encoded image data.
    • Computer Vision Tasks: the testing involves evaluating the API’s performance on common computer vision tasks, such as the following:
      Image labeling: identifying and classifying objects, scenes, and activities in images.
      Facial detection and recognition: detecting and identifying faces in images.
      Optical character recognition (OCR): extracting text from images.
      Logo detection: identifying logos within images.
      Web detection: annotating images with information about the web entities they contain.
    • API Parameters: the testing involves experimenting with different API parameters, such as the following:
      Image source (file, base64-encoded).
      Vision feature types to enable
      (e.g., LABEL_DETECTION, FACE_DETECTION).
      Language model to use for text-related features.
      Confidence thresholds for detected entities.
    • Validation (Text Detection (OCR)):
      Arabic text: validate accuracy for handwritten/printed Arabic text.
      Mixed languages: test images with Arabic + Latin script.
      Ground truth: compare the API output with the known text.
      Accuracy metric: Accuracy = (Correctly Recognized Characters / Total Characters) × 100.
    Dataset:
    The experimental set-up included creating a dataset of 60 Arabic handwritten word CAPTCHA images (30 meaningful words, 30 meaningless words). The images were generated using Alsuhibany and Alquraishi’s [33] model for the meaningless words, with the improved model used to generate the meaningful words. The images were then categorized based on whether they were meaningful or meaningless. Perturbed Arabic handwritten word CAPTCHA images were then produced using the five perturbation techniques and categorized into five groups based on the perturbation method used. Finally, a total dataset of 300 perturbed Arabic handwritten word CAPTCHA images was compiled for the security evaluation experiment.
    Size:
    • Base dataset: 60 Arabic handwritten CAPTCHAs (30 meaningless, 30 meaningful).
    • Perturbed dataset: 300 CAPTCHAs (5 techniques × 60 images).
    Source: synthetic generation using PHP scripts and a MySQL database of 250+ Arabic letter images in varied styles.
    Characteristics:
    • Meaningless words: random strings of 3–9 Arabic characters.
    • Meaningful words: selected from a dictionary of 10,000 common Arabic words, stripped of diacritics.
    Training/Testing Split: no traditional split; all CAPTCHAs were generated and evaluated.
  • Experimental Procedure:
    This section describes the experimental procedure for the security evaluation.
    The 300 perturbed CAPTCHA images with different adversarial perturbations were tested using Google Vision [50].
    The recognition results (text labels) were analyzed and classified into categories, as shown in Table 4.
    The recognition rates in each category were calculated for each perturbation technique (a recognition sketch follows below).
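To illustrate the recognition step, the following is a hedged sketch using the Google Cloud Vision Python client; the Arabic language hint and the mapping of recognition results onto the categories of Table 4 are assumptions for illustration, not the exact classification rules used in the evaluation.

Listing: Google Vision OCR sketch (Python)
from google.cloud import vision

def recognize_captcha(path):
    """Send one perturbed CAPTCHA image to Cloud Vision OCR and return the text."""
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(
        image=image, image_context={"language_hints": ["ar"]})
    annotations = response.text_annotations
    return annotations[0].description.strip() if annotations else ""

def categorize(detected, ground_truth):
    """Assumed mapping of a recognition result onto the categories of Table 4."""
    if detected == ground_truth:
        return "completely recognized"
    if detected and detected in ground_truth:
        return "partially recognized"
    return "incorrectly recognized" if detected else "not recognized"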

3.3.2. Usability Evaluation

An experiment was conducted to evaluate the usability of the Arabic handwriting CAPTCHA system.
  • Experimental Set-Up:
    The experimental set-up of the experiment was as follows:
    Participants:
    To enhance participant engagement, reduce respondent burden, and increase the likelihood of completing the entire experiment, the experiment was divided into two equal-length sections, accessible via separate links. Each link contained an identical number of images and a balanced distribution of meaningful and meaningless words. Link 1 had 132 participants out of a total of 294, and Link 2 had 162 participants.
    Design:
    An online test containing 33 pages, accessed via two links, was developed. Each link contained 30 CAPTCHA questions: 30 distinct CAPTCHA images chosen at random from the test dataset, 15 of meaningful words and 15 of meaningless words. A text field for participants to enter their answer appeared next to each CAPTCHA image.
    System:
    The usability testing system consisted of an online website, created using Figma and Weavely forms. It was responsively designed for both desktop and mobile use.
  • Experimental Procedure:
    The procedure of the usability experiment was as follows:
    • The initial page gathered demographic information such as age, gender, proficiency in Arabic, and experience in CAPTCHA.
    • The task and method of interaction were explained on the second page of a set of instructions.
    • Participants were presented with an example CAPTCHA image and detailed step-by-step instructions on how to complete it.
    • Each page displayed one CAPTCHA sample for participants to solve.
    • A progress bar tracked completion across all pages.
    • Participants could not advance pages without solving the CAPTCHA correctly.
    • The system did not provide feedback on answer correctness or hints.
    • Upon completion, a final thank-you page was displayed.
    Usability Evaluation Metrics:
    Usability was evaluated based on the following:
    • Time spent solving CAPTCHAs:
      The average time taken to complete a CAPTCHA (in seconds), from the ‘Start’ button click to the ‘Submit’ button click. Quicker average solution times indicated a better user experience.
    • Accuracy in solving CAPTCHAs:
      Accuracy referred to the percentage of CAPTCHAs that participants were able to solve correctly. A higher accuracy rate indicated improved usability.

4. Results

This section presents the results from evaluating the usability and security of the proposed adversarial Arabic handwritten CAPTCHA system. Both meaningless and meaningful word data are analyzed from the user study and OCR. The performance of the five adversarial perturbation techniques—JSMA, SGTCS, EOT, CTC, and IAN—is summarized in Table 5.

4.1. Security Evaluation

A range of adversarial perturbation techniques was applied to meaningless and meaningful words, and the results were analyzed using the Google Vision API [50]. Figure 4, Figure 5, Figure 6 and Figure 7 show the CAPTCHAs recognized in each category (completely, partially, incorrectly, and not recognized, respectively). The five techniques tested were CTC, EOT, IAN, JSMA, and SGTCS. A total of 30 images each of meaningless and meaningful words were tested with each technique.
The recognition rates for each category are shown below. The rates for Arabic handwritten meaningless words are presented in Table 6, while the rates for meaningful words are presented in Table 7.

4.2. Usability Evaluation

To evaluate the usability of the proposed adversarial Arabic handwritten CAPTCHAs, the data collected from the online experiment were analyzed. The samples included words that had been perturbed using each of the five adversarial techniques (CTC, EOT, IAN, JSMA, and SGTCS). The CSV file contained records of 294 responses.

4.2.1. Participants

The participants’ characteristics were as follows:
  • Gender: female or male.
  • Age: Under 18 years, 18–25 years, 26–35 years, 36–45 years, 45+ years.
  • Arabic proficiency: Most participants were Arabic speakers, except 20 participants (approximately 7% of 294) who reported not speaking Arabic.
  • CAPTCHA experience: yes or no.
The pie charts in Figure 8 show the participants’ characteristics.

4.2.2. Time Spent Solving CAPTCHAs

The average time spent solving CAPTCHAs is shown in Table 8, which reports the average time participants took to solve a single Arabic CAPTCHA.

4.2.3. Accuracy in Solving CAPTCHAs

The five perturbation techniques tested were CTC, EOT, IAN, JSMA, and SGTCS. The accuracy in solving CAPTCHAs for each technique was calculated based on the total number of words associated with that technique. Accuracy was calculated separately for the following:
  • Arabic handwritten meaningless words: the accuracy for each technique is shown in Table 9.
  • Arabic handwritten meaningful words: the accuracy for each technique is shown in Table 10.
  • The accuracy rate for each technique was calculated using the following formula:
Accuracy Rate = (Correct Responses / Total Possible Responses) × 100
where
Total Possible Responses = Number of Words × Number of Responses
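For illustration (with hypothetical counts): if 15 meaningful words are each attempted in 132 responses, Total Possible Responses = 15 × 132 = 1980; if 1794 of those responses are correct, the accuracy rate is (1794 / 1980) × 100 ≈ 90.6%.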

5. Discussion

This section provides an analysis and interpretation of the key findings from the security and usability experiments conducted in the results section. We investigated the effectiveness of various adversarial perturbation techniques to perturb Arabic handwritten CAPTCHAs. The results are examined and compared to findings from previous studies.

5.1. Security Level

Google Vision [50] recognition rates for Arabic handwritten words under various adversarial perturbations are discussed. First, the effects of the perturbations on meaningless words are explored. A discussion of the meaningful word results is then presented. Key findings regarding the complete recognition, partial recognition, and incorrect recognition rates are highlighted.

5.1.1. Meaningless Words

The results of applying various adversarial perturbations to meaningless words and testing recognition were examined. The key findings were as follows:
  • CTC achieved a comparable 43.33% level of partial recognition, with a lower 23.33% rate of unrecognized results, indicating that it perturbed the text more efficiently than EOT in confounding machine vision systems.
  • For EOT, none of the images were completely recognized correctly, indicating this technique successfully introduced some distortion. However, the high 66.66% figure of partial recognition suggests that the distortion had only a minimal impact, allowing the text to remain partially legible to machines.
  • IAN resulted in a 26.66% incorrect recognition rate and a higher partial recognition rate of 60%, implying that its distortions did not completely obscure the text, unlike more successful techniques. IAN needs to improve its obscuring capabilities.
  • JSMA uniquely had only one image (3.33%) completely recognized. Its unrecognized rate of 30%, the highest among the techniques, aligns with the following characteristics:
    JSMA cannot identify the same level of influential pixels/features, as the meaningless words do not have inherent semantic meaning that the model has learned to recognize robustly.
    With less identifiable influential features, JSMA can introduce smaller perturbations to cause a misclassification.
    This results in a lower non-recognition rate (30%) for meaningless words, as the adversary can more easily perturb the input to bypass the CAPTCHA.
  • Finally, SGTCS achieved a high partial recognition of 63.33% but a similar incorrect or unrecognized split to IAN, implying a comparable obscuring ability.
JSMA showed the strongest protection against automated solvers. The varying outcomes between techniques also support the concept that each distortion type impacts recognition differently.

5.1.2. Meaningful Words

The results of applying various adversarial perturbations to meaningful words and testing recognition were analyzed. The key findings were as follows:
  • CTC achieved complete recognition for 10% of the images. This technique was not as successful at obscuring meaningful words, as its complete recognition rate was higher than for meaningless words; machine vision systems may have exploited semantic understanding.
  • EOT reached the highest complete recognition rate at 23.33%; its distortions were the least effective at obscuring meaningful words, achieving the weakest obfuscation as measured by recognition success. It also had 13.33% partially recognized words and 33.33% unrecognized words, indicating mixed effectiveness.
  • IAN achieved a partial recognition rate of 33.33%, with 13.33% completely recognized, indicating that it obscured some of the meaningful words.
  • JSMA obtained the lowest complete and partial recognition rates, at 6.66% each.
    JSMA creates a saliency map that identifies the pixels/features that have the greatest influence on the model’s prediction of the meaningful word.
    To cause a misclassification, JSMA needs to introduce larger perturbations to those influential pixels/features, as the model has higher confidence in recognizing the meaningful word.
    This results in a higher non-recognition rate (60%) for meaningful words, as the adversary needs to make more substantial changes to bypass the CAPTCHA.
    This provides evidence that JSMA's distortions most effectively disrupted semantic analysis, confounding machine solvers as intended and perturbing the meaningful words most strongly.
  • Finally, SGTCS produced a low complete recognition rate of 6.66% and a partial rate of 13.33%, with 63.33% of images unrecognized, the highest non-recognition rate for meaningful words.
In general, the techniques were less effective at obscuring meaningful words than meaningless ones, as complete recognition rates were higher for meaningful words. JSMA appeared most promising, as it distorted meaningful text images most strongly. This has implications for generating more robust adversarial CAPTCHAs that use both types of text.
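By contrast with JSMA's targeted pixel selection, EOT averages gradients over a distribution of random transformations so that the perturbation remains effective when the image is shifted or rotated. The sketch below illustrates this expectation step; the rotation range, sample count, and `model` are assumptions for illustration, not the study's implementation.

```python
import torch
import torchvision.transforms.functional as TF

def eot_gradient(model, image, true_label, samples=8):
    """Average the true-class gradient over random rotations (expectation over T)."""
    grads = []
    for _ in range(samples):
        angle = float(torch.empty(1).uniform_(-10.0, 10.0))  # random small rotation
        x = TF.rotate(image, angle).detach().requires_grad_(True)
        score = model(x)[0, true_label]                      # true-class logit
        grad, = torch.autograd.grad(score, x)
        grads.append(TF.rotate(grad, -angle))                # map gradient back (approximate)
    return torch.stack(grads).mean(dim=0)

# One illustrative descent step against the averaged gradient:
# adv = (image - 0.05 * eot_gradient(model, image, label).sign()).clamp(0, 1)
```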

5.2. Usability Level

This study evaluated the usability of various Arabic handwritten CAPTCHAs by analyzing participant accuracy rates in recognizing perturbed handwritten words. Five techniques were used to perturb the words: CTC, EOT, IAN, JSMA, and SGTCS.

5.2.1. Meaningless Words

The results of applying various adversarial perturbations to meaningless words were analyzed. The key findings based on the accuracy rates are shown in Table 11.

5.2.2. Meaningful Words

The results of applying various adversarial perturbations to meaningful words were analyzed. The key findings based on the accuracy rates are shown in Table 12.

5.3. Comparison to Previous Studies

The usability evaluation helps to understand the user experience when solving different CAPTCHAs. The security assessment examines the effectiveness of various perturbation techniques against modern bots.
Table 13 summarizes the results of four previous studies evaluating Arabic CAPTCHA techniques. These can be compared with the results of our study.
Several techniques have previously been proposed for generating Arabic handwritten CAPTCHAs:
  • Synthetic Arabic handwritten CAPTCHA methods that use machine learning to generate authentic-looking handwritten words [37].
  • Distorting Arabic word images through OCR procedures to improve the security of Arabic CAPTCHAs [35].
  • A process of handwritten text segmentation that includes creating synthetic cursive Arabic word images and asking users to verify CAPTCHAs by recognizing segmentation points [34].
  • Visual cryptography that splits images into two components that need to be properly aligned for decryption [33].
All of these methods were important first attempts in the development and assessment of Arabic handwritten CAPTCHAs.
In terms of security, each study introduced innovative approaches to generating and evaluating Arabic handwritten CAPTCHAs. All methods demonstrated acceptable security results [33,34,35,37].
Analyzing usability, the research found high accuracy and success rates, indicating that the methods were easy for people to understand [33,34,35,37]. The focus on usability is crucial for acceptance.
Lines and flips [37] and adversarial distortions both aim to fool text recognition. Lines and flips apply visible changes, such as flipping text or adding wavy lines, whereas adversarial distortions degrade machine legibility by leveraging gradient information from the target ML model; adversarial examples are algorithmically designed to maximize neural network confusion. Our tests showed that adversarial distortion techniques such as JSMA were better at disrupting machines' ability to read text, while lines and flips are more visible and predictable, making it easier for machines to recognize the text despite the visual changes. The study in [37] reported a security rate of 96.42% against its OCR-based attackers. However, such OCR engines are weaker attackers than Google Vision, which is trained on large volumes of structured text and can maintain high recognition accuracy in the presence of many distortions. Even against Google Vision, our adversarial perturbations with JSMA still achieved a security rate of 86.66% for meaningful words (Table 5), demonstrating the ongoing potential of these approaches.
In terms of usability, both methods are highly accurate: lines and flips in [37] achieved a usability of 90–95%, and JSMA in our study achieved 90.6%.
Our study adds a new dimension by introducing adversarial perturbations, providing deeper insight into the security of Arabic handwritten CAPTCHAs. We generated adversarial Arabic handwritten CAPTCHAs and evaluated the security and usability of five perturbation techniques: CTC, EOT, IAN, JSMA, and SGTCS.

6. Conclusions and Future Work

The aim of this study was to generate adversarial Arabic handwritten CAPTCHAs and to evaluate the security and usability of different adversarial perturbation techniques for human-readable CAPTCHAs. Five techniques, CTC, EOT, IAN, JSMA, and SGTCS, were implemented to distort Arabic handwritten words, and both automated recognition testing and human participant evaluations were conducted. The results showed that JSMA achieved the best equilibrium between high protection and acceptable usability: meaningful words improved human recognition (90.6% accuracy), while the unpredictability of meaningless words strengthened security against bots. The results offer guidance for generation techniques, establish baselines, and highlight important tasks such as perturbation calibration. Overall, the study advances the development of Arabic handwritten CAPTCHAs that are more secure and readable.
Potential directions for future work include combining perturbation techniques such as JSMA and SGTCS into hybrid approaches and evaluating whether they enhance security against both third-party human solvers and machines. Additionally, optimizing the clarity of the handwritten fonts used could further improve the balance between usability and security.

Author Contributions

Conceptualization, S.A.A.; Methodology, G.A.; Software, G.A.; Validation, S.A.A.; Formal analysis, S.A.A.; Investigation, G.A.; Resources, G.A.; Data curation, G.A.; Writing—original draft, G.A.; Writing—review & editing, G.A. and S.A.A.; Visualization, G.A.; Supervision, S.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

The researchers would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025).

Institutional Review Board Statement

Ethical review and approval were waived for this study because it involved anonymous data collection with no risk to participants.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.14048018, [51].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shivani, A.; Challa, R. CAPTCHA: A Systematic Review. In Proceedings of the 2020 IEEE International Conference on Advent Trends in Multidisciplinary Research and Innovation (ICATMRI), Buldhana, India, 30 December 2020; pp. 1–8. [Google Scholar]
  2. Bursztein, E.; Martin, M.; Mitchell, J. Text-based CAPTCHA Strengths and Weaknesses. In Proceedings of the 18th ACM Conference on Computer and Communications Security, Chicago, IL, USA, 17–21 October 2011; pp. 125–138. [Google Scholar]
  3. Hasan, W.K.A. A Survey of Current Research on CAPTCHA. Int. J. Comput. Sci. Eng. Surv. (IJCSES) 2016, 7, 1–21. [Google Scholar] [CrossRef]
  4. Sun, Y.; Xie, X.; Li, Z.; Yang, K. Batch-transformer for scene text image super-resolution. Vis. Comput. 2024, 40, 7399–7409. [Google Scholar] [CrossRef]
  5. Elanwar, R.; Betke, M. Generative adversarial networks for handwriting image generation: A review. Vis. Comput. 2024, 41, 2299–2322. [Google Scholar] [CrossRef]
  6. Aldosari, M.H.; Al-Daraiseh, A.A. Strong multilingual CAPTCHA based on handwritten characters. In Proceedings of the 2016 7th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 5–7 April 2016; IEEE: Piscataway, NJ, USA; pp. 239–245. [Google Scholar]
  7. Khan, B.; Alghathbar, K.; Khan, M.K.; Alkelabi, A.; Alajaji, A. Cyber Security Using Arabic CAPTCHA Scheme. Int. Arab J. Inf. Technol. 2013, 10, 76–83. [Google Scholar]
  8. Shi, C.; Xu, X.; Ji, S.; Bu, K.; Chen, J.; Beyah, R.; Wang, T. Adversarial CAPTCHAs. IEEE Trans. Cybern. 2019, 52, 6095–6108. [Google Scholar] [CrossRef]
  9. Bursztein, E.; Aigrain, J.; Moscicki, A.; Mitchell, J. The end is nigh: Generic solving of text-based CAPTCHAs. In Proceedings of the Workshop on Offensive Technologies, San Diego, CA, USA, 19 August 2014. [Google Scholar]
  10. Wang, P.; Gao, H.; Xiao, C.; Guo, X.; Gao, Y.; Zi, Y. Extended Research on the Security of Visual Reasoning CAPTCHA. IEEE Trans. Dependable Secur. Comput. 2023, 21, 4502–4516. [Google Scholar] [CrossRef]
  11. Panda, S. Recognizing CAPTCHA using Neural Networks. Int. J. Sci. Res. Eng. Manag. 2022, 10. [Google Scholar] [CrossRef]
  12. Wang, H.; Zheng, F.; Chen, Z.; Lu, Y.; Gao, J.; Wei, R. A CAPTCHA Design Based on Visual Reasoning. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 6029–6033. [Google Scholar]
  13. Hossen, M.I.; Hei, X.S. aaeCAPTCHA: The Design and Implementation of Audio Adversarial CAPTCHA. In Proceedings of the 2022 IEEE 7th European Symposium on Security and Privacy, Genoa, Italy, 6–10 June 2022; pp. 430–447. [Google Scholar]
  14. Conti, M.; Guarisco, C.; Spolaor, R. CAPTCHaStar! A Novel CAPTCHA Based on Interactive Shape Discovery. In Proceedings of the International Conference on Applied Cryptography and Network Security, New York, NY, USA, 2–5 June 2015. [Google Scholar]
  15. Truong, H.D.; Turner, C.F.; Zou, C.C. iCAPTCHA: The Next Generation of CAPTCHA Designed to Defend against 3rd Party Human Attacks. In Proceedings of the 2011 IEEE International Conference on Communications (ICC), Kyoto, Japan, 5–9 June 2011; pp. 1–6. [Google Scholar]
  16. Umar, M. A Review on Evolution of various CAPTCHA in the field of Web Security. Int. J. Res. Appl. Sci. Eng. Technol. 2022, 10, 90–96. [Google Scholar] [CrossRef]
  17. Usuzaki, S.; Aburada, K.; Yamaba, H.; Katayama, T.; Mukunoki, M.; Park, M.; Okazaki, N. Interactive Video CAPTCHA for Better Resistance to Automated Attack. In Proceedings of the 2018 Eleventh International Conference on Mobile Computing and Ubiquitous Network (ICMU), Auckland, New Zealand, 5–8 October 2018; pp. 1–2. [Google Scholar]
  18. Alreshoodi, L.A.; Alsuhibany, S.A. A Proposed Methodology for Detecting Human Attacks on Text-based CAPTCHAs. Int. J. Eng. Res. Technol. 2020, 9, 193–202. [Google Scholar] [CrossRef]
  19. Rusu, A.; Govindaraju, V. Handwritten CAPTCHA: Using the difference in the abilities of humans and machines in reading handwritten words. In Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition (IWFHR-9 2004), Tokyo, Japan, 26–29 October 2004. [Google Scholar] [CrossRef]
  20. Rusu, A.; Thomas, A.; Govindaraju, V. Generation and use of handwritten CAPTCHAs. Int. J. Doc. Anal. Recognit. (IJDAR) 2010, 13, 49–64. [Google Scholar] [CrossRef]
  21. Govindaraju, V.; Thomas, A. Enhancing Cyber Security through the Use of Synthetic Handwritten Captchas; Computer Science Department 226 Bell Hall: Buffalo, NY, USA, 2010; ISBN 978-1-124-24557-7. [Google Scholar]
  22. Thomas, A.O.; Rusu, A.; Govindaraju, V. Synthetic handwritten CAPTCHAs. Pattern Recognit. 2009, 42, 3365–3373. [Google Scholar] [CrossRef]
  23. Rusu, A.; Mislich, S.; Missik, L.; Schenker, B. A Multilingual Handwriting Approach to CAPTCHA. In Proceedings of the 2013 17th International Conference on Information Visualisation, London, UK, 16–18 July 2013; pp. 198–203. [Google Scholar]
  24. Rusu, A.I.; Govindaraju, V. On the challenges that handwritten text images pose to computers and new practical applications. Document Recognition and Retrieval XII. Int. Soc. Opt. Photonics 2005, 5676, 84–91. [Google Scholar]
  25. Rusu, A.; Govindaraju, V. Visual CAPTCHA with handwritten image analysis. In International Workshop on Human Interactive Proofs; Springer: Berlin/Heidelberg, Germany, 2005; pp. 42–52. [Google Scholar]
  26. Rusu, A.; Govindaraju, V. A human interactive proof algorithm using handwriting recognition. In Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR’05), Seoul, Korea, 31 August–1 September 2005; IEEE: Piscataway, NJ, USA; pp. 967–971. [Google Scholar]
  27. Achint, T.; Venu, G. Generation and performance evaluation of synthetic handwritten captchas. In Proceedings of the First International Conference on Frontiers in Handwriting Recognition, ICFHR, Montreal, QC, USA, 19–21 August 2008. [Google Scholar]
  28. Rao, M.; Singh, N. Random Handwritten CAPTCHA: Web Security with a Difference. Int. J. Inf. Technol. Comput. Sci. (IJITCS) 2012, 4, 53. [Google Scholar] [CrossRef]
  29. Rusu, A.I.; Docimo, R.; Rusu, A. Leveraging Cognitive Factors in Securing WWW with CAPTCHA. In Proceedings of the USENIX Conference on Web Application Development, Boston, MA, USA, 23–24 June 2010. [Google Scholar]
  30. Rusu, A.; Docimo, R. Securing the web using human perception and visual object interpretation. In Proceedings of the 2009 13th International Conference Information Visualisation, Barcelona, Spain, 15–17 July 2009; IEEE: Piscataway, NJ, USA; pp. 613–618. [Google Scholar]
  31. Alsuhibany, S.A.; Parvez, M.T. Attack-filtered interactive arabic CAPTCHAs. J. Inf. Secur. Appl. 2022, 70, 103318. [Google Scholar] [CrossRef]
  32. Lajmi, H.; Idoudi, F.; Njah, H.; Kammoun, H.M.; Njah, I. Strengthening Applications’ Security with Handwritten Arabic Calligraphy Captcha. In Proceedings of the 2024 IEEE 8th Forum on Research and Technologies for Society and Industry Innovation (RTSI), Milano, Italy, 18–20 September 2024. [Google Scholar]
  33. Alsuhibany, S.A.; Alquraishi, M. Usability and Security of Arabic Text-based CAPTCHA Using Visual Cryptography. Information 2022, 13, 112. [Google Scholar] [CrossRef]
  34. Parvez, M.T.; Alsuhibany, S.A. Segmentation-validation based handwritten Arabic CAPTCHA generation. Comput. Secur. 2020, 93, 101829. [Google Scholar] [CrossRef]
  35. Alsuhibany, S.A.; Parvez, M.T. Secure Arabic Handwritten CAPTCHA Generation Using OCR Operations. In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 23–26 October 2016; IEEE: Piscataway, NJ, USA; pp. 126–131. [Google Scholar]
  36. Abdalla, M.I.; Rashwan, M.A.; Elserafy, M.A. Generating realistic Arabic handwriting dataset. Int. J. Eng. Technol. 2019, 9, 3. [Google Scholar] [CrossRef]
  37. Alsuhibany, S.A.; Almohaimeed, F.N. Synthetic Arabic handwritten CAPTCHA. Int. J. Inf. Comput. Secur. 2021, 16, 385–398. [Google Scholar] [CrossRef]
  38. Alkhodidi, T.; Aljoudi, L.; Fallatah, A.; Bashy, A.; Ali, N.; Alqahtani, N.; Almajnooni, N.; Allhabi, A.; Albarakati, T.; Alafif, T.K.; et al. GEAC: Generating and Evaluating Handwritten Arabic Characters Using Generative Adversarial Networks. In Proceedings of the 2021 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), Dubai, United Arab Emirates, 17–18 March 2021; pp. 1–6. [Google Scholar]
  39. Parvez, M.T.; Alsuhibani, A.M.; Alamri, A.H. Educational and Cybersecurity Applications of an Arabic CAPTCHA Gamification System. Ing. Syst. Inf. 2023, 28, 1275–1285. [Google Scholar] [CrossRef]
  40. Khan, B.; Alghathbar, K.S.; Khan, M.K.; AlKelabi, A.M.; AlAjaji, A. Using Arabic CAPTCHA for Cyber Security. In Security Technology, Disaster Recovery and Business Continuity; Kim, T., Fang, W., Khan, M.K., Arnett, K.P., Kang, H., Ślęzak, D., Eds.; Communications in Computer and Information Science, Vol. 122; Springer: Berlin/Heidelberg, Germany, 2010; pp. 1–10. [Google Scholar] [CrossRef]
  41. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  42. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. Adv. Neural Inf. Process. Syst. 2014, 2, 2672–2680. [Google Scholar]
  43. Terada, T.; Nguyen, V.N.K.; Nishigaki, M.; Ohki, T. Improving Robustness and Visibility of Adversarial CAPTCHA Using Low-Frequency Perturbation. In Proceedings of the International Conference on Advanced Information Networking and Applications, Sydney, Australia, 13–15 April 2022; pp. 586–597. [Google Scholar]
  44. Osadchy, M.; Hernandez-Castro, J.; Gibson, S.J.; Dunkelman, O.; Pérez-Cabo, D. No Bot Expects the DeepCAPTCHA! Introducing Immutable Adversarial Examples, With Applications to CAPTCHA Generation. IEEE Trans. Inf. Forensics Secur. 2017, 12, 2640–2653. [Google Scholar] [CrossRef]
  45. Zhang, J.; Sang, J.; Xu, K.; Wu, S.; Zhao, X.; Sun, Y.; Hu, Y.; Yu, J. Robust CAPTCHAs Towards Malicious OCR. IEEE Trans. Multimed. 2021, 23, 2575–2587. [Google Scholar] [CrossRef]
  46. Dinh, N.; Tran-Trung, K.; Hoang, V.T. Augment CAPTCHA Security Using Adversarial Examples With Neural Style Transfer. IEEE Access 2023, 11, 83553–83561. [Google Scholar] [CrossRef]
  47. Wang, P.; Gao, H.; Guo, X.; Yuan, Z.; Nian, J. Improving the Security of Audio CAPTCHAs With Adversarial Examples. IEEE Trans. Dependable Secur. Comput. 2024, 21, 650–667. [Google Scholar] [CrossRef]
  48. Kwon, H.; Kim, Y.; Yoon, H.; Choi, D. CAPTCHA Image Generation Systems Using Generative Adversarial Networks. IEICE Trans. Inf. Syst. 2018, 101, 417–424. [Google Scholar] [CrossRef]
  49. Alsuhibany, S.A. A Survey on Adversarial Perturbations and Attacks on CAPTCHAs. Appl. Sci. 2023, 13, 4602. [Google Scholar] [CrossRef]
  50. Google Cloud. Cloud Vision API. 2022. Available online: https://cloud.google.com/vision (accessed on 1 September 2024).
  51. Ghaddy. Ghaddy/Adversarial: Enhancing Security of Online Interfaces: Adversarial Handwritten Arabic CAPTCHA Generation (v1.0.0). Zenodo 2024. [Google Scholar] [CrossRef]
Figure 1. The proposed methodology for generating adversarial handwritten Arabic CAPTCHAs.
Figure 2. Generated sample of an Arabic handwritten meaningless word. The letters in this image are from the Arabic alphabet, which is written right-to-left. The word shown is not a real Arabic word, but rather a randomly generated sequence used for a CAPTCHA challenge.
Figure 3. Generated sample of an Arabic handwritten meaningful word. The letters in this image are from the Arabic alphabet, which is written right-to-left. The word shown is a real Arabic word, not a randomly generated sequence.
Figure 4. Complete recognition of meaningful word “مساعد” (assistant) (EOT).
Figure 5. Partial recognition of meaningless word “دسثجدح” (CTC).
Figure 6. Incorrect recognition of meaningless word “ذجسكيض” (IAN).
Figure 7. No recognition of meaningful word “اعظم” (greatest) (JSMA).
Figure 8. Pie charts of participant characteristics.
Table 1. Summary of CAPTCHA types.

Study | CAPTCHA Type | Description
[14,15] | Interactive-based CAPTCHAs | Require users to perform interactive tasks like dragging and dropping shapes
[9,18] | Text-based CAPTCHAs | Present distorted text that users must correctly decipher
[13] | Audio CAPTCHAs | Contain audio clips of words/numbers for users to identify
[17] | Interactive video-based CAPTCHAs | Involve interaction with videos, such as pausing at an object
[10,12] | Puzzle-based CAPTCHAs | Comprise puzzles that must be solved correctly
[11] | Image-based CAPTCHAs | Feature images/objects for users to view and identify
[16] | Game-based CAPTCHAs | Implement game challenges
Table 2. Samples of adversarial perturbation techniques applied to Arabic handwritten meaningless words.

Adversarial Perturbation Technique | Output
Expectation Over Transformation (EOT) | [sample image]
Scaled Gaussian Translation with Channel Shifts (SGTCS) | [sample image]
Jacobian-based Saliency Map Attack (JSMA) | [sample image]
Immutable Adversarial Noise (IAN) | [sample image]
Connectionist Temporal Classification (CTC) | [sample image]
Table 3. Samples of adversarial perturbation techniques applied to Arabic handwritten meaningful words.

Adversarial Perturbation Technique | Output
Expectation Over Transformation (EOT) | [sample image]
Scaled Gaussian Translation with Channel Shifts (SGTCS) | [sample image]
Jacobian-based Saliency Map Attack (JSMA) | [sample image]
Immutable Adversarial Noise (IAN) | [sample image]
Connectionist Temporal Classification (CTC) | [sample image]
Table 4. Recognition categories.

Recognition Category | Description
Completely | All characters recognized correctly
Partially | Some characters recognized correctly
Incorrectly | All characters recognized incorrectly
Not | No characters recognized
Table 5. Security and usability performance of adversarial techniques.

Perturbation | Text Type | Security | Usability
JSMA | meaningless words | 70% | 75.7%
JSMA | meaningful words | 86.66% | 90.6%
IAN | meaningless words | 40% | 58.8%
IAN | meaningful words | 53.33% | 80.6%
EOT | meaningless words | 33.33% | 24%
EOT | meaningful words | 46.66% | 86%
SGTCS | meaningless words | 36.66% | 65.7%
SGTCS | meaningful words | 80% | 82%
CTC | meaningless words | 56.66% | 45.77%
CTC | meaningful words | 50% | 90.5%
Table 6. Results of Google Vision recognition rates for meaningless words.

Perturbation | Completely | Partially | Incorrectly | Not
CTC | 0 = 0% | 13 = 43.33% | 10 = 33.33% | 7 = 23.33%
EOT | 0 = 0% | 20 = 66.66% | 6 = 20% | 4 = 13.33%
IAN | 0 = 0% | 18 = 60% | 8 = 26.66% | 4 = 13.33%
JSMA | 1 = 3.33% | 8 = 26.66% | 12 = 40% | 9 = 30%
SGTCS | 0 = 0% | 19 = 63.33% | 7 = 23.33% | 4 = 13.33%
Table 7. Results of Google Vision recognition rates for meaningful words.

Perturbation | Completely | Partially | Incorrectly | Not
CTC | 3 = 10% | 12 = 40% | 8 = 26.66% | 7 = 23.33%
EOT | 7 = 23.33% | 9 = 30% | 4 = 13.33% | 10 = 33.33%
IAN | 4 = 13.33% | 10 = 33.33% | 3 = 10% | 13 = 43.33%
JSMA | 2 = 6.66% | 2 = 6.66% | 8 = 26.66% | 18 = 60%
SGTCS | 2 = 6.66% | 4 = 13.33% | 5 = 16.66% | 19 = 63.33%
Table 8. The average time spent solving one CAPTCHA.

Group | Average Time
Group 1 | 10.90 s
Group 2 | 10.58 s
Table 9. Accuracy rates for Arabic handwritten meaningless words.

Technique | Number of Words | Correct Responses | Accuracy Rate
CTC | 6 | 390 | 45.77%
EOT | 6 | 206 | 24%
IAN | 6 | 510 | 58.8%
JSMA | 6 | 690 | 75.7%
SGTCS | 6 | 600 | 65.7%
Table 10. Accuracy rates for Arabic handwritten meaningful words.

Technique | Number of Words | Correct Responses | Accuracy Rate
CTC | 6 | 765 | 90.5%
EOT | 6 | 735 | 86%
IAN | 6 | 722 | 80.6%
JSMA | 6 | 864 | 90.6%
SGTCS | 6 | 748 | 82%
Table 11. Analysis of meaningless word findings based on accuracy rates.

Technique | Accuracy Rate
SGTCS | 65.7%
JSMA | 75.7%
IAN | 58.8%
CTC | 45.77%
EOT | 24%
Table 12. Analysis of meaningful word findings based on accuracy rates.

Technique | Accuracy Rate
JSMA | 90.6%
IAN | 80.6%
SGTCS | 82%
CTC | 90.5%
EOT | 86%
Table 13. Summary of previous Arabic handwritten CAPTCHA studies.

Study | Technique | Text Type | Security Evaluation Type | Security | Usability
[35] | OCR | Meaningful words | OCR | 51% difference in segments | More than 88% accuracy
[34] | Segmentation-validation | Meaningful words | OCR | 95.5% in segmentation attacks, 92.1% in recognition attacks | 95%
[37] | Synthetic handwritten CAPTCHA | Meaningless words | Tesseract, ABBYY, and OCR | 96.42% | 94.03%
[33] | Visual cryptography | Meaningful and meaningless words | CNN algorithm, GSA software 2.5, and Google Vision | 92% | 90%
Current study | JSMA | meaningless words | Google Vision [50] | 70% | 75.7%
Current study | JSMA | meaningful words | Google Vision [50] | 86.66% | 90.6%
Current study | IAN | meaningless words | Google Vision [50] | 40% | 58.8%
Current study | IAN | meaningful words | Google Vision [50] | 53.33% | 80.6%
Current study | EOT | meaningless words | Google Vision [50] | 33.33% | 24%
Current study | EOT | meaningful words | Google Vision [50] | 46.66% | 86%
Current study | SGTCS | meaningless words | Google Vision [50] | 36.66% | 65.7%
Current study | SGTCS | meaningful words | Google Vision [50] | 80% | 82%
Current study | CTC | meaningless words | Google Vision [50] | 56.66% | 45.77%
Current study | CTC | meaningful words | Google Vision [50] | 50% | 90.5%