A Comparative Analysis of Arabic Text Steganography

Protecting sensitive information transmitted via public channels is a significant issue faced by governments, militaries, organizations, and individuals. Steganography protects secret information by concealing it in a transferred object such as video, audio, image, text, network, or DNA. Because text requires little bandwidth, Internet users rely on it in their daily activities, resulting in a vast amount of text sent daily as social media posts and documents. Accordingly, text is an ideal object for steganography, since a secret message hidden in text is difficult for an attacker to detect among the massive amount of text content on the Internet. Text steganography exploits the characteristics of a language. Despite the richness of the Arabic language in linguistic characteristics, only a few studies have addressed Arabic text steganography. To draw further attention to its prospects, this paper reviews the classifications of Arabic text steganography methods since their inception. For the analysis, it presents a comprehensive study based on the key evaluation criteria (i.e., capacity, invisibility, robustness, and security), and it identifies new areas for further research based on the trends in this field.


Introduction
The rapid expansion of Internet technologies enables the flow of vast amounts of information across public channels, where it is exposed to attacks. Under these circumstances, securing sensitive information has become a serious issue for governments, organizations, and individuals [1,2]. To address this challenge, researchers have proposed various methods to protect secret messages transmitted via public and private communication channels.
The two essential methods that play significant roles in information security are data encryption and data hiding. Data encryption is an aspect of cryptography used to protect a confidential message transmitted across private and public channels by converting it into a scrambled, enciphered form. Thus, the message appears meaningless after encryption. Information hiding, in contrast, conceals the secret message so that it remains unnoticed (invisible) during its transmission via the public (untrusted) communication channel [3]. Invisibility is the fundamental difference between cryptography and information hiding [4].
Information hiding can take one of two forms: Watermarking or steganography. Using watermarking to embed secret information provides proof of ownership of the carrier object, so it is suitable for copyright protection [5]. Steganography conceals the existence of secret information in the cover carrier [6]. Steganography uses several classes of cover media (i.e., audio, video, image, text, network, and DNA).

Year | Reference | Highlights | Scope
2011 | [15] | Exhibits a performance analysis of text steganography classes by analyzing their strengths and weaknesses. | Text steganography
2016 | [16] | Classifies text steganography methods into 2 groups based on changes in format and meaning. However, it summarizes the proposed methods without providing a comprehensive analysis. | Text steganography
2016 | [17] | Discusses the use of the Genetic Algorithm (GA) in text steganography for avoiding suspicion. GA is more widely used in image and video steganography than in text steganography. | GA text steganography
2017 | [18] | Presents a taxonomy of protection and verification methods (watermarking, steganography, and cryptography) for verifying the integrity of Arabic text, using online Qur'anic content as a case study. | Text preserving and verifying
2017 | [19] | Classifies text steganography methods by embedding level into 3 levels: Bit-level, character-level, and mixed-level. | Text steganography
2018 | [20] | A comparative study of structural methods in steganography and watermarking that are applied to copyright protection. | Text copyright protection
2018 | [21] | Discusses, in general, the 3 categories of text steganography: Format-based methods, random and statistical generation, and linguistics. | Text steganography
2018 | [22] | Assesses text steganography methods and discusses the current challenges. | Text steganography
2019 | [4] | Analyzes the security challenges and the pros and cons of structural text hiding methods. | Structural text hiding
2020 | [23] | Addresses the limitations of steganography methods and analyzes their performance in each class (image, audio, video, and text). | Steganography (image, audio, video, and text)
2021 | [24] | Focuses on the comparative analysis of text steganography methods in the feature-based category. | Feature-based text steganography
In this regard, this paper reviews text steganography, given that it is a widely used steganography type. It focuses on analyzing Arabic text steganography methods in view of the language's propensity for information hiding, thus benefiting those who use Arabic text to embed confidential information on public channels.
This effort aims to open new approaches for exploiting Arabic text as a cover for protecting sensitive information.
The contributions of this survey paper are summarized as follows:
• It presents a brief review of existing linguistic text steganography methods.
• It summarizes Arabic text steganography methods since their inception, identifying their methodologies and analyzing their strengths and weaknesses.
• It provides a comparative analysis of Arabic text steganography based on the key evaluation criteria (i.e., capacity, invisibility, robustness, and security).
• It recommends future research directions in Arabic text steganography.
The rest of this paper is organized as follows. Section 2 discusses steganography scenarios and types. Section 3 presents the background on text steganography, focusing on language-based methods. The strengths and limitations of Arabic text steganography methodologies are discussed in Section 4, and Section 5 summarizes the evaluation criteria for Arabic text steganography methods. Section 6 provides recommendations for future work. Finally, Section 7 concludes this paper.

Steganography
Steganography is the art of secret communication between confidential parties. It is the science of embedding a confidential message undetectably in the carrier signal so that no one except the sender and the intended recipient is aware of the existence of the hidden data. The technical term steganography, derived from the Greek words steganos and graphein, means protected writing [25]. A steganographic system therefore facilitates discreet data embedding with easy access and data extraction, promotes a high embedding capacity, and preferably includes some resistance to removal [26]. Steganography allows the secret message to be exchanged without the knowledge or suspicion of other parties. An attack on a stego object is successful if it detects the secret communication.

Steganography Scenario
We illustrate a typical steganography scenario in Figure 1. The entity that sends the secret information (the sender) applies a hiding technique to protect the secret message traveling through a public channel such as the Internet. The secret message could be text, an image, video, or audio. Similarly, the cover could be an image, video, etc.
The embedding process needs a secret key (stego key) that protects the concealed message from being extracted by an attacker. The embedding process generates the stego object, which is the cover object carrying the hidden message. Thereafter, the stego object is sent through the public channel to the receiver. If the stego key is designed as a private key, it is sent to the receiver as a hidden key within the stego file. Otherwise, the stego key is produced as a public key, encrypted, and sent separately to the receiver via the public channel. The receiver extracts the secret message by using the stego key and applying the extraction algorithm that corresponds to the embedding algorithm.
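To make the role of the stego key concrete, the following Python sketch shows one simple way a shared key can drive embedding and extraction: the key seeds a pseudo-random generator that selects which cover positions carry the secret bits, so only a receiver holding the same key can locate them. The cover is abstracted as a list of one-bit slots, the function names are hypothetical, and the receiver is assumed to know the message length. Python's `random` module is used only for illustration; it is not cryptographically secure, so a real system would derive positions from a proper cryptographic primitive.

```python
import random


def keyed_positions(key: int, n_slots: int, n_bits: int) -> list[int]:
    """Derive distinct embedding positions from the stego key (illustrative only)."""
    if n_bits > n_slots:
        raise ValueError("cover object has too few slots for the message")
    rng = random.Random(key)  # same key -> same sequence of positions
    return rng.sample(range(n_slots), n_bits)


def embed(cover_slots: list[int], bits: list[int], key: int) -> list[int]:
    """Return a stego copy of the cover with each secret bit written at a keyed position."""
    stego = list(cover_slots)  # the original cover is left untouched
    for pos, bit in zip(keyed_positions(key, len(stego), len(bits)), bits):
        stego[pos] = bit
    return stego


def extract(stego_slots: list[int], n_bits: int, key: int) -> list[int]:
    """Recompute the keyed positions and read the secret bits back."""
    return [stego_slots[p] for p in keyed_positions(key, len(stego_slots), n_bits)]


cover = [0, 1] * 16  # 32 hypothetical cover slots
secret = [1, 0, 1, 1, 0]
stego = embed(cover, secret, key=2021)
print(extract(stego, len(secret), key=2021))  # [1, 0, 1, 1, 0]
```

A receiver with a different key would generally select different positions, so the bits it reads would be unrelated to the message.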
The risk arises when the hidden message is transmitted through the public channel, where many attackers/eavesdroppers are ready to attack the stego object. An attacker may try to extract the hidden message by tracing the embedding algorithm and breaking the stego key. If the attacker cannot extract the hidden message, he could tamper with the stego object, producing a tampered object in which the hidden message is destroyed. Therefore, it is imperative to build a steganography method that achieves a tradeoff between four evaluation criteria: Invisibility, robustness, capacity, and security. Steganography techniques seek to hide as much information in a cover object as possible without affecting its invisibility. The essence of invisibility is to avoid distorting the cover object's appearance so that it remains unnoticed by the eavesdropper. Robustness prevents the attacker from either extracting or destroying the hidden message. If an attacker notices an altered cover object, he has broken through the first shield of defense; he could then try to break the robustness, which is considered the second shield of defense. A steganography technique with high levels of imperceptibility and robustness achieves a high level of protection [4,27]. Hence, it is important to study and analyze steganography techniques and assess their performance using these four evaluation criteria.

Steganography Types
The strength of steganography security is connected to the inability of observers to distinguish the cover object from the stego object. Cover objects can be customized with varied media types such as image, video, audio, text, network, and DNA.

Image Steganography
When the carrier file is an image, the steganography type is referred to as image steganography. Here, image files (e.g., JPEG, GIF, BMP, and PNG) are used to cover the sensitive message. Image steganography can be achieved through the image format, spatial-domain steganography, and adaptive steganography [28].
Recently, an image steganography approach used a data mapping technique to minimize the number of bits changed per pixel: Four hidden data bits were mapped to the four most significant bits of a cover pixel [29].

Video Steganography
Combining image and sound, a video file such as MPEG, AVI, or MP4 can hide a massive amount of sensitive information. A computed tomography (CT) scan, which is applied for image steganography, can likewise be implemented for video steganography by embedding sensitive information in each image of the video [11,30,31]. Other commonly applied techniques for video steganography include Least Significant Bit (LSB) embedding; Tri-way Pixel-Value Differencing (TPVD), which embeds the secret bits in the intra-coded frame (I-frame); and Bit Plane Complexity Segmentation (BPCS), which is also used for embedding secret bits within MPEG video.

Audio Steganography
A file that stores digital sound (for example, MP3 or WAV) can be used to protect secret messages by modifying its binary sequence. In various modern steganography methods, LSBs are changed with error diffusion. It is also possible to conceal secret messages using inaudible frequencies [30].
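A minimal Python sketch of plain LSB replacement on audio samples follows, assuming the samples are already available as a list of signed 16-bit integers (for example, decoded from a WAV file). It omits the error diffusion mentioned above and simply overwrites the lowest bit of consecutive samples, so each sample changes by at most one quantization step.

```python
def lsb_embed(samples: list[int], bits: list[int]) -> list[int]:
    """Overwrite the least significant bit of consecutive samples with secret bits."""
    if len(bits) > len(samples):
        raise ValueError("not enough samples to carry the message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the secret bit
    return stego


def lsb_extract(samples: list[int], n_bits: int) -> list[int]:
    """Read the secret bits back from the sample LSBs."""
    return [s & 1 for s in samples[:n_bits]]


pcm = [1000, -3, 257, 12, -8000, 42]  # hypothetical 16-bit PCM samples
stego = lsb_embed(pcm, [1, 0, 0, 1, 1])
print(lsb_extract(stego, 5))  # [1, 0, 0, 1, 1]
```

Because only the lowest bit changes, the distortion is tiny, but any re-encoding that alters sample values (such as lossy MP3 compression) would destroy the message.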

Text Steganography
Hiding sensitive information in text was the earliest means of transferring confidential messages. Only the intended receiver can retrieve the concealed data. In particular, text is ideal because it is a common object that is widely used in daily activities, which makes it difficult for an attacker to detect the hidden message [31]. Various methods have been introduced in this field; Section 3 provides further details on text steganography.

Network Steganography
In this type of steganography, a network protocol is adjusted to embed the secret bits. Network protocols such as Transmission Control Protocol (TCP), Protocol Data Unit (PDU), User Datagram Protocol (UDP), Internet Control Message Protocol (ICMP), and Internet Protocol (IP) are used as cover objects. Network steganography is considered highly secure and robust [32].

DNA Steganography
DNA steganography is characterized by the shortest computation time because it has lower storage and power requirements. Conventional storage media require 10^12 cubic nanometers to store 1 bit of data, whereas DNA memory stores data at a density of about 1 bit per cubic nanometer. No power is needed during DNA computation [33,34].
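Taking the two storage figures quoted above at face value, the density advantage of DNA is simply their ratio:

```latex
\frac{10^{12}\ \text{nm}^3/\text{bit}\ \text{(conventional media)}}{1\ \text{nm}^3/\text{bit}\ \text{(DNA)}} = 10^{12}
```

That is, DNA would store roughly a trillion times more bits than conventional media in the same volume.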

Text Steganography
Based on the embedding method used to conceal the sensitive information in the cover text, text steganography can be divided into three categories [35]: Random and Statistical Generation, Linguistic, and Format-based.

Random and Statistical Generation
This class generates a cover item based on statistical properties, taking word and character sequences into account. The generated stego text sometimes attracts the attention of an interceptor because it appears to be a random sequence of words/characters.
As an example in this category, the authors of [36] integrated the structure of the Omega network with part-of-speech (POS) tagging, substituting "verb from cover" with "verb from secret" and "noun of cover" with "noun of secret." In addition, the authors of [37] used letter frequency and word length as statistical properties to create stego words, using actual dictionary items and a codebook of mappings between bit sequences and lexical items. Table 2 shows an example of Random and Statistical Generation. The stego words consist of repetitions of the three letters 'a,' 'r,' and 'd' in an indecipherable way. As a result, the file generated by this approach is incomprehensible and raises suspicion.
Table 2. Example of Random and Statistical Generation (Data from [36]).
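The codebook idea attributed to [37] can be illustrated with a deliberately tiny Python sketch. The four-word codebook below is hypothetical (it is not the codebook of [37]) and ignores the statistical properties the authors used; it only shows how bit sequences map to lexical items and why the resulting word sequence reads as incoherent.

```python
# Hypothetical codebook: every 2-bit group maps to one lexical item.
CODEBOOK = {"00": "sun", "01": "river", "10": "stone", "11": "cloud"}
REVERSE = {word: bits for bits, word in CODEBOOK.items()}


def generate(bits: str) -> str:
    """Produce stego text by emitting the codebook word for each 2-bit group."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even for this codebook")
    return " ".join(CODEBOOK[bits[i:i + 2]] for i in range(0, len(bits), 2))


def recover(stego: str) -> str:
    """Map each word of the stego text back to its bit group."""
    return "".join(REVERSE[word] for word in stego.split())


text = generate("01101100")
print(text)  # river stone cloud sun
```

The message survives the round trip, but the output is not meaningful prose, which is precisely the weakness noted for this category.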

Linguistic Steganography
Linguistic steganography conceals confidential information by exploiting the words or other linguistic features of a language. Linguistic methods comprise two groups: Syntactic and synonym. The syntactic method relies on punctuation [38,39,40]. The synonym method uses a dictionary to replace some words of the carrier file with alternative words in order to pass the hidden bits [31,41]. Table 3 displays an example of linguistic steganography, in which the secret bits are hidden by substituting words using a dictionary. For instance, the word "Trap" has been replaced by the word "Gun" to hide one secret bit. (The replaced words are "Trap"/"Gun," "leave"/"keep," and "beneath the tent"/"under the shed.")
Table 3. Example of Linguistic Steganography (Data from [31]).

Secret message: Keep the gun under the shed
Cover text: "Today is the first day of summer which starts with light and cozy sunshine. But eventually the sun becomes scorching and heat goes up. All the rivers and ponds become dried up. People used to wear light clothes and eat less spicy foods. Several summer camps are organized for kids in hilly areas. Trap shooting, swimming, trekking, rock climbing, biking also included as sports. One such popular summer camp is in Shimla. Kids used to leave their belongings and bed beneath the tent."
Stego text: "Today is the first day of summer which starts with light and cozy sunshine. But eventually the sun becomes scorching and heat goes up. All the rivers and ponds become dried up. People used to wear light clothes and eat less spicy foods. Several summer camps are organized for kids in hilly areas. Gun shooting, swimming, trekking, rock climbing, biking also included as sports. One such popular summer camp is in Shimla. Kids used to keep their belongings and bed under the shed."
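Dictionary-based substitution can be sketched in Python with a hypothetical table of synonym pairs, where choosing the first or second word of a pair encodes bit 0 or 1. This is a simplified variant for illustration (Table 3 itself substitutes words taken from the secret message), and it assumes lowercase cover words and a receiver who knows the message length.

```python
import re

# Hypothetical synonym pairs: the word's position in its pair encodes the bit.
SYNONYM_PAIRS = [("big", "large"), ("fast", "quick"), ("begin", "start")]
LOOKUP = {word: pair for pair in SYNONYM_PAIRS for word in pair}


def embed(cover: str, bits: list[int]) -> str:
    """Replace substitutable words so that each one encodes the next secret bit."""
    tokens = re.findall(r"\w+|\W+", cover)  # words and the separators between them
    used = 0
    for i, token in enumerate(tokens):
        if used == len(bits):
            break
        pair = LOOKUP.get(token)
        if pair is not None:
            tokens[i] = pair[bits[used]]
            used += 1
    if used < len(bits):
        raise ValueError("cover text has too few substitutable words")
    return "".join(tokens)


def extract(stego: str, n_bits: int) -> list[int]:
    """Read one bit from each substitutable word, in order."""
    words = [w for w in re.findall(r"\w+", stego) if w in LOOKUP]
    return [LOOKUP[w].index(w) for w in words[:n_bits]]


stego = embed("a big dog runs fast to begin the race", [1, 0, 1])
print(stego)  # a large dog runs fast to start the race
```

The capacity is limited by how many cover words have dictionary entries, which matches the low capacity attributed to linguistic methods.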

Format-Based Steganography
This group changes the physical formatting of a document to conceal secret information. Deliberate misspellings, font resizing, and space injection, among others, are examples of format-based methods used in text steganography. Although these format-based methods might trick the human eye, they cannot trick computer systems or extend the length of the stego text [42]. Furthermore, these methods are less robust against text retyping attacks [43]. The format-based category is divided into word-rule and feature-based methods, as demonstrated in Figure 2. The word-rule branch involves word-shift coding and line-shift coding [44,45,46]. The feature-based method is divided into language-based and letter-based methods. Table 4 exhibits an example of format-based steganography, in which the produced stego text is identical in appearance to the cover text. The alphabet is grouped into two categories: Round shape and curve shape. In each category, the letters are divided into two groups, and a letter can represent two secret bits depending on its group.
Table 4. Example of Format-Based Steganography (Data from [37]).
Secret message: 110
Cover text: "All birds can fly. This is a bird. Ostrich can also fly."
Stego text: "All birds can fly. This is a bird. Ostrich can also fly."
A feature-based method manipulates the shape, size, and position of characters, which relate to the features and structure of the text font, so that the reader does not recognize the secret message in the text [47]. Table 5 summarizes the differences between the three categories using examples described by the authors of [33,38,39].
Table 5. The main differences between the 3 categories.

Category | Main differences
Random and statistical generation | This method is not based on a specific cover text. However, the generated text is meaningless and raises suspicion, and the computational time is increased.
Linguistic steganography | Invisibility is improved, but the method still suffers from low capacity. Also, searching a dictionary for a suitable word/letter to match the secret word/letter increases the computational time.
Format-based | This method improves invisibility and computational time. Nevertheless, it suffers from low embedding capacity.
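Space injection, listed above among format-based methods, can be illustrated with a Python sketch of one common variant. This variant is an assumption and is not attributed to a specific reference here: A single space between two words encodes 0, and a double space encodes 1. The cover is first normalized to single spaces.

```python
import re


def embed_spaces(cover: str, bits: str) -> str:
    """Encode bit i in the gap after word i: one space for '0', two spaces for '1'."""
    words = cover.split()  # normalizes all existing whitespace
    if not words:
        raise ValueError("empty cover text")
    if len(bits) > len(words) - 1:
        raise ValueError("not enough inter-word gaps for the message")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)


def extract_spaces(stego: str, n_bits: int) -> str:
    """Decode the first n_bits gaps by their width."""
    gaps = re.findall(r" +", stego)
    return "".join("1" if len(g) == 2 else "0" for g in gaps[:n_bits])


stego = embed_spaces("All birds can fly. This is a bird.", "101")
print(repr(stego))  # 'All  birds can  fly. This is a bird.'
```

Collapsing whitespace, as retyping or many editors do, turns every gap back into a single space and erases the message, which illustrates the weak robustness against retyping noted above.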
The characteristics of the feature-based method have encouraged its use by researchers studying languages all over the world. For example, the letter-based method uses the alphabet A to Z, which can be adopted in many languages. The authors of [37,48,49,50] have studied the feature-based method, which can be applied in any language, to either numerals or letters. Examples of feature-based embedding methods applied in several languages are reviewed below.

English-Based
Modifying the letter case of mark-up tags to hide a secret message was introduced by the authors of [51]. This approach conceals the secret information in hypertext: The mark-up letters determine the secret bits and reveal the length of the hidden message. Machine translation was employed by the authors of [52] to hide secret data. This embedding method translates the transmitted text and allows the source text to be kept in its original form. A code representation method known as secret steganography code was proposed [53], which employs the positions of vowels and consonants according to the grammatical sequence.
The authors of [54] used right-to-left and left-to-right marks to conceal information. This embedding method hides the secret data without visibly changing the file's information. It also avoids the retyping problem by converting the file into PDF format. Encryption with Cover text and Reordering (ECR) was proposed [55], which uses the XOR operation and merges two characters when enciphering the original message. Because the proposed mechanism combines encryption and reordering, it is convenient to implement in cloud computing.
The algorithm described by the authors of [40] uses several invisible characters, such as the left-to-right mark, the right-to-left mark, and the zero-width joiner, placed between the letters of words to hide 4 bits. The algorithm can only be applied to specific languages; hence, its embedding method needs to be extended to other languages. The authors of [56] presented the idea of using font attributes and character frequency to embed secret characters. To achieve a uniform distribution of stego characters with a uniform hiding probability, this method integrates four models: Frequency Normalization Set (FNS), Character String Mapping (CSM), embedding, and extracting.
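The invisible-character idea can be sketched in Python. The exact mapping used by [40] is not described here, so the sketch assumes a hypothetical table in which each of four invisible code points carries two bits, and it assumes the cover contains none of these characters already. It targets Latin-script covers: In cursive scripts such as Arabic, the zero-width joiner and non-joiner change how letters connect, so they would not remain invisible there.

```python
# Hypothetical mapping: each invisible code point carries two secret bits.
INVISIBLE = {
    "00": "\u200b",  # ZERO WIDTH SPACE
    "01": "\u200c",  # ZERO WIDTH NON-JOINER
    "10": "\u200d",  # ZERO WIDTH JOINER
    "11": "\u200e",  # LEFT-TO-RIGHT MARK
}
REVERSE = {ch: bits for bits, ch in INVISIBLE.items()}


def hide(cover: str, bits: str) -> str:
    """Insert one invisible character after each of the first letters of the cover."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    marks = [INVISIBLE[bits[i:i + 2]] for i in range(0, len(bits), 2)]
    if len(marks) > sum(ch.isalpha() for ch in cover):
        raise ValueError("cover text has too few letters")
    out, k = [], 0
    for ch in cover:
        out.append(ch)
        if ch.isalpha() and k < len(marks):
            out.append(marks[k])  # not rendered in Latin-script text
            k += 1
    return "".join(out)


def reveal(stego: str) -> str:
    """Collect the invisible characters in order and map them back to bits."""
    return "".join(REVERSE[ch] for ch in stego if ch in REVERSE)


stego = hide("hello world", "011100")
print(len("hello world"), len(stego))  # 11 14
print(reveal(stego))  # 011100
```

The visible text is unchanged, but the string length grows, so a simple character count or a filter that strips zero-width characters would expose or remove the message.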
The authors of [7] considered text justification, justifying each host line of the cover text based on the frequency of characters in the confidential message. This method was applied to both electronic and printed documents. Along the same line, the hiding method introduced by the authors of [57] changes the length of each line in the text document to embed the secret bits, which are concealed using the white space between words and an extended line in the cover text.

Chinese-Based
The method suggested by the authors of [58] hides secret bits in characters by rearranging the sizes of the components of rectangular regions in Chinese characters. In the same field, two embedding methods have been presented: The high efficient substitution embedded method (HESM) and the simple substitution embedded method (SSM). To hide a secret bit, SSM changes the traditional form of Chinese characters, while HESM uses a substitution dictionary.

Indian-Based
In Hindi script, a matra is the sign that represents a medial vowel. The authors of [59] used it to hide a secret bit by shifting a specific matra to the left or right. The authors of [60] integrated two hiding algorithms for the Hindi language. The first algorithm relies on the existence of letters, their diacritics, and compound letters. The second proposes a numerical code for Hindi letters based on a 4-bit binary representation.
Later, the vowels and consonants of the Hindi alphabet were encoded into specific numerical codes based on a four-bit binary representation. For Indian languages, the substitution hiding method introduced by the authors of [61] uses the longest common subsequence with minimal alteration of the alphabet features. Numerical-code text steganography for Hindi characters or similar Indian scripts was developed by the authors of [62].
In addition to numerical codes, a feature-based embedding method for Hindi text that uses grammar was developed by the authors of [63]. This method encodes a bit stream with a Finite State Machine (FSM) that defines the transition functions and transformable symbols in each category. As with Hindi, the use of script features for Bangla text using chain code has been proposed [64]. The chain code represents the contour border pixels as a sequence of direction codes. This approach uses 50 feature vectors of Bangla alphabets or characters.

Polish-Based
Using Polish text to cover the hidden bits has been suggested [65]. This approach assigns points that are greater than the partial sizes of the text alphabets. Polish extension characters are employed alongside the alphabet to join certain letters carrying the hidden secret bits.

Thai-Based
The blind steganography strategy proposed by the authors of [66] for Thai text exploits redundancies in the way TIS-620 represents compound characters that merge vowels and diacritical symbols.

Czech-Based
The authors of [67] used the dot (point) in the Czech language. Additionally, Czech extension characters were employed in the cover text to indicate the positions of hidden bits. Table 6 shows the main characteristics of format-based methods in each language. In the next section, Arabic text steganography is extensively reviewed.
Table 6. Fundamental characteristics of format-based text steganography.

Language | Characteristics
English | Text justification, mark-up language in hypertext, font attributes, substitution, and invisible characters are the main characteristics used to hide a secret message in English scripts. Most of these methods can be applied to other languages.
Chinese | The main characteristics used to protect secret information in Chinese scripts are rearranging the sizes of the rectangular regions' components in Chinese characters and substitution. This method is language-specific, i.e., it is not applicable to other languages.
Indian | Matra, vowels, and substitution are the characteristics that have been used to hide the secret message in Indian (Hindi and Bangla) scripts.
Polish | The dot and extension in the script have been exploited to hide the confidential message. This can be applied to Polish, Czech, Arabic, Urdu, Jawi, and Persian.
Thai | The redundancies of letters merged with vowels and diacritics in Thai scripts are used to hide the secret message. This method is not applicable to other languages.

Arabic Text Steganography
Arabic, spoken by approximately 380 million people [68], is the fifth most spoken language in the world [69,70] and the sixth official language of the United Nations [71]. Arabic online content is growing with daily activity on the Internet [72]. The Arabic alphabet consists of 28 letters written in a cursive style, similar to Urdu and Farsi. An Arabic letter changes shape depending on its position in a word: Initial, medial, final, or isolated. Words usually consist of more than two joined letters. Some Arabic letters have one, two, or three dots placed above or below the letter. Whereas English has no multipoint letters, Arabic has 15 pointed letters, 5 of which are multipoint. The transliteration of Arabic letters is shown in Table 7.
Arabic words have diacritics called "Harakat" that are added to indicate vowel sounds. The eight Arabic text diacritics are Fathah (ـَ), Kasrah (ـِ), Damah (ـُ), Sukun (ـْ), Tanwin Fathah (ـً), Tanwin Kasrah (ـٍ), Tanwin Damah (ـٌ), and Shaddah (ـّ). These diacritics are essential for understanding the Holy Quran, religious scripts, historical texts, and Arabic learning books. However, most other Arabic text does not contain diacritics. Arabic text also contains an extension character called kashida, which is used to justify words, as well as white spaces, which justify the text. Kashida is inserted after a letter based on its location in a word [73]. Arabic letters can also be divided into 2 groups, the sun and moon letters, each containing 14 letters, as shown in Table 8. This grouping is based on how these letters affect the pronunciation of the definite article (ال) at the beginning of words: The sound of (ل) in the definite article is pronounced before moon letters and is not pronounced before sun letters. The aforementioned features make Arabic text well suited to hiding secret information. The methods employed in the literature include the dot method, diacritics, kashida, Unicode, sharp edges, poetry, and hybrid methods. Each method is examined below.

Dot Method
Some early studies used the dots (points) of Arabic and Persian letters to hide confidential information. For illustration, the authors of [75] hid one secret bit (0 or 1) in Arabic letters by shifting their dots. The secret message was converted to a bitstream, which was then compressed to reduce its length. The cover text was scanned letter by letter to identify pointed letters. Whenever a dotted letter was found, its dot was shifted slightly upward if the secret bit was "1"; otherwise, if the bit was "0," the dot was left unchanged. Figure 3 shows an example of the Arabic letter "Noon." Another study was carried out by the authors of [76] for Arabic letters with more than one dot. In that study, every multipoint letter carried two bits: Combining the embedding process with vertical dot shifting doubled the number of hidden bits. A challenging problem associated with this method is retyping, which destroys all the concealed bits. The authors suggested a solution to this issue that would restrict the number of font format changes in the future, accomplished by merging all the data into a single file.
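Dot shifting itself is a rendering operation (it requires a modified font or image output), so the Python sketch below only models the decision logic described for [75] and the capacity gain described for [76]. The function names are hypothetical; the letter sets are the 15 pointed letters of the 28-letter alphabet, 5 of which carry two or three dots.

```python
# The 15 pointed letters of the 28-letter Arabic alphabet.
POINTED = set("بتثجخذزشضظغفقني")
# The 5 multipoint letters (two or three dots): ta, tha, shin, qaf, ya.
MULTIPOINT = set("تثشقي")


def dot_shift_plan(cover: str, bits: str) -> list[tuple[int, bool]]:
    """Pair each secret bit with the next pointed letter: True = shift its dot ('1')."""
    positions = [i for i, ch in enumerate(cover) if ch in POINTED]
    if len(bits) > len(positions):
        raise ValueError("cover text has too few pointed letters")
    return [(pos, bit == "1") for pos, bit in zip(positions, bits)]


def capacity(cover: str, two_bits_for_multipoint: bool = False) -> int:
    """Bits the cover can carry: 1 per pointed letter, or 2 per multipoint letter."""
    return sum(
        2 if two_bits_for_multipoint and ch in MULTIPOINT else 1
        for ch in cover
        if ch in POINTED
    )


word = "بيت"  # "bayt" (house): ba, ya, and ta are all pointed
print(dot_shift_plan(word, "10"))  # [(0, True), (1, False)]
print(capacity(word), capacity(word, two_bits_for_multipoint=True))  # 3 5
```

The second call shows how treating multipoint letters as two-bit carriers raises capacity for the same cover word.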
The dot method exploits the existing dots of Arabic letters, so shifting a dot increases capacity while reducing suspicion of hidden information in the cover text. However, it has a high running time, its output is fixed to a single format, and the secret message is vulnerable to retyping or scanning. Table 9 summarizes the dot method.
Table 9. A summary of the reviewed articles on the dot method.

Authors | Methodology | Pros | Cons
[75] | A pointed character moves its dot to conceal "1" and remains untouched to hide "0." | Improves robustness by changing the remaining characters randomly; enhances capacity by compression. | High computational time; the stego text is fixed to only 1 font type.
[76] | A pointed character shifts its dot and increases the distance between its dots to hide 2 secret bits in one character. | Converts the stego file (text) to an image file to overcome the retyping challenge. |

Diacritic Method
The Arabic language uses various marks, or diacritics (redundant characters in Arabic), known as harakat to represent vowel sounds. Using diacritics for security purposes is beneficial because diacritics exist naturally as a fundamental characteristic of Arabic scripts [77,78]. Diacritics differentiate between words with the same letters so that each word is pronounced differently, as explained in Table 10. Fathah is used to hide the bit "1," while the rest of the diacritics embed a "0" bit, because Fathah accounts for almost half of diacritic usage in Arabic texts. This approach has the flaw of attracting the reader's attention.
The early method presented by the authors of [79] exploits eight different diacritical symbols to conceal a secret message. Fully diacritized Arabic texts are used as cover media. The first bit of the secret message is compared with the first diacritic in the cover media. For instance, if the first secret bit is 1 and the first diacritic is a Fathah, the diacritic remains in the cover media, and the indices of both the cover media and the secret message are incremented. If the first diacritic is not a Fathah, it is removed from the cover media, and the process repeats until the next Fathah is reached. A secret 0 bit is embedded in a similar way using the remaining seven diacritics (i.e., all except Fathah).
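The embedding rule described for [79] can be sketched in Python as follows, operating on Unicode text in which each diacritic is a combining character (U+064B to U+0652). Assumptions: The receiver knows the message length, and diacritics that follow the last embedded bit are left in place. The cover in the example is a short hypothetical vocalized string written with escape sequences.

```python
FATHAH = "\u064e"
# The eight diacritics: tanwin fathah/dammah/kasrah, fathah, dammah, kasrah, shaddah, sukun.
DIACRITICS = set("\u064b\u064c\u064d\u064e\u064f\u0650\u0651\u0652")


def embed_fathah(cover: str, bits: str) -> str:
    """Keep a Fathah for '1' and any other diacritic for '0'; drop mismatching diacritics."""
    out, k = [], 0
    for ch in cover:
        if ch in DIACRITICS and k < len(bits):
            if (ch == FATHAH) == (bits[k] == "1"):
                out.append(ch)  # diacritic matches the current bit: keep it
                k += 1
            continue  # a mismatching diacritic is removed from the cover
        out.append(ch)
    if k < len(bits):
        raise ValueError("cover text has too few suitable diacritics")
    return "".join(out)


def extract_fathah(stego: str, n_bits: int) -> str:
    """Read the first n_bits diacritics: Fathah -> '1', any other -> '0'."""
    marks = [ch for ch in stego if ch in DIACRITICS]
    return "".join("1" if ch == FATHAH else "0" for ch in marks[:n_bits])


# Hypothetical cover: kaf+fathah, ta+dammah, ba+fathah.
cover = "\u0643\u064e\u062a\u064f\u0628\u064e"
stego = embed_fathah(cover, "01")
print(extract_fathah(stego, 2))  # 01
```

In this example the first Fathah is removed because the first secret bit is 0, which is exactly the kind of visible gap in vocalization that can attract a reader's attention.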
Along the same line, the authors of [80] changed the font style of diacritics to cover the secret data. A new font style was set to embed "1" or unset to embed "0." The idea involved two approaches: The textual approach and the image approach. The textual approach chooses a font that completely hides extra (or all) diacritic marks. It then uses any encoding scenario to conceal secret bits in an arbitrary number of repeated but invisible diacritics. The image approach, on the other hand, selects one of the fonts that slightly darkens multiple occurrences of diacritics. This approach requires converting the document into an image for printing.
Two steganography algorithms for Arabic script were presented by the authors of [81], based on the wasting/nonwasting property of Arabic diacritics. The first algorithm uses fixed-size block parsing: A stream of binary bits is parsed into cover blocks. The second algorithm uses a variable-size, content-based approach, in which binary data is parsed into an integer number of blocks irrespective of the number of bits they contain. These algorithms have different properties and are therefore suited to different application types and steganography requirements (i.e., robustness, file size, and capacity). In contrast to the content-based algorithm, the fixed-size algorithm permits a straightforward computation of the required amount of cover text, but it cannot directly predict the output file size.
The authors of [82] introduced a method for concealing Chinese text inside Arabic text. The characters of the message are automatically converted to capital letters of the English alphabet. Letters, numbers, or special characters can then each be hidden using two diacritics. Because Unicode is used, each letter or diacritic is 16 bits long. Two tables are used: a diacritics table and an elements table. The diacritics table has 64 entries formed from 8 different diacritics, where each pair of diacritics carries 1 element. The elements table is stored as a one-dimensional array and contains all the letters of the English alphabet and the digits 0 to 9.
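The two-table lookup of [82] can be illustrated with the sketch below, which maps each element index to a pair of diacritics (8 × 8 = 64 combinations, of which the 36 letters and digits use only part). The ordering of the diacritics and of the element table is an assumption made for illustration, and the sketch only produces the diacritic sequence; how [82] places it on cover letters is not modelled. The function names are hypothetical.

```python
import string

# Assumed ordering of the 8 diacritics: U+064B (Fathatan) .. U+0652 (Sukun)
DIACRITICS = [chr(c) for c in range(0x064B, 0x0653)]
# Element table: one-dimensional array of capital English letters and digits 0-9
ELEMENTS = string.ascii_uppercase + string.digits  # 36 of the 64 possible slots


def encode_elements(message: str) -> list[str]:
    """Return two diacritics per message character (row and column of the 8x8 table)."""
    seq = []
    for ch in message.upper():  # characters are first converted to capital letters
        index = ELEMENTS.index(ch)  # raises ValueError for unsupported characters
        seq.extend([DIACRITICS[index // 8], DIACRITICS[index % 8]])
    return seq


def decode_elements(seq: list[str]) -> str:
    """Invert encode_elements by reading the diacritics in pairs."""
    chars = []
    for first, second in zip(seq[0::2], seq[1::2]):
        index = DIACRITICS.index(first) * 8 + DIACRITICS.index(second)
        chars.append(ELEMENTS[index])
    return "".join(chars)
```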
The authors of [83] described an embedding method for hiding information in vocalized Arabic text. The method uses fully diacritized text: if the secret bit is "1," the diacritic is kept as it is; otherwise (if it is "0"), the diacritic is removed.
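A minimal sketch of the keep/remove rule of [83] is shown below. It assumes a fully diacritized cover in which every Arabic letter carries exactly one diacritic, so that a bare letter in the stego text signals a removed diacritic. The simplified letter range U+0621 to U+064A and the function names are assumptions for illustration.

```python
# Assumed diacritic set: U+064B (Fathatan) .. U+0652 (Sukun)
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def is_letter(ch: str) -> bool:
    # Simplified test for a basic Arabic letter
    return "\u0621" <= ch <= "\u064A"


def embed_keep_remove(cover: str, bits: str) -> str:
    """Keep the diacritic for bit 1 and delete it for bit 0."""
    out, i = [], 0
    for ch in cover:
        if ch in DIACRITICS and i < len(bits):
            if bits[i] == "1":
                out.append(ch)
            i += 1
        else:
            out.append(ch)
    if i < len(bits):
        raise ValueError("cover text has too few diacritics")
    return "".join(out)


def extract_keep_remove(stego: str, n_bits: int) -> str:
    """A bit is 1 when a letter is still followed by a diacritic, else 0."""
    bits = []
    for k, ch in enumerate(stego):
        if len(bits) == n_bits:
            break
        if is_letter(ch):
            nxt = stego[k + 1] if k + 1 < len(stego) else ""
            bits.append("1" if nxt in DIACRITICS else "0")
    return "".join(bits)
```

The extraction step only works because the cover was fully diacritized; on ordinary, partly vocalized text the receiver could not tell a removed diacritic from one that was never there.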
The authors of [84] used a reversed Fathah in Arabic and Urdu text to represent the document's concealed message. The hidden message was read and matched character by character against the cover article written in Arabic, and the reversed Fathah was then embedded in various lines. The disadvantages of this method are that the hidden data can be lost during retyping and that only one font has a static frame for the reversed Fathah. However, the method can be applied to similar scripts, such as Urdu, and its use could perhaps be considered for other Asian scripts.
The authors of [85] considered shifting harakat (diacritics): a diacritic is shifted vertically by 1/200 inch to hide "1" and left unchanged to hide "0."

Showing or omitting diacritics has also been used to hide secret bits [86]. Three embedding algorithms were developed: the Basic, Switch, and Parity algorithms. The Basic algorithm shows a diacritic to hide the secret bit "1" and omits it to hide "0." In the Switch algorithm, a diacritic is shown only when the secret bitstream changes from "1" to "0" or vice versa. The Parity algorithm assigns a parity bit to every cover character in the text: if the character's position is even, its parity bit is "0"; otherwise, it is "1."

In [87], two diacritics (Kasrah and Fathah) were used to design an embedding algorithm that splits the hidden message into two arrays of binary values, forming odd and even lists. The odd list is hidden in the Fathah diacritics, while the even list is concealed in the Kasrah diacritics. The program reads the first odd bit of the hidden message and compares it with the first Fathah in the cover text: if the bit is "1," the Fathah is left untouched; otherwise, the Fathah is removed.
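The Switch algorithm of [86] is a form of differential (transition) coding: visible diacritics mark the places where the bitstream changes value. The sketch below computes the show/omit decision for each diacritic position and inverts it. The initial reference bit of 0 is an assumption, since the description above does not say how the first bit is handled, and the function names are hypothetical.

```python
def switch_encode(bits: str, initial: str = "0") -> list[bool]:
    """Return one flag per diacritic position: True = show, False = omit."""
    flags, previous = [], initial
    for bit in bits:
        flags.append(bit != previous)  # show only when the bit value changes
        previous = bit
    return flags


def switch_decode(flags: list[bool], initial: str = "0") -> str:
    """Rebuild the bitstream by toggling the current value at every shown diacritic."""
    bits, current = [], initial
    for shown in flags:
        if shown:
            current = "1" if current == "0" else "0"
        bits.append(current)
    return "".join(bits)


print(switch_encode("1100"))  # [True, False, True, False]
```

Long runs of identical bits leave diacritics omitted, so the number of visible diacritics depends on how often the bitstream changes rather than on the number of 1s.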
Recently, the authors of [88] presented a modified Fathah for Arabic text steganography. The secret message is first encrypted with the AES algorithm, and the encrypted data is then hidden using the modified Fathah. The modified Fathah keeps the direction of the original Fathah and is only slightly reoriented, so it closely resembles the original and avoids suspicion.
The diacritics methods discussed above are summarized in Table 10. Most of them serve to enhance capacity, because diacritics occur naturally in Arabic, where they historically originated to represent vowel sounds [78,86]. Nevertheless, diacritics methods increase suspicion, since the diacritics appear in abnormal patterns. Moreover, most Arabic texts today are written without diacritics.
Table 10. A summary of the reviewed articles on the diacritics method.

Authors | Methodology | Pros | Cons
[80] | Multiple embedding scenarios are achieved by changing the font style of diacritics; it considers repeated but invisible diacritics. | Low computational time. Embedding is automated or manual. Improved invisibility. | Stego text is fixed to the use of only 1 font type.
[82] | Hides each character in 2 diacritics. | Improved security using RLE. Low computational time. Embedding is automated or manual. Stego file has a flexible format. | -
[79] | The existence of Fathah hides "1," and the other diacritics hide "0." | Low computational time. Embedding is automated or manual. Stego file has a flexible format. | The stego text size differs from the cover text, which raises suspicion.
[88] | Hides AES-encrypted data using a modified Fathah. | Improved security using AES. | -

Kashida Method
Kashida refers to a type of justification, i.e., a stretch or extension of Arabic letters. It is used for various purposes, such as emphasis, legibility, aesthetics, and justification [89]. In this steganography method, a kashida is added to a word to represent the secret bit "1"; when no kashida is added, the word represents the secret bit "0." Letter extensions do not alter the written content or the meaning of the message. For example, the output text of the sentence "It is from the excellence of (a believer's) Islam that he should shun that which is of no concern to him." has the same meaning as the cover text. However, the appearance of the text changes and the file size increases, so the result may attract the reader's attention.
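The basic kashida rule can be sketched as follows. The sketch makes several simplifying assumptions that do not come from the reviewed papers: the kashida is the Unicode tatweel character U+0640, a letter counts as extendable only if it belongs to the simplified dual-joining set `DUAL_JOINING` and is followed by another Arabic letter, and the cover text contains no kashidas beforehand. The function names are hypothetical.

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL
# Simplified set of dual-joining letters that can be followed by a kashida
DUAL_JOINING = set("بتثجحخسشصضطظعغفقكلمنهيئ")


def is_arabic_letter(ch: str) -> bool:
    return "\u0621" <= ch <= "\u064A" and ch != KASHIDA


def embed_kashida(cover: str, bits: str) -> str:
    """At each extendable position insert a kashida for bit 1 and nothing for bit 0."""
    out, i = [], 0
    for k, ch in enumerate(cover):
        out.append(ch)
        nxt = cover[k + 1] if k + 1 < len(cover) else ""
        if i < len(bits) and ch in DUAL_JOINING and is_arabic_letter(nxt):
            if bits[i] == "1":
                out.append(KASHIDA)
            i += 1
    if i < len(bits):
        raise ValueError("cover text has too few extendable letters")
    return "".join(out)


def extract_kashida(stego: str, n_bits: int) -> str:
    bits = []
    for k, ch in enumerate(stego):
        if len(bits) == n_bits:
            break
        if ch not in DUAL_JOINING:
            continue
        nxt = stego[k + 1] if k + 1 < len(stego) else ""
        if nxt == KASHIDA:
            bits.append("1")
        elif is_arabic_letter(nxt):
            bits.append("0")
    return "".join(bits)
```

Every "1" adds one character, which is the file-size growth mentioned above.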
The authors of [90] established the standard kashida method, which can protect a secret bit in any extendable letter. It uses an extension after a pointed letter to hide the secret bit "1" and an extension after an unpointed letter to hide "0." This method does not affect the written content, as illustrated in Table 11. The improvement of the work of [90] described in [91] inserts one kashida to hide "0" and two consecutive kashidas to hide "1."
Table 11. A steganography example that adds extensions after letters (Adapted from [91]).

Watermarking bits: 110010; columns: Cover text, Output text.
Building on this method, a stego system for Arabic e-text, Maximising Steganography Capacity Using Kashida in Arabic Text (MSCUKAT), was developed [92]. Meanwhile, the algorithm proposed by the authors of [77] hides the secret message as numbers by inserting kashidas: each extendable letter can hide a specific number based on its position and the number of kashidas in a word. Later, the authors of [93] produced an implementation of MSCUKAT.
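The pointed/unpointed rule of [90] can be sketched as follows: to hide "1," the embedder moves to the next extendable pointed letter and adds a kashida after it; to hide "0," it does the same with the next extendable unpointed letter. The letter groups, the use of U+0640 as the kashida, and the function names are simplifying assumptions, not the authors' specification.

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL
# Simplified groups of letters that join to the following letter
POINTED = set("بتثجخشضظغفقنيئ")
UNPOINTED = set("حسصطعكلمه")


def joins_next(text: str, k: int) -> bool:
    """True if the character after position k is a basic Arabic letter."""
    return k + 1 < len(text) and "\u0621" <= text[k + 1] <= "\u064A"


def embed_pointed(cover: str, bits: str) -> str:
    out, i = [], 0
    for k, ch in enumerate(cover):
        out.append(ch)
        if i < len(bits) and joins_next(cover, k):
            wanted = POINTED if bits[i] == "1" else UNPOINTED
            if ch in wanted:
                out.append(KASHIDA)  # this kashida carries bit i
                i += 1
    if i < len(bits):
        raise ValueError("cover text has too few suitable letters")
    return "".join(out)


def extract_pointed(stego: str) -> str:
    """Each kashida yields 1 if the preceding letter is pointed, 0 otherwise."""
    return "".join(
        "1" if prev in POINTED else "0"
        for prev, ch in zip(stego, stego[1:])
        if ch == KASHIDA
    )
```

Letters of the wrong group are skipped, which is why capacity is limited when suitable letters are scarce.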
The algorithm proposed by the authors of [73] considers four scenarios in which kashida can be added. In each round, one of the four scenarios is selected at random, and message segmentation then lets the sender apply more than one strategy, one for each message block.
Similar to the method of [73], the authors of [94] designed four embedding schemes, each hiding two secret bits. The design uses the presence of a kashida after a pointed or unpointed letter to hide the secret bits.
Next, the authors of [95] compressed secret messages using Gzip and encrypted these compressed secret messages by deploying AES. The proposed embedding method involves four stego options: Pointed kashida (After Letters), pointed kashida (Before Letters), pointed kashida (Mixed Letters), and MSCUKAT.
Another work [96] hid a voice file in a text file using kashida and the word "La." The proposed embedding algorithm reduces the size of the secret voice file using a lossless compression algorithm. It then hides "1" by inserting a kashida after letters, while "0" is hidden by leaving letters without a kashida.
Using sun and moon letters, the technique proposed by the authors of [67] protects secret bits in Arabic script. It considers four scenarios: a kashida placed after a sun letter conceals the bits "00"; two kashidas after a sun letter conceal "11"; a kashida after a moon letter embeds "01"; and two kashidas after a moon letter conceal "10."

As kashida is frequently used in Arabic text, using it for steganography (Table 12) is one way to improve the embedding capacity. Nevertheless, these studies still have drawbacks, such as high susceptibility to suspicion (low imperceptibility) and large output file size. In addition, only a few attempts have been made to reduce the algorithms' complexity in order to improve the extraction of hidden information.
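The four sun/moon scenarios of [67] map onto a lookup table from 2-bit groups to (letter class, number of kashidas). The sketch below assumes U+0640 as the kashida and restricts both classes to letters that join to the following letter; these letter sets and the function names are illustrative assumptions.

```python
import re

KASHIDA = "\u0640"  # ARABIC TATWEEL
# Sun and moon letters, restricted to letters that can take a kashida
SUN = set("تثسشصضطظلن")
MOON = set("بجحخعغفقكمهي")
# 2-bit group -> (letter class, number of kashidas), following the four scenarios
SCHEME = {"00": ("sun", 1), "11": ("sun", 2), "01": ("moon", 1), "10": ("moon", 2)}
DECODE = {value: key for key, value in SCHEME.items()}


def letter_class(ch: str):
    return "sun" if ch in SUN else "moon" if ch in MOON else None


def embed_sun_moon(cover: str, bits: str) -> str:
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    pairs = [bits[j:j + 2] for j in range(0, len(bits), 2)]
    out, p = [], 0
    for k, ch in enumerate(cover):
        out.append(ch)
        joins = k + 1 < len(cover) and "\u0621" <= cover[k + 1] <= "\u064A"
        if p < len(pairs) and joins:
            wanted_class, count = SCHEME[pairs[p]]
            if letter_class(ch) == wanted_class:
                out.append(KASHIDA * count)
                p += 1
    if p < len(pairs):
        raise ValueError("cover text has too few suitable letters")
    return "".join(out)


def extract_sun_moon(stego: str) -> str:
    # Each match is a letter followed by a run of one or more kashidas
    return "".join(
        DECODE[(letter_class(m.group(1)), len(m.group(2)))]
        for m in re.finditer("(.)(" + KASHIDA + "+)", stego)
    )
```

Each carrier letter holds two bits, but the double kashidas also make the irregular stretching more visible.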

Unicode Method
Unicode is an international character encoding standard for representing and displaying text in data processing. It is compatible with ISO/IEC 10646-1:2000 version 2, and ISO/IEC 10646 defines the same characters and codes. Unicode allows the encoding of the characters used in the world's writing systems. Its original design employed a 16-bit encoding, which allows 2^16 = 65,536 code points (later versions extended the code space to U+10FFFF), so that numbers, letters, and symbols in many languages can be specified and defined. Because of the large space devoted to characters, the standard contains most of the symbols needed for high-quality typesetting. The writing systems it supports include Latin (covering most European languages), Cyrillic (Russian and Serbian), Greek, Arabic (including Arabic, Persian, Urdu, and Kurdish), Hebrew, Indian, Armenian, Assyrian, Chinese, Katakana and Hiragana (Japanese), and Hangeul (Korean). The standard also includes many mathematical and technical symbols, punctuation marks, arrows, and other marks. The Unicode standard has two groups of codes for Arabic letters: the first is the representative code, and the second consists of the codes for the letter's possible shapes. Although codes are unified for characters common to several languages, separate characters are allocated to Persian letters whose semantics or shapes differ significantly from Arabic letters. Thus, separate code points have been allocated to the Persian-specific letters (پ, چ, ژ, گ) and to two other Persian letters (ک, ی) whose appearance differs from that of their Arabic counterparts.
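The two groups of Arabic codes described above (the representative code and the codes of a letter's contextual shapes) can be inspected with Python's standard `unicodedata` module. The sketch below looks up a presentation-form code by its Unicode character name and maps it back with NFKC normalization. Reading a bit from whichever group a letter uses (`read_bit`) is an illustrative assumption in the spirit of the Unicode methods reviewed below, not a specific author's algorithm.

```python
import unicodedata


def shape_code(letter: str, form: str) -> str:
    """Return the presentation-form character for a letter, e.g. form='INITIAL'."""
    # Presentation forms are named '<letter name> <FORM> FORM' in the Unicode database
    return unicodedata.lookup(f"{unicodedata.name(letter)} {form} FORM")


def representative(ch: str) -> str:
    """Map a presentation-form character back to its representative (basic) code."""
    return unicodedata.normalize("NFKC", ch)


def read_bit(ch: str) -> str:
    """Hypothetical rule: a contextual-shape code carries 1, the representative code 0."""
    return "1" if representative(ch) != ch else "0"


seen = "\u0633"  # ARABIC LETTER SEEN (representative code)
initial = shape_code(seen, "INITIAL")
print(unicodedata.name(initial))  # ARABIC LETTER SEEN INITIAL FORM
print(read_bit(seen), read_bit(initial))  # 0 1
```

Both characters render as the same glyph in the initial position, which is what gives Unicode methods their high perceptual transparency.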
The Unicode approach uses the various possible Unicode values of the same letter to conceal the bits. It is suitable for use on public channels and modern devices such as smartphones.
Table 12. A summary of the reviewed articles on the kashida method.
Authors | Methodology | Pros | Cons
[90] | Pointed letters with kashida hide "1"; unpointed letters with kashida hide "0." | Embedding is automated or manual. No size increase of stego text. | Limited capacity, since not all letters can be extended.
[91] | Uses 1 kashida to hide "0" and 2 consecutive kashidas to hide "1." | - | Increase in the size of the stego file.
[73] | Randomly applies kashida insertion in 4 scenarios to hide a secret bit. | Increases the algorithm complexity and reduces the likelihood of suspicion. | -
[95] | Uses 4 choices to protect the secret bits based on kashida and dotted letters. | Improves security using AES. Improves the capacity using Gzip. Stego file has a flexible font and format. Low computational time. | -
[96] | Compresses the secret message, then inserts an extra kashida after each letter and the word "La" to hide "1"; leaves the letter with its original kashida to hide "0." | Reduces the size of the secret bits using a lossless compression algorithm. | -
[94] | Kashida-based insertion in 4 scenarios, considering pointed and unpointed letters, to hide 2 secret bits. | Improves the capacity by using 1 character to hide 2 secret bits. | High computational time.
The authors of [97] used Unicode characters by inserting a normal space after a pseudo-space to embed "1" and making no insertion to embed "0." In the same year, the authors of [98] presented a design that uses the word "La" to hide the secret message. The word has two forms in Arabic writing, a normal form and a special form: the Unicode of the normal form conceals "0," and the special form conceals "1." Later, another Unicode technique was suggested in which each Persian or Arabic letter has one unique code [99]. This code displays the letter in its isolated form and acts as the letter's representative. For each word in the text, a letter can be saved either with the representative code or with the code of its correct shape (with respect to its position in the word). To hide a 0 bit in a word, the first option is used to save the word; to hide a 1 bit, the second option is used.

The authors of [100] exploited the similarity between the Arabic and Persian forms of two letters: the Persian characters « ی » and « ک » hide bit "0," and the Arabic characters « ي » and « ك » hide bit "1." One approach used the isolated letters in Arabic text with Run Length Encoding (RLE) to embed the secret bits [101]: the secret bitstream is converted into groups of 0s and 1s by applying RLE, and the Unicode characters of the isolated letters are changed to embed the secret bit "1" or left unchanged to embed "0." Likewise, [102] changed the Unicode characters of isolated letters in Urdu text to hide the secret bits. The secret message is first encrypted with RSA; the ciphertext is then converted into a bitstream and divided into even blocks, to which randomization and swap functions are applied. The Unicode equivalents of the isolated letters are changed to carry bit "1" in each block or left unchanged to carry "0." Another Unicode system, which uses a modified RLE, was presented by the authors of [78]. It uses a coding method whose output consists of a sequence of 1s and a few 0s. The proposed modified RLE is suitable for compression, and its output suits steganography methods that use Unicode and unprinted characters to hide the secret message in Arabic text.
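The "La" method of [98] can be sketched by switching between two encodings of the word: the letter sequence Lam + Alef (U+0644 U+0627), which a font renders as a ligature, and a single presentation-form ligature code. Using U+FEFB (ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM) as the "special form" is an assumption for this sketch; a complete implementation would choose the final form U+FEFC after a joining letter. The function names are hypothetical.

```python
NORMAL_LA = "\u0644\u0627"         # LAM + ALEF: normal form, hides 0
SPECIAL_LA = ("\uFEFB", "\uFEFC")  # ligature codes (isolated, final): special form, hides 1


def embed_la(cover: str, bits: str) -> str:
    out, i, k = [], 0, 0
    while k < len(cover):
        if i < len(bits) and cover.startswith(NORMAL_LA, k):
            # Simplification: always use the isolated ligature for bit 1
            out.append(SPECIAL_LA[0] if bits[i] == "1" else NORMAL_LA)
            i += 1
            k += len(NORMAL_LA)
        else:
            out.append(cover[k])
            k += 1
    if i < len(bits):
        raise ValueError("cover text has too few occurrences of La")
    return "".join(out)


def extract_la(stego: str, n_bits: int) -> str:
    bits, k = [], 0
    while k < len(stego) and len(bits) < n_bits:
        if stego[k] in SPECIAL_LA:
            bits.append("1")
            k += 1
        elif stego.startswith(NORMAL_LA, k):
            bits.append("0")
            k += len(NORMAL_LA)
        else:
            k += 1
    return "".join(bits)
```

Each occurrence of "La" carries only one bit, which is why the capacity of this method depends entirely on how often the word appears.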
The authors of [103] studied three scenarios. The first reduces the number of character changes by counting the 0s in each secret packet: if a packet contains fewer 0s than 1s, it is complemented. The second scenario hides a 0 bit by leaving the cover letter, taken from the next word in the text, unchanged, while the third hides a 1 bit by selecting a cover letter from the text and modifying its Unicode from the general letter's code to the contextual code, according to the letter's type. The cover letter's Unicode is changed to the isolated form if the letter belongs to the isolated group; if the cover letter is part of a connected series, its Unicode is reverted to the original form.
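One reading of the first scenario of [103] is sketched below. A 1 bit requires changing a letter's Unicode while a 0 bit leaves the letter unchanged, so a packet with more 1s than 0s is complemented before embedding. How [103] signals that a packet was complemented is not described above, so the sketch simply returns a flag alongside the packet. The function names are hypothetical.

```python
def complement(bits: str) -> str:
    return "".join("0" if b == "1" else "1" for b in bits)


def prepare_packet(packet: str) -> tuple[str, bool]:
    """Complement the packet when 1s outnumber 0s, so fewer letters must change."""
    ones = packet.count("1")
    if ones > len(packet) - ones:
        return complement(packet), True
    return packet, False


def restore_packet(stored: str, was_complemented: bool) -> str:
    return complement(stored) if was_complemented else stored


print(prepare_packet("1101"))  # ('0010', True)
```

After this step, at most half of the bits in any packet require a Unicode change.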
As seen in Table 13, high perceptual transparency and an unchanged format and output file size are the major benefits of Unicode methods. However, Arabic letters take different shapes in different positions, so an inappropriate change of a letter's shape increases the reader's suspicion, which limits the letters that can serve as cover.
Table 13. A summary of the reviewed articles on the Unicode method.

Authors | Methodology | Pros | Cons
[97] | Inserts a normal space after a pseudo-space to hide "1"; no insertion hides "0." | Stego file has a flexible font and format. Low computational time. | Increases the stego file size.
- | - | - | Deficient capacity due to the limited number of identical isolated letters between Arabic and Persian.
[100] | The Arabic character « ي » or « ك » hides "1," and the Persian character « ی » or « ک » hides "0." | - | Low capacity due to the limited use of these characters.
[98] | The special form of the word "La" hides "1," and the normal form hides "0." | Embedding is automated or manual. Stego file has a flexible font and format. Low computational time. | Very low capacity because the word "La" occurs rarely.
[101] | Changes the Unicode of an Arabic isolated letter to hide "1" and leaves it unchanged to hide "0." | Improves security using RLE. Stego file has a flexible font and format. Low computational time. | Low capacity due to the limited appearance of Arabic isolated letters.
[102] | Changes the Unicode of an isolated Urdu letter to hide "1" and leaves it unchanged to hide "0." | Improves security using RSA. Stego file has a flexible font and format. Low computational time. | Low capacity due to the limited appearance of Urdu isolated letters.
[78] | Hides the secret bits based on Unicode and non-printed characters. | Improves security using RLE. Stego file has a flexible font and format. Low computational time. | Limited capacity, as only unpointed letters are considered.
[103] | Uses 3 scenarios to hide a secret bit by changing the Unicode of the letter. | Stego text size is not changed. | Limited capacity, as only isolated and initial letters are considered.

Sharp Edges Approach
The algorithms suggested by the authors of [104] use the sharp edges of Arabic letters to hide secret bits. Each letter hides secret bits according to the number of its sharp edges, as shown in Figure 4; for example, a letter with one sharp edge can embed one secret bit, 0 or 1. The authors of [104] designed a reference table that records the locations of the secret bits in the cover letters.

The approach described by the authors of [105] operates on dotted and undotted letters. Random numbers are generated and used to select enough letters to hide 104 bits of the secret message. The number of sharp edges on the first selected letter, as shown in Table 14, determines how many bits it hides. The corresponding secret bits are added to the code sequence, and the process continues until all the bits are embedded. For instance, a character with two sharp edges can carry the first two binary bits (i.e., 01), represented by the corresponding decimal value, 1.
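The code-sequence step of [105] can be sketched as follows: each selected letter consumes as many secret bits as it has sharp edges, and that group of bits is stored as a decimal number. The per-letter sharp-edge counts come from Table 14, which is not reproduced here, so the sketch takes them as an input list. The function names are hypothetical.

```python
def to_code_sequence(bits: str, edge_counts: list[int]) -> list[int]:
    """Consume edge_counts[i] bits per selected letter and store them as decimals."""
    counts = [c for c in edge_counts if c > 0]  # letters without sharp edges carry nothing
    codes, pos = [], 0
    for edges in counts:
        if pos >= len(bits):
            break
        chunk = bits[pos:pos + edges]
        codes.append(int(chunk, 2))  # e.g. "01" -> 1
        pos += len(chunk)
    if pos < len(bits):
        raise ValueError("not enough sharp edges for the secret message")
    return codes


def from_code_sequence(codes: list[int], edge_counts: list[int], n_bits: int) -> str:
    """Invert to_code_sequence; n_bits handles a final letter that was only partly used."""
    counts = [c for c in edge_counts if c > 0]
    bits, remaining = [], n_bits
    for code, edges in zip(codes, counts):
        width = min(edges, remaining)
        bits.append(format(code, f"0{width}b"))
        remaining -= width
    return "".join(bits)


# Example from the text: a letter with two sharp edges carries "01", stored as 1
print(to_code_sequence("01", [2]))  # [1]
```

The decoder needs the same edge counts and the message length, which is why the code sequence itself must be protected.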

The authors of [106] combined sharp edges with the dots and typographical proportions of Arabic letters. Their algorithm, called the Primitive Structural algorithm, gives each letter more than one potential position for embedding secret bits, as shown in Table 15, so each letter carries more hidden bits than in the methods of [104] and [105]. Table 16 shows that the main limitation of the sharp edges method is security: it requires additional security layers to protect the information that indicates where the concealed bits lie in the stego file. Nevertheless, the sharp edges method achieves better embedding capacity.

Authors | Methodology | Pros | Cons
[104] | Each sharp edge in a character embeds one bit, "0" or "1"; generates a reference table for the secret bits' places. | High capacity, because all characters can hide bits based on their sharp edges. Stego file has a flexible font and format. | Low security, as additional security layers are needed to protect the reference table or code sequence.
[105] | The number of sharp edges in each character is used to hide the same number of secret bits, converted to decimal; generates a code sequence of decimal numbers. | (same as [104]) | (same as [104])
[106] | Each sharp edge, dot, and typographical proportion can hide "0" or "1." | (same as [104]) | (same as [104])

Poetry Approach
The authors of [107] adapted the Arabic poetry system for text hiding. Since each Arabic poem has an embedded binary representation, poems can be used to hide secret bits. The key idea is to assume that the binary positions embedded in a poem carry secret bits: the actual secret bit is either equal to the bit at that position or equal to its inverse. Diacritics and kashida have been used to increase the capacity of this embedding technique. Table 17 shows an example of Arabic poetry steganography.

Table 17: Its classification: Al-Taweel meter.
Table 18 shows that the poetry method improved the embedding capacity. However, it is applicable only to text encoded in Windows-1256.

Table 18. A summary of the reviewed article on the poetry approach.

Author | Methodology | Pros | Cons
[107] | Represents the poetry meters in binary form to hide the secret bits. | Improved embedding capacity. | Only Windows-1256 encoding is used.

Hybrid Approach
A combined, or hybrid, method integrates two or more text steganography methods. The earliest proposal of this kind [108] merges two methods: the Unicode (whitespace) method and the kashida method. The integrated technique embeds a secret bit "1" by inserting a kashida and, before moving to the next word, adding two consecutive whitespaces between the words. For a secret bit "0," neither a kashida nor an extra whitespace is added.
Later, merging Unicode with diacritics was suggested [109] to hide the confidential message. This method employs RNA to encode the secret messages, while non-printed characters are used to conceal these codes. Compression is applied by modifying the Run Length Encoding (RLE) compression algorithm to overcome its limitation.
Similarly, the authors of [110] compressed secret messages using Gzip and encrypted the compressed messages using AES. The embedding method employs two stego options, inserting a kashida and changing the Unicode of a letter, selected according to the behavior of the proposed blood group algorithm.
Next, kashida and diacritics were combined [111] to hide the secret message. The embedding algorithm conceals one part of the message by adding a Fathah to hide "1," with the remaining letters hiding "0"; the other part adds two consecutive kashidas to hide "1" and one kashida to hide "0." Kashida and Unicode methods were again combined by the authors of [112] to hide confidential information. For the Unicode method, three small spaces (Thin, Hair, and Six-Per-Em) were used. The scheme groups the bitstream into 4-bit blocks. The first bit corresponds to kashida: a kashida is inserted to hide "1," while an existing kashida is considered to hide "0." The second bit corresponds to the thin space, the third to the hair space, and the last to the Six-Per-Em space; the presence of each of these small spaces hides "1," and its absence hides "0." The combination of counting-based secret sharing with kashida recently presented by the authors of [113] hides the secret-sharing bits within Arabic text using kashida. Most recently, Medium Mathematical Spaces (MSPs), ZWJ, ZWNJ, and kashida were combined [114] to protect the secret bits; the study hides one secret bit by changing the format or by including one whitespace or kashida.
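The 4-bit grouping of [112] can be sketched as below. The sketch assumes U+0640 for the kashida and U+2009 (THIN SPACE), U+200A (HAIR SPACE), and U+2006 (SIX-PER-EM SPACE) for the small spaces. It also assumes one 4-bit group per word that has an extendable position, small spaces appended at the end of the word, and "no kashida added" as the encoding of a 0 in the first bit. These placement details, the simplified `DUAL_JOINING` set, and the function names are assumptions, not the authors' specification.

```python
KASHIDA = "\u0640"
SMALL_SPACES = ("\u2009", "\u200A", "\u2006")  # thin, hair, six-per-em
DUAL_JOINING = set("بتثجحخسشصضطظعغفقكلمنهيئ")


def kashida_position(word: str):
    """Index just after the first letter that joins the next letter, or None."""
    for k in range(len(word) - 1):
        if word[k] in DUAL_JOINING and "\u0621" <= word[k + 1] <= "\u064A":
            return k + 1
    return None


def embed_group_words(cover: str, bits: str) -> str:
    if len(bits) % 4:
        raise ValueError("bit string length must be a multiple of 4")
    groups = [bits[j:j + 4] for j in range(0, len(bits), 4)]
    out, g = [], 0
    for word in cover.split(" "):
        pos = kashida_position(word)
        if g < len(groups) and pos is not None:
            group = groups[g]
            if group[0] == "1":  # first bit: kashida
                word = word[:pos] + KASHIDA + word[pos:]
            # bits 2-4: presence of thin, hair and six-per-em spaces
            word += "".join(sp for sp, b in zip(SMALL_SPACES, group[1:]) if b == "1")
            g += 1
        out.append(word)
    if g < len(groups):
        raise ValueError("cover text has too few usable words")
    return " ".join(out)


def extract_group_words(stego: str, n_bits: int) -> str:
    bits = []
    for token in stego.split(" "):
        if len(bits) >= n_bits:
            break
        base = "".join(ch for ch in token if ch != KASHIDA and ch not in SMALL_SPACES)
        if kashida_position(base) is None:
            continue  # this word was not used as a carrier
        bits.append("1" if KASHIDA in token else "0")
        bits.extend("1" if sp in token else "0" for sp in SMALL_SPACES)
    return "".join(bits[:n_bits])
```

Each carrier word holds four bits while adding at most one visible kashida, which illustrates why the small spaces raise capacity without much visual change.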
Table 19 shows that hybrid Arabic text steganography methods that use kashida and several kinds of spaces to hide secret bits have recently received more attention. The main advantages of merging two or more steganography methods are a higher capacity for hidden information and a lower level of suspicion. However, a hybrid scheme inherits drawbacks from its component methods, such as the destruction of hidden information by retyping or by Optical Character Recognition (OCR).
Table 19. A summary of the reviewed articles on the hybrid approach.

Authors | Methodology | Pros | Cons
[108] | Adds kashida and consecutive whitespace to hide "1" and a single normal space to hide "0." | Slightly improved capacity using whitespace. | Stego file size increases by inserting kashida and whitespace.
[109] | Hides 1 secret bit by changing the Unicode of an unpointed isolated letter and adding diacritics. | Improves security using RNA. Stego file has a flexible font and format. Low computational time. | Uses only the unpointed letters.
[110] | Different scenarios merge kashida and Unicode methods based on blood group behavior. | Improves security using AES. Enhances the capacity by increasing the usable characters and using a compression algorithm (Gzip). | Stego file size increases by inserting kashida.
[111] | Inserts kashida or Fathah to hide "1"; the rest hide "0." | Improves the capacity by increasing the embedding characters. | Stego file size increases by inserting kashida and diacritics. Suspicion is raised because the included diacritics are not in the proper place.
[112] | Integrates kashida with 3 small spaces (Thin, Hair, and Six-Per-Em) to hide secret bits. | Kashida and whitespace insertion is controlled, which enhances the capacity while maintaining invisibility. | Stego file size increases by inserting kashida and whitespace.
[113] | Hybrid of kashida with secret sharing. | Improves security using secret sharing. | Stego file size increases by inserting kashida.
[114] | Multiple types of whitespace are combined with kashida. | Improves the capacity by increasing the usable characters. | Stego file size increases by inserting kashida and whitespace.

Figure 5 illustrates the use of Arabic text steganography methods and shows that the Unicode method has attracted the most interest among researchers because of its transparency. In contrast, the diacritics methods suffer from several limitations, mainly increased suspicion (low invisibility), because diacritics are conspicuously exploited, either by showing some and hiding others or by adding them in improper positions.

Evaluation Criteria
Four evaluation criteria must be considered when a researcher develops a text steganography method [4]: capacity, invisibility, robustness, and security. Some of these criteria can be evaluated by calculation, while others can be assessed visually [115]. High embedding performance is achieved by balancing these criteria [116]. Most text steganography methods focus on increasing the embedding capacity. However, high embedding capacity can degrade the invisibility of the stego text, which in turn affects the method's security, especially when that security depends on properties such as invisibility and robustness.

Embedding Capacity
The amount of hidden information in the cover text is called its embedding capacity [117]. This criterion is calculated by applying the following equation.
$$\text{Embedding Capacity} = \frac{\text{Secret bits}}{\text{Cover bits}} \times 100$$
Text steganography methods can improve the embedding capacity by increasing the number of embeddable (usable) cover characters or positions [56,118], increasing the bits per location [115], applying compression techniques [43], and merging more than one text feature [114]. An embeddable character or location is one that can be used in the embedding process, and bits per location is the number of bits hidden at each location.
Arabic text has many features, such as kashida, dots, and diacritics, and merging more than one feature increases the capacity. In addition, compressing the secret bits with a compression algorithm reduces the number of bits to be hidden. This paper evaluates the capacity of Arabic text steganography methods by analyzing the bits per location and the compression techniques used.
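A short worked example of the capacity formula follows. It assumes that cover bits are counted as 16 bits per character, the UTF-16 size used for Unicode Arabic letters; the numbers are illustrative and are not results from the reviewed methods.

```python
def embedding_capacity(secret_bits: int, cover_bits: int) -> float:
    """Embedding capacity in percent: secret bits / cover bits x 100."""
    if cover_bits <= 0:
        raise ValueError("cover_bits must be positive")
    return 100 * secret_bits / cover_bits


def cover_bits_utf16(cover_text: str) -> int:
    # UTF-16 code units are 16 bits; Arabic letters (BMP) take one unit each
    return len(cover_text.encode("utf-16-le")) * 8


# Illustrative: hiding 40 bits in a 100-character cover text
print(embedding_capacity(40, cover_bits_utf16("\u0628" * 100)))  # 2.5
```

Compression helps in the same way: halving the number of secret bits to be hidden halves the cover text needed for the same capacity percentage.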

Invisibility
Steganography hides the secret message in the cover media without making perceptible or visible distortions to the cover item [119], so that the hidden message cannot be detected by an attacker. Some researchers have not considered imperceptibility a basic requirement of steganography [120]. However, most other researchers have emphasized imperceptibility as one of the primary goals for protecting the hidden message [17,43,56,121-126] ("protection by invisibility"). We take the latter view: invisibility is the key property that prevents the attacker from detecting, and hence deducing, the hidden message [127-130].
According to the authors of [131], a stego file can be attacked in two ways: by a visual attack and by a statistical attack. A visual attack relies on human vision to detect any abnormal appearance of the object or to distinguish the original object from the stego object, whereas a statistical attack analyzes the item using steganalysis algorithms based on mathematical theories [132,133].
Some researchers view perceptible modification of the cover media as a disadvantage. The original cover media is not secret and is available to the public, so one way to detect the hidden message is to check the similarity between the original cover and the stego file. How similarity is judged depends on the type of cover. Whether a change in a cover text is visible cannot be evaluated by numeric computation alone, because human vision differs from person to person. Nevertheless, the difference between two texts can be measured using the Jaro-Winkler distance [134,135] by comparing the texts' size, semantics, and lexical content, so that modifications that are imperceptible to the eye can still be detected with this measure. A successful text steganography method should therefore achieve high similarity between the cover and the stego file. However, the Jaro-Winkler distance cannot capture changes in font attributes.
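Because the Jaro-Winkler measure is used here as a similarity check between cover and stego texts, a compact implementation of its standard definition is given below, with scaling factor p = 0.1 and a common prefix of at most four characters, the usual defaults. Jaro-Winkler is usually stated as a similarity in [0, 1], with the distance defined as one minus the similarity. As a purely character-level measure, it does not see font attributes.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # matching window
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order
    mismatched, j = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                mismatched += 1
            j += 1
    t = mismatched / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro similarity by the length of the common prefix (up to max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
cover, stego = "\u0628\u0633\u0645", "\u0628\u0640\u0633\u0645"  # stego adds one kashida
print(jaro_winkler(cover, stego) < 1.0)  # True
```

A single inserted kashida, invisible in meaning, already lowers the score below 1, which is the sense in which this measure exposes changes a reader might miss.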
Invisibility can be divided into two types: similarity and ambiguity. Similarity is achieved when the two texts being compared are identical in size, format, semantics, and lexical content. It can be measured only when both the cover and the stego text are available, and it is assessed on three levels (low, medium, high) that indicate how close the two documents are. Ambiguity arises when a word or diacritic does not fit the text's context or when multiple kashidas are inserted irregularly. It is evaluated when the attacker has only the stego text and not the cover text, and it is also assessed on three levels (low, medium, high) that indicate how much the text attracts an eavesdropper's attention. This study analyzes invisibility in terms of similarity and ambiguity by examining the cover and stego text examples given for each reviewed method.

Robustness
Robustness in steganography is the ability of the hidden data to withstand attacks such as rotation, cropping, added noise, and compression [120]. Because a vast amount of text is transmitted over and posted on the Internet, robustness is becoming increasingly relevant, as many researchers have noted [17,119,122,136-139]. Tampering is the most common type of attack on text [2,4,18,44]. It can take many forms, such as insertion or deletion, copy and paste, font-format changes, printing, and retyping [8,115]. Attackers can also use OCR to recognize the characters in an image of the document: OCR provides full alphanumeric recognition of printed or handwritten characters, digits, and symbols and converts them into a computer-processable format such as ASCII or Unicode [140].

Security
Security in steganography refers to concealing a large amount of secret information while maintaining invisibility and robustness [4]. A text steganography algorithm must prevent the attacker from visually detecting the hidden information, destroying it by tampering, or extracting it by breaking (understanding) the embedding algorithm. The security criterion is therefore influenced by the invisibility and robustness criteria: invisibility prevents the eavesdropper from distinguishing the hidden information in the stego text, while robustness prevents the attacker from tampering with the hidden message. The security of a modern text steganography method can be defined as its ability to resist any attack that aims to remove or destroy the hidden data [116,141]. It is achieved by increasing the complexity of the algorithm, for example through random or non-sequential embedding positions [106,142], random selection of secret bits [143-145], and generation of a strong stego key [116,121,144-148]. Using one or more of these techniques makes the hidden data extremely difficult to extract. Beyond these techniques, researchers have found that the value of reinforcing text steganography with cryptography lies in protecting the secret keys, which are considered the most critical element in information security technologies [35,37,149]. However, only a few researchers have used encryption to improve the efficiency of their methods [150,151].
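One way to derive random, non-sequential embedding positions from a stego key is to order the candidate positions by a keyed hash. The sketch below illustrates that idea under this assumption; it is not a method from the cited works. It ranks each position by HMAC-SHA256(key, position) using Python's standard `hmac` and `hashlib` modules, so a sender and receiver who share the key derive the same order, while an attacker without the key sees no sequential pattern. The helper names are hypothetical, and the `dict` of position-to-bit stands in for an actual embedding such as inserting kashidas.

```python
import hashlib
import hmac


def keyed_order(positions, stego_key: bytes):
    """Return the candidate positions in a key-dependent, non-sequential order."""
    return sorted(
        positions,
        key=lambda p: hmac.new(stego_key, str(p).encode(), hashlib.sha256).digest(),
    )


def assign_bits(bits: str, positions, stego_key: bytes) -> dict:
    """Sender side: map each secret bit to a position following the keyed order."""
    order = keyed_order(positions, stego_key)
    if len(bits) > len(order):
        raise ValueError("cover text has too few embedding positions")
    return dict(zip(order, bits))


def recover_bits(assignment: dict, positions, stego_key: bytes, n_bits: int) -> str:
    """Receiver side: recompute the keyed order and read the bits back."""
    order = keyed_order(positions, stego_key)[:n_bits]
    return "".join(assignment[p] for p in order)


# Usage: 20 candidate positions (e.g. indices of extendable letters), 8 secret bits.
candidates = list(range(20))
placed = assign_bits("10110010", candidates, b"shared-stego-key")
print(recover_bits(placed, candidates, b"shared-stego-key", 8))  # 10110010
```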

Evaluation of Arabic Text Steganography Methods
Tables 20-26 evaluate the Arabic text steganography methods against the capacity, security, robustness, and invisibility criteria; the results are summarized below:

• Dot method: Although this method enhances invisibility, it is less robust, as the hidden message may be lost if the font format is changed. In addition, the method does not use encryption or non-sequential embedding positions to prevent the hidden bits from being extracted. Despite the use of compression, capacity remains low, with at most two embedded bits per location.

• Diacritics method: Low invisibility is the major drawback of this method. The cover text and stego text are not identical, and the stego text contains many ambiguities in its use of diacritics. Capacity is also low, ranging from one to four bits per location. Although this method partially enhances the robustness of the stego text, the secret message is neither encrypted nor embedded in non-sequential positions.

• Kashida method: This method resists copy-paste attacks but has drawbacks in terms of capacity, invisibility, and robustness.

• Unicode method: The embedding capacity of this method remains low, even though some techniques use compression. However, this method achieves high invisibility. Robustness is improved against copy-paste, font format changes, and OCR, but encryption and non-sequential embedding are not considered.

• Sharp edges method: This method achieves high invisibility, capacity, and robustness. Adding non-sequential embedding further increases the complexity that protects it from the attacker.

• Poetry system method: This method has high invisibility and high robustness. However, its hiding capacity is limited.

• Integrated method: The primary goal of merging methods is to improve performance and overcome the limitations of the individual methods. Nevertheless, if the methods are not integrated properly, their drawbacks are inherited.
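The capacity limits in this evaluation follow from a simple product: the number of usable locations in the cover multiplied by the bits carried per location. The sketch below computes that upper bound under an assumed eligibility rule for the dot method, namely that every dotted letter is one location carrying at most two bits (the maximum reported above). The function name `capacity_upper_bound` and the eligibility rule are hypothetical and are not taken from any specific surveyed technique.

```python
# Hypothetical eligibility rule: every dotted Arabic letter is one embedding
# location (ب ت ث ج خ ذ ز ش ض ظ غ ف ق ن ي ة).
DOTTED_LETTERS = set(
    "\u0628\u062A\u062B\u062C\u062E\u0630\u0632\u0634"
    "\u0636\u0638\u063A\u0641\u0642\u0646\u064A\u0629"
)


def capacity_upper_bound(cover: str, bits_per_location: int, eligible=DOTTED_LETTERS) -> int:
    """Upper bound on hidden bits: eligible locations times bits per location."""
    locations = sum(1 for ch in cover if ch in eligible)
    return locations * bits_per_location


cover = "بسم الله الرحمن الرحيم"
bits = capacity_upper_bound(cover, bits_per_location=2)
print(bits, round(bits / len(cover), 2))  # 6 0.27
```

In this 22-character cover only three letters are dotted, so even at two bits per location the bound is 6 bits, roughly a quarter of a bit per cover character.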

Recommendations for Future Works
This study offers the following recommendations and directions for future research:

• Although some researchers have considered Arabic characters, most have not applied their proposed methods to social media. Yet such media are fertile environments for information hiding, as a large volume of text is posted on social networks every day. This volume makes it difficult for an eavesdropper to single out the texts that may contain hidden information. Researchers could therefore apply some of these methods to social media while ensuring support for Arabic characters.

• Integrating text steganography methods improves capacity and makes it harder for an eavesdropper to trace the embedding algorithm. However, integrated methods inherit the disadvantages of their constituent methods. This is especially evident with the kashida approach, which increases the stego file size and can raise suspicion in specific cases. Therefore, integration should be studied carefully to identify which methods achieve the desired objectives while minimizing the drawbacks of the constituent methods.

• Compression reduces the size of the hidden information, thereby increasing the effective capacity. It also makes it harder to extract the secret message from the cover text. Despite this, only a few researchers have used compression to improve the efficiency of their methods.

• Only a few of the proposed studies enhance the protection of secret messages before embedding by combining cryptography and steganography, particularly to protect the stego key. This combination adds another layer of protection if the embedding algorithm is detected.

• During this survey, it was observed that fully diacritized text is rarely used, and this is the obstacle preventing the exploitation of diacritics. It is worth noting, however, that students of religious studies and linguistic sciences at all levels use this type of text. Similarly, Quranic and Hadith texts are omnipresent on the web and social media and are widely used as references and for inference.

• Despite kashida's weaknesses, such as increasing the stego file size and thus arousing the reader's suspicion, kashida is better suited to text watermarking, especially of Quranic texts, than to text steganography. In text watermarking, kashida is used to protect or copyright the text without affecting its meaning, unlike diacritics.

• Most Arabic text steganography methods suffer from low capacity because of the limited number of bits per location and of usable characters. Integrating Arabic features with font attributes has been used to enhance capacity.

• Selecting embedding positions sequentially reveals the order of the secret bits to the attacker. Therefore, it is imperative to propose embedding methods that use non-sequential positions as an additional security layer.

Conclusions
The importance of using Arabic text as a cover for hiding sensitive information sent via public channels by governments, companies, and individuals in Arabic-speaking countries cannot be overemphasized, because these countries use Arabic text in their daily activities. This paper presents the research landscape of Arabic text steganography methods from their inception to date and discusses seven Arabic text steganography methods: dot, diacritics, kashida, Unicode, sharp edges, poetry system, and hybrid. We analyzed these methods, categorized them, summarized their methodologies, and determined their strengths and weaknesses. We also evaluated them against the four objectives of any steganography method (i.e., capacity, invisibility, robustness, and security).
We found that most existing Arabic steganography methods suffer from low capacity because of the small number of bits per location and of usable characters. In terms of security, several of the proposed techniques integrated steganography with cryptography to protect the confidential message before embedding. Selecting embedding positions non-sequentially rather than sequentially would add a further security layer to the embedding techniques. Although the Arabic language is rich in linguistic characteristics, previous studies and existing methods have not exploited most of them to achieve high embedding performance. Consequently, this paper opens new paths in Arabic text steganography by providing recommendations for future work.