Benchmarking of Document Image Analysis Tasks for Palm Leaf Manuscripts from Southeast Asia

This paper presents a comprehensive test of the principal tasks in document image analysis (DIA), starting with binarization, text line segmentation


Introduction
Since the world entered the digital age in the early 20th century, the need for document image analysis (DIA) systems has been increasing. This is due to the dramatic increase in efforts to digitize the various types of document collections available, especially the ancient documents of historical relics found in various parts of the world. Some very interesting projects on a wide variety of heritage document collections can be mentioned here: for example, the tranScriptorium project (http://transcriptorium.eu/) [1]; the READ (Recognition and Enrichment of Archival Documents) project (https://read.transkribus.eu/) [2], which works on documents from the Middle Ages to today, and also focuses on different languages ranging from Ancient Greek to modern English; the

Palm Leaf Manuscripts from Southeast Asia
Regarding the use of writing materials and tools, history records the discovery of important documents written on stone plates, clay plates or tablets, bark, skin, animal bones, ivory, tortoiseshell, papyrus, parchment (a form of leather made of processed sheepskin or calfskin) (http://www.casepaper.com/company/paper-history) [14], copper and bronze plates, bamboo, palm leaves, and other materials [15]. The choice of natural materials used as a medium for document writing is strongly influenced by the geographical conditions and location of a nation. For example, because bamboo and palm trees are easily found in Asia, both materials were the first choice of writing material there. In Southeast Asia, most ancient manuscripts were written on palm leaves. In Cambodia, for instance, palm leaves have been used as a writing material since the first appearance of Buddhism in the country. In Thailand, dried palm leaves have also been one of the most popular media for written documents for over 500 years [16]. Palm leaves were also historically used as writing supports in manuscripts from the Indonesian archipelago. The leaves of the sugar, or toddy, palm (Borassus flabellifer) are known as lontar. The ancient palm leaf manuscripts of Southeast Asia are very important in terms of both the quantity and the variety of their historical contents.

Corpus
Apart from the collections held at museums (Museum Gedong Kertya Singaraja and Museum Bali Denpasar), it is estimated that more than 50,000 lontar collections are owned by private families (Figure 1). For this research, in order to obtain a large variety of manuscript images, sample images were collected from 23 different collections, which come from five different locations (regions): two museums and three private families. They consist of 10 randomly selected collections from Museum Gedong Kertya, City of Singaraja, Regency of Buleleng, North Bali, Indonesia; four collections from the manuscript collections of Museum Bali, City of Denpasar, South Bali; seven collections from a private family collection from the village of Jagaraga, Regency of Buleleng; and two other private family collections from the village of Susut, Regency of Bangli, and the village of Rendang, Regency of Karangasem [17].

Balinese Script and Language
Although the official language of Indonesia, Bahasa Indonesia, is written in the Latin script, Indonesia has many local, traditional scripts, most of which are ultimately derived from Brahmi [18]. In Bali, palm leaf manuscripts were written in the Balinese script, either in the Balinese language or, for ancient literary texts, in Kawi (Old Javanese) and Sanskrit. Balinese is a Malayo-Polynesian language spoken by more than 3 million people, mainly in Bali, Indonesia (www.omniglot.com/writing/balinese.htm) [19]. It is the native language of the people of Bali and is known locally as Basa Bali [18]. The alphabet and numerals of the Balinese script comprise approximately 100 character classes, including consonants, vowels, and some other special compound characters. According to the Unicode Standard 9.0, the Balinese script occupies the Unicode range from 1B00 to 1B7F.
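To make the Unicode range concrete, the following minimal sketch checks whether characters of a transliterated or encoded text fall inside the range 1B00 to 1B7F quoted above. The function names `is_balinese` and `balinese_ratio` are illustrative and are not part of any tool used in this work.

```python
# Balinese Unicode range as quoted in the text: U+1B00 to U+1B7F (inclusive).
BALINESE_BLOCK = range(0x1B00, 0x1B80)


def is_balinese(ch: str) -> bool:
    """Return True if the single character lies in the Balinese range."""
    return ord(ch) in BALINESE_BLOCK


def balinese_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Balinese range."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_balinese(c) for c in chars) / len(chars)
```

Such a check can be useful, for example, to validate ground truth files before training a recognizer.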

Corpus
In Cambodia, Khmer palm leaf manuscripts (Figure 2) are still seen in Buddhist establishments and are traditionally used by monks as reading scriptures. Various libraries and institutions have been collecting and digitizing these manuscripts and have even shared the digital images with the public. For instance, the École Française d'Extrême-Orient (EFEO) has launched an online database (http://khmermanuscripts.efeo.fr) [20] of microfilm images of hundreds of Khmer palm leaf manuscript collections. Some digitized collections are also obtained from the Buddhist Institute, which is one of the biggest institutes in Cambodia responsible for research on Cambodian literature and language related to Buddhism, and also from the National Library (situated in the capital city, Phnom Penh), which is home to a large collection of palm leaf manuscripts. Moreover, a standard digitization campaign was conducted in order to collect palm leaf manuscript images found in Buddhist temples in different locations throughout Cambodia: Phnom Penh, Kandal, and Siem Reap [21].

Khmer Script and Language
Depending on the era in which the documents were created, slightly different versions of Khmer characters are used in the writing of Khmer palm leaf manuscripts. The Khmer alphabet is famous for its numerous symbols (~70), including consonants, different types of vowels, diacritics, and special characters. Certain symbols even have multiple shapes and forms depending on which other symbols they are combined with to create words. The languages written on palm leaf documents vary from Khmer, the official language of Cambodia, to Pali and Sanskrit, by which the modern Khmer language was considerably influenced. Only a minority of Cambodian people, such as philologists and Buddhist monks, are able to read and understand the latter languages.

Corpus
The collection of Sundanese palm leaf manuscripts (Figure 3) comes from Situs Kabuyutan Ciburuy, Garut, West Java, Indonesia. The Kabuyutan Ciburuy is a cultural heritage complex associated with Prabu Siliwangi and Prabu Kian Santang, the king and the son of the Padjadjaran kingdom. The complex consists of six buildings. One of them is Bale Padaleuman, which is used to store the Sundanese palm leaf manuscripts. The oldest Sundanese palm leaf manuscript in Situs Kabuyutan Ciburuy dates from the 15th century. In Bale Padaleuman, there are 27 collections of Sundanese manuscripts. Each collection contains 15 to 30 pages, with dimensions of 25-45 cm in length × 10-15 cm in width [22].

Sundanese Script and Language
The Sundanese palm leaf manuscripts were written in the ancient Sundanese language and script. The characters consist of numbers, vowels (such as a, i, u, e, and o), basic characters (such as ha, na, ca, ra, etc.), punctuation, diacritics (such as panghulu, pangwisad, paneuleung, panyuku, etc.), and many special compound characters.

Challenges of Document Image Analysis for Palm Leaf Manuscripts
There are two main technical challenges in processing palm leaf manuscripts in a DIA system. The first challenge is the physical condition of the palm leaf manuscript, which strongly influences the quality of the captured document images. In DIA research, data in a paper document are usually captured by optical scanning, but when the document is on a different medium such as microfilm, palm leaves, or fabric, photographic methods are often used to capture the images [13]. Due to the specific characteristics of the physical support of the manuscripts, the development of DIA methods that extract relevant information from palm leaf manuscripts is considered a new research problem in handwritten document analysis. Ancient palm leaf manuscripts contain artifacts due to aging, foxing, yellowing, strain, local shading effects, low intensity variations or poor contrast, random noise, discolored parts, fading, and other types of degradation.
The second challenge is the complexity of the script. The Southeast Asian manuscripts, with their different scripts and languages, pose real challenges for document analysis methods, not only because of the different forms of characters in each script, but also because the writing style of each script (e.g., how characters are joined or separated in a text line) differs. Reported work on palm leaf manuscripts ranges widely from binarization [23][24][25], text line segmentation [26,27], and character and text recognition [25,28,29] to word spotting [30].
In the domain of DIA, handwritten character and text recognition has been the subject of intensive research during the last three decades. Some methods have already reached a satisfactory performance, especially for Latin, Chinese, and Japanese scripts. However, the development of handwritten character and text recognition methods for various other Asian scripts presents many issues. In OCR for palm leaf manuscripts from Southeast Asia, several deformations in the character shapes are visible due to merges, fractures, and the use of nonstandard fonts. The similarities of distinct character shapes, overlaps, and interconnection of neighboring characters further complicate the OCR system [31]. One of the main problems faced when dealing with segmented handwritten character recognition is the ambiguity and illegibility of the characters [32]. These characteristics provide suitable conditions to test and evaluate the robustness of feature extraction methods that were proposed for character recognition.

Document Image Analysis Tasks and Investigated Methods
Heritage document preservation is not just about converting physical documents into document images. With many physical documents being digitized and stored in large document databases, and then sent and received via digital machines, interest and demand have grown for functionality beyond simply viewing and printing the images [33]. Further treatment is required before the collection of document images can be explored more extensively. For example, a more specific research field needed to be developed to add machine capabilities for extracting information from these images, reading text on a document page, finding sentences, and locating paragraphs, lines, words, and symbols on a diagram [33].
In this work, methods for each DIA task were investigated for palm leaf manuscripts. The binarization task is evaluated using the latest methods from binarization competitions. The seam carving method is evaluated for the text line segmentation task and compared to a recent text line segmentation method for palm leaf manuscripts [27]. For the isolated character/glyph recognition task, the evaluation ranges from a handcrafted feature extraction method and a neural network with unsupervised feature learning to a CNN-based method. Finally, an RNN-LSTM-based method is used to analyze the word recognition and transliteration task for palm leaf manuscripts.

Binarization
Binarization is widely applied as the first pre-processing step in document image analysis [34]. It converts gray image values into a binary representation of background and foreground, or, more specifically, text and non-text, which is then fed into further document processing tasks such as text line segmentation and optical character recognition. The performance of binarization techniques directly affects the performance of the recognition task [35]. Non-optimal binarization methods produce unrecognizable characters with noise [16]. Many binarization methods have been reported, and these methods have been tested and evaluated on different types of document collections. Based on the choice of the thresholding value, binarization methods can generally be divided into two types, global binarization and local adaptive binarization [16]. Some surveys and comparative studies of the performance of several binarization methods have been reported [35,36]. A binarization method that performs well for one document collection does not necessarily achieve the same performance on another document collection [34]. For this reason, there is always a need to perform a comprehensive evaluation of the existing binarization methods for a new document collection that has different characteristics, for example historical archive documents [36].
In this work, we compared several alternative binarization algorithms for palm leaf manuscripts. We tested and evaluated some well-known standard binarization methods, and some binarization methods that are experimentally promising for historical archive documents, though not specifically for images of palm leaf manuscripts. We also tested binarization methods from the Document Image Binarization Competition (DIBCO) [37,38], for example Howe's method [39], and those from the International Conference on Frontiers in Handwriting Recognition (ICFHR) competition (amadi.univ-lr.fr/ICFHR2016_Contest) [25,40].

Global Thresholding
Global thresholding is the simplest technique and the most conventional approach for binarization [34,41]. A single threshold value is calculated from the global characteristics of the image. This value should be properly chosen based on a heuristic technique or a statistical measurement in order to give good binarization results [36]. It is widely known that using a global threshold to process a batch of archive images with different illumination and noise variation is not a proper choice. The variation between images in the foreground and background colors on low-quality document images gives unsatisfactory results. It is difficult to choose one fixed threshold value that is adaptable for all images [36,42].
Otsu's method is a very popular global binarization technique [34,41]. Conceptually, Otsu's method tries to find an optimum global threshold on an image by minimizing the weighted sum of variances of the object and background pixels [34]. Otsu's method is implemented as a standard binarization technique in a built-in Matlab function called graythresh (https://fr.mathworks.com/help/images/ref/graythresh.html) [43].
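As an illustration of the idea, the following minimal sketch computes an Otsu threshold in pure Python. Minimizing the weighted within-class variance is equivalent to maximizing the between-class variance, which is what the loop below does. The sketch assumes 8-bit gray values given as a flat list and dark text on a lighter background; it is not the Matlab implementation referenced above, and the function names are illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t maximizing the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of gray values in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, the split is undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize_global(pixels, t):
    """Mark dark pixels (value <= t) as text (1) and the rest as background (0)."""
    return [1 if p <= t else 0 for p in pixels]
```

Because a single value of t is applied to the whole image, any illumination gradient across the leaf shifts both classes locally, which is the weakness discussed above.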

Local Adaptive Binarization
To overcome the weakness of the global binarization technique, many local adaptive binarization techniques were proposed, for example Niblack's method [34,36,41,42,44], Sauvola's method [34,36,41,42,44,45], Wolf's method [42,44,46], the NICK method [44], and the Rais method [34]. In local adaptive binarization, the threshold value is calculated in each smaller local image area, region, or window. Niblack's method proposed a local thresholding computation based on the local mean and local standard deviation of a rectangular window around each pixel of the image. The rectangular sliding window covers the neighborhood of each pixel. Using this concept, Niblack's method was reported to outperform many thresholding techniques and gave optimal results for many document collections. However, there is still a drawback to this method. It was found that Niblack's method works optimally only on text regions, and is not well suited for large non-text regions of an image. The absence of text in local areas forces Niblack's method to detect noise as text. A suitable window size should be chosen based on the character and stroke size, which may vary for each image. Many other local adaptive binarization techniques were proposed to improve the performance of the basic Niblack method. For example, Sauvola's method is a modified version of Niblack's method. It proposes a local binarization technique to deal with light texture, large variations, and uneven illumination. The improvement over Niblack's method lies in the adaptive contribution of the standard deviation in determining the local threshold on the gray values of text and non-text pixels. Sauvola's method processes the image in N × N adjacent and non-overlapping blocks separately.
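The difference between the two thresholds can be sketched as follows. This is a minimal pure-Python illustration, not the implementation evaluated in this work: it uses the commonly cited forms T = m + k·s (Niblack) and T = m·(1 + k·(s/R − 1)) (Sauvola), and the default values k = −0.2, k = 0.5, and R = 128 are assumptions taken from common practice. For simplicity, the sketch evaluates the threshold in a sliding window around each pixel rather than in non-overlapping blocks.

```python
from math import sqrt


def local_stats(img, r, c, half):
    """Mean and standard deviation of the window centered at (r, c), clipped at borders."""
    rows, cols = len(img), len(img[0])
    vals = [img[i][j]
            for i in range(max(0, r - half), min(rows, r + half + 1))
            for j in range(max(0, c - half), min(cols, c + half + 1))]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, sqrt(var)


def niblack_threshold(mean, std, k=-0.2):
    """Niblack: threshold follows the local mean, shifted by k standard deviations."""
    return mean + k * std


def sauvola_threshold(mean, std, k=0.5, R=128.0):
    """Sauvola: the contribution of std is scaled by its dynamic range R."""
    return mean * (1.0 + k * (std / R - 1.0))


def binarize_local(img, threshold_fn, half=1):
    """Mark a pixel as text (1) when it is darker than its local threshold."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            m, s = local_stats(img, r, c, half)
            row.append(1 if img[r][c] < threshold_fn(m, s) else 0)
        out.append(row)
    return out
```

On an almost flat background patch such as `[[200, 200, 200], [200, 198, 200], [200, 200, 200]]`, the Niblack threshold stays just below the local mean, so the slightly darker center pixel is marked as text, whereas the Sauvola threshold drops to roughly half the local mean and leaves the patch as background. This reproduces the noise behavior described above.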
Wolf's method tries to overcome the problem that Sauvola's method faces when the gray values of text and non-text pixels are close to each other, by normalizing the contrast and the mean gray value of the image to compute the local threshold. However, a sharp change in background gray values across the image decreases the performance of Wolf's method. Two other improvements to Niblack's method are the NICK method and the Rais method. The NICK method proposes a threshold computation derived from the basic Niblack method, and the Rais method proposes an optimal window size for the local binarization.
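For reference, the two thresholds can be written as small functions. The formulas below are the forms commonly cited in the literature and are stated here as assumptions, since this section does not reproduce them: Wolf's threshold T = (1 − k)·m + k·M + k·(s/R)·(m − M), with M the minimum gray value of the image and R the maximum local standard deviation over all windows, and the NICK threshold T = m + k·sqrt((Σp² − m²)/NP), with NP the number of pixels in the window. The default values of k are likewise assumptions.

```python
from math import sqrt


def wolf_threshold(mean, std, min_gray, max_std, k=0.5):
    """Wolf: normalize contrast with the image-wide minimum gray value
    (min_gray) and the maximum local standard deviation (max_std)."""
    return ((1 - k) * mean
            + k * min_gray
            + k * (std / max_std) * (mean - min_gray))


def nick_threshold(window, k=-0.2):
    """NICK: Niblack-like threshold computed from the pixel values of one window."""
    n = len(window)
    mean = sum(window) / n
    sum_sq = sum(p * p for p in window)
    return mean + k * sqrt((sum_sq - mean * mean) / n)
```

In the Wolf form, a window whose contrast reaches the image maximum (s = R) gets the local mean as threshold, while a flat window (s = 0) gets a threshold pulled toward the darkest gray value of the image, which suppresses noise in empty regions.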

Training-Based Binarization
The top two proposed methods in the Binarization Challenge of the ICFHR 2016 Competition on the Analysis of Handwritten Text in Images of Balinese Palm Leaf Manuscripts are training-based binarization methods [25]. The best method in this competition employs a Fully Convolutional Network (FCN). It takes a color sub-image as input and outputs the probability that each pixel in the sub-image is part of the foreground. The FCN is pre-trained on normal handwritten document images with automatically generated "ground truth" binarizations (using the method of Wolf et al. [46]). The FCN is then fine-tuned using DIBCO and H-DIBCO competition images and their corresponding ground truth binarizations. Finally, the FCN is fine-tuned again on the provided Balinese palm leaf images. At prediction time, the foreground probabilities of all pixels are efficiently predicted for the whole image at once and thresholded at 0.5 to create a binarized output image.
The second-best method uses two neural network classifiers, C1 and C2, to classify each pixel as background or not. In a first step, two binarized images, B1 and B2, are generated. C1 is a rough classifier that tries to detect all the foreground pixels, while probably making mistakes for some background pixels. C2 is an accurate classifier that should not classify a background pixel as a foreground pixel but probably misses some foreground pixels. In a second step, these two binary images are joined to obtain the final classification result.
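The text does not specify how B1 and B2 are joined. One plausible rule, given purely as an assumption for illustration and not as the competition method, is to keep every connected component of the rough mask B1 that contains at least one pixel of the accurate mask B2. The function name `join_masks` and the use of 8-connectivity are hypothetical choices.

```python
from collections import deque


def join_masks(rough, precise):
    """Keep the connected components of `rough` that touch a `precise` pixel.

    Both inputs are 2D lists of 0/1 with identical shape. This is a
    hypothetical joining rule; the source does not state the actual one.
    """
    rows, cols = len(rough), len(rough[0])
    out = [[0] * cols for _ in range(rows)]

    # Seeds: pixels that both classifiers mark as foreground.
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if rough[r][c] and precise[r][c])
    for r, c in queue:
        out[r][c] = 1

    # Grow the seeds through the rough mask (8-connected flood fill).
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and rough[nr][nc] and not out[nr][nc]):
                    out[nr][nc] = 1
                    queue.append((nr, nc))
    return out
```

Under this rule, strokes recovered only partially by the accurate classifier are completed from the rough mask, while isolated false positives of the rough classifier are discarded.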

Text Line Segmentation
Text line segmentation is a crucial pre-processing step in most DIA pipelines. The task aims at extracting and separating text regions into individual lines. Most line segmentation approaches in the literature require that the input image be binarized. However, due to the degradation and noise often found in historical documents such as palm leaf manuscripts, binarization often fails to produce sufficiently good results (see Section 5.1). In this paper, we investigate two line segmentation methods that are independent of the binarization task. These approaches work directly on color or grayscale images.

Seam Carving Method
Arvanitopoulos and Süsstrunk [47] proposed a binarization-free method based on a two-stage process: medial seam computation and separating seam computation. In the first stage, the approach computes medial seams by splitting the input page image into columns, whose smoothed projection profiles are then calculated. The positions of the medial seams are obtained from the locations of the local maxima of the profiles. In the second stage, separating seams are computed on the energy map, within the area restricted by the medial seams of two neighboring lines found in the previous stage. The technique carves paths that traverse the image from left to right, accumulating energy. The path with the minimum cumulative energy is then chosen.
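The second stage can be sketched as a dynamic program. The minimal example below assumes that the energy map of the band between two medial seams is already available as a 2D list (how the energy is computed is not covered here), and that a path may move to the same row or to a vertically adjacent row at each step to the right. It is an illustration of minimum-energy seam search in general, not the authors' implementation.

```python
def min_energy_seam(energy):
    """Return, for each column from left to right, the row of the
    minimum cumulative energy path through a 2D energy map."""
    rows, cols = len(energy), len(energy[0])
    cost = [[energy[r][0]] + [0.0] * (cols - 1) for r in range(rows)]
    back = [[0] * cols for _ in range(rows)]

    # Forward pass: accumulate the cheapest cost of reaching each cell.
    for c in range(1, cols):
        for r in range(rows):
            candidates = [(cost[pr][c - 1], pr)
                          for pr in (r - 1, r, r + 1) if 0 <= pr < rows]
            best_cost, best_prev = min(candidates)
            cost[r][c] = energy[r][c] + best_cost
            back[r][c] = best_prev

    # Backward pass: start from the cheapest end point and follow the pointers.
    r = min(range(rows), key=lambda i: cost[i][cols - 1])
    seam = [r]
    for c in range(cols - 1, 0, -1):
        r = back[r][c]
        seam.append(r)
    seam.reverse()
    return seam
```

For the toy energy map `[[9, 9, 9, 9], [1, 9, 9, 1], [9, 1, 1, 9]]`, the function returns `[1, 2, 2, 1]`, the path that collects only the low-energy cells.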

Adaptive Path Finding Method
This approach was proposed by Valy et al. [27]. The method takes as input a grayscale image of a document page. Connected components are extracted from the input image using stroke width information, by applying the stroke width transform (SWT) on the Canny edge map. The set of extracted components (filtered to remove components that come from noise and artifacts) is used to create a stroke map. Using column-wise projection profiles on the output map, the number and medial positions of the text lines can be estimated. To adapt better to skew and fluctuation, an unsupervised learning technique called competitive learning is applied to the set of connected components found previously. Finally, a path finding technique is applied in order to create seam borders between adjacent lines by using a combination of two cost functions: one penalizing a path that goes through the foreground text (intensity difference cost function D) and another one favoring a path that stays close to the estimated medial lines (vertical distance cost function V). Figure 4 illustrates an example of an optimal path.
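The profile-based estimation step can be illustrated with a minimal sketch. It assumes a binary stroke map given as a 2D list and, for simplicity, treats the full page width as a single column strip; the function names are illustrative, and the competitive learning refinement is not covered.

```python
def projection_profile(stroke_map):
    """Number of stroke pixels in each row of a binary stroke map."""
    return [sum(row) for row in stroke_map]


def medial_rows(profile):
    """Rows where the profile has a local maximum.

    A plateau is reported once, at its last row. Rows with an empty
    profile are ignored.
    """
    peaks = []
    for i in range(1, len(profile) - 1):
        if (profile[i] > 0
                and profile[i] >= profile[i - 1]
                and profile[i] > profile[i + 1]):
            peaks.append(i)
    return peaks
```

The number of detected peaks gives the estimated number of text lines, and their row indices give initial medial positions that later steps can refine.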

Isolated Character/Glyph Recognition
In a DIA system, word or text recognition tasks are generally categorized into two different approaches: segmentation-based and segmentation-free methods. In segmentation-based methods, the isolated character recognition task is a very important process [9]. Proper feature extraction and correct classifier selection can increase the recognition rate [48]. Although many methods for isolated character recognition have been developed and tested, especially for Latin-based scripts and alphabets, there is still a need for in-depth evaluation of those methods as applied to various other scripts. This includes the isolated character recognition task for many Southeast Asian scripts, and more specifically scripts that were written on ancient palm leaf manuscripts.
Previous studies on isolated character recognition in palm leaf manuscripts have already been reported, but only with the Balinese script as the benchmark dataset [28,29]. In the first work, an experimental study on feature extraction methods for character recognition of Balinese script was performed [28]. In the second work, a training-based method with a neural network and unsupervised feature learning was used to increase the recognition rate [29]. In this paper, we conduct a broader evaluation of the robustness of the methods previously tested on Balinese script, using the other two palm leaf manuscript collections, with Khmer and Sundanese scripts. In the next subsections, we provide a brief description of the methods. For a detailed description of each method, interested readers can refer to our previous works.

Handcrafted Feature Extraction Methods
Since the beginning of pattern recognition research, many feature extraction methods for character recognition have been presented in the literature. In our previous work [28], we investigated and evaluated the performance of 10 feature extraction methods with two classifiers, k-NN (k-Nearest Neighbor) and SVM (Support Vector Machine), in 29 different schemes for Balinese script on palm leaf manuscripts. After evaluating the performance of those individual feature extraction methods, we found that the Histogram of Gradient (HoG) features as directional gradient-based features [9,49] (Figure 5), the Neighborhood Pixels Weights (NPW) [50] (Figure 6), the Kirsch Directional Edges [50], and Zoning [12,32,50,51] (Figure 7) give very promising results. We then proposed a new feature extraction method applying NPW on Kirsch edge images (Figure 8) and concatenated the NPW-Kirsch features with two other features, HoG and Zoning, with k-NN as the classifier.
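As an illustration of the simplest of these descriptors, the sketch below computes Zoning features as the foreground-pixel density of each cell of a regular grid. The grid size, the binary input convention (1 = ink), and the function name are assumptions made for this example; the exact zoning variants evaluated in [28] may differ.

```python
def zoning_features(image, rows=4, cols=4):
    """Return the foreground-pixel density of each zone, scanned row by row.

    image: 2D list of 0/1 values (1 = ink). The grid size is an
    illustrative assumption, not a value taken from the paper.
    """
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            area = (y1 - y0) * (x1 - x0)
            ink = sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            features.append(ink / area if area else 0.0)
    return features


# Hypothetical 4 x 4 glyph split into a 2 x 2 grid of zones.
glyph = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(zoning_features(glyph, rows=2, cols=2))  # [0.75, 0.0, 0.0, 0.75]
```

A combined descriptor such as HoG + NPW-Kirsch + Zoning is then simply the concatenation of the individual feature vectors before classification.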

Unsupervised Learning Feature and Neural Network
With the aim of improving the performance of our proposed feature extraction method, we continued our research on isolated character recognition by using a neural network as the classifier. In this second step [29], the same combination of feature extraction methods was used and sent as the input feature vector to a single-layer neural network character recognizer. In addition to using only the neural network, we also applied an additional sub-module for initial unsupervised learning based on K-Means clustering (Figure 9). This scheme was inspired by the study of Coates et al. [52,53]. The unsupervised learning step calculates the initial weights for the neural network training phase from the cluster centers of all feature vectors.
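The idea of deriving initial weights from cluster centers can be sketched as follows. This is a minimal K-Means in plain Python under stated assumptions: the centers are seeded with the first k vectors (to keep the example deterministic), and each resulting center is taken as the initial weight vector of one network unit. The precise mapping from centers to weights used in [29] is not described here, so this is only one plausible reading.

```python
def kmeans_centers(vectors, k, iterations=10):
    """Plain K-Means; returns k cluster centers.

    Seeding with the first k vectors is an assumption made to keep
    the sketch deterministic.
    """
    centers = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centers[j])),
            )
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


# Hypothetical 2-D feature vectors forming two obvious groups.
features = [[0, 0], [10, 10], [0, 1], [10, 11]]
initial_weights = kmeans_centers(features, k=2)
print(initial_weights)  # [[0.0, 0.5], [10.0, 10.5]]
```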

Convolutional Neural Network
Multilayer convolutional neural networks (CNN) have proven very effective in areas such as image recognition and classification. In this evaluation experiment, a vanilla CNN is used. The architecture of the CNN (Figure 10) is described as follows (this architecture has also been reported as the Khmer isolated character recognition baseline in [21]). The grayscale input images of isolated characters are rescaled to 48 × 48 pixels and normalized by applying histogram stretching.
The network consists of three pairs of convolution and max pooling layers. All convolutional layers use a stride of one and are zero padded so that the output is the same size as the input. The output of each convolutional layer is activated using the ReLU function and followed by max pooling over 2 × 2 blocks. The three consecutive convolutional layers use 8, 16, and 32 feature maps, respectively, with filters of size 5 × 5. The output of the last layer is flattened, and a fully-connected layer with 1024 neurons (also activated with ReLU) is added, followed by the output layer (softmax activation) consisting of $N_{class}$ neurons, where $N_{class}$ is the number of character classes. Dropout with probability p = 0.5 is applied before the output layer to prevent overfitting. We trained the network using the Adam optimizer with a batch size of 100 and a learning rate of 0.0001.
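To make the layer dimensions concrete, the following sketch traces the tensor shapes and counts the trainable parameters of the architecture described above. It assumes that every layer has a bias term and that the 2 × 2 pooling is non-overlapping (stride 2); neither detail is stated explicitly, and the function name is hypothetical.

```python
def cnn_summary(n_classes, input_size=48, channels=1, kernel=5,
                maps=(8, 16, 32), dense=1024):
    """Return (feature-map shapes, flattened length, parameter count)."""
    size, depth, params = input_size, channels, 0
    shapes = []
    for m in maps:
        params += kernel * kernel * depth * m + m  # filter weights + biases
        size //= 2   # padded convolution keeps the size; 2 x 2 pooling halves it
        depth = m
        shapes.append((size, size, depth))
    flat = size * size * depth
    params += flat * dense + dense            # fully-connected layer
    params += dense * n_classes + n_classes   # softmax output layer
    return shapes, flat, params


# Example with a hypothetical 10-class problem.
shapes, flat, params = cnn_summary(n_classes=10)
print(shapes)  # [(24, 24, 8), (12, 12, 16), (6, 6, 32)]
print(flat)    # 1152
print(params)  # 1207178
```

Under these assumptions almost all parameters sit in the 1152 × 1024 fully-connected layer, which is consistent with applying dropout at that point of the network.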

Word Recognition and Transliteration
In order to make the palm leaf manuscripts more accessible, readable, and understandable to a wider audience, an optical character recognition (OCR) system should be developed. In many DIA systems, word or text recognition is the final task in the processing pipeline. However, in Southeast Asian scripts, the pronunciation of a syllable typically changes according to certain phonological rules. In this case, an OCR system alone is not enough. Therefore, a transliteration system should also be developed to help transliterate the ancient scripts on these manuscripts. Transliteration is defined as the process of obtaining the phonetic translation of names across languages [54]. It involves rendering a language from one writing system to another. In [54], the problem is stated formally as a sequence labeling problem from one language alphabet to another. A transliteration system will help us to index the content of the manuscripts and to access it quickly and efficiently. In our previous work [29], a complete scheme for segmentation-based glyph recognition and transliteration specific to Balinese palm leaf manuscripts was proposed. In this work, a segmentation-free method is evaluated for recognizing and transliterating words in the three different scripts of the palm leaf manuscript collections.

RNN/LSTM-Based Methods
Over the last decade, sequence-analysis-based methods using a Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM) type of learning network have become very popular among researchers in text recognition. An RNN-LSTM-based method combined with Connectionist Temporal Classification (CTC) works as a segmentation-free, learning-based method that recognizes the sequence of characters in a word or text without any handcrafted feature extraction. The raw image pixels can be sent directly as the input to the learning network, and there is no requirement to segment the training data sequence. An RNN is basically an extended version of the basic feedforward neural network. In an RNN, the neurons in the hidden layer are connected to each other. RNNs offer very good context-aware processing for recognizing patterns in a sequence or time series. One drawback of RNNs is the vanishing gradient problem. To deal with this problem, the LSTM architecture was introduced. The LSTM network adds multiplicative gates and additive feedback. Bidirectional LSTM is an LSTM architecture with two-directional (forward and backward) context processing. The LSTM architecture has been widely evaluated as a generic and language-independent text recognizer [55]. In this work, the OCRopy (https://github.com/tmbdev/ocropy) [56] framework is used to test and evaluate the word recognition and transliteration tasks for the palm leaf manuscript collection. OCRopy provides a functional library for OCR systems based on the RNN-LSTM architecture (http://graal.hypotheses.org/786) [57,58]. We evaluated the dataset with both the unidirectional LSTM and the bidirectional LSTM (BLSTM) architecture.
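The way CTC removes the need for pre-segmented training data can be illustrated with its decoding step. The network emits one score vector per image column (frame), including a special blank label; the standard best-path decoding takes the best label per frame, merges consecutive repeats, and drops blanks. The sketch below shows this generic procedure only; it is not OCRopy's implementation, and the alphabet and scores are hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded, previous = [], blank
    for index in best:
        if index != blank and index != previous:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


# Hypothetical alphabet: index 0 is the CTC blank.
alphabet = ["", "k", "a"]
scores = [
    [0.1, 0.8, 0.1],    # k
    [0.1, 0.7, 0.2],    # k (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # a
    [0.1, 0.1, 0.8],    # a (repeat, merged)
    [0.8, 0.1, 0.1],    # blank separates the two a's
    [0.1, 0.1, 0.8],    # a
]
print(ctc_greedy_decode(scores, alphabet))  # kaa
```

Because the output alphabet is independent of the input script, the same mechanism can emit Latin letters directly, which is what allows recognition and transliteration to be learned in one step.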

Experiments: Datasets and Evaluation Methods
From the three manuscript corpuses (Khmer, Balinese, and Sundanese), the datasets for each DIA task were extracted and used in the experimental work for this research.

Datasets
The palm leaf manuscript datasets for the binarization task are presented in Table 1. For Khmer manuscripts, one ground truth binarized image is provided for each image, but for Balinese and Sundanese manuscripts, each image has two different ground truth binarized images [17,25]. The study of ground truth variability and subjectivity was reported in the previous work [24]. In this research, we only used the first binarized ground truth image for evaluation. The binarized ground truth images for Khmer manuscripts were generated manually with the help of photo editing software (Figure 11). A pressure-sensitive stylus was used to trace each text stroke while preserving the original stroke width [59]. For the manuscripts from Bali, the binarized ground truth images were created with a semi-automatic scheme [17,23-25] (Figure 12). The binarized ground truth images for Sundanese manuscripts were generated manually [22] using PixLabeler [60] (Figure 13). The training set is provided only for the Balinese dataset. We used all images of the Khmer and Sundanese corpuses as a test set because the training-based binarization method (ICFHR G1 method, see Section 5.1) was evaluated for the Khmer and Sundanese datasets by using only the model pre-trained on the Balinese training set.

Evaluation Method
Following our previous work [24] and the evaluation method from the ICFHR competition [25], three metrics of binarization evaluation that were used in the DIBCO 2009 contest [37] are used in the binarization task evaluation for this work.Those three metrics are F-Measure (FM) (Equation ( 3)), Peak SNR (PSNR) (Equation ( 5)), and Negative Rate Metric (NRM) (Equation ( 8)).
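F-Measure (FM): FM is defined from Recall and Precision.
$$\mathrm{Recall} = \frac{TP}{FN + TP} \times 100 \tag{1}$$
$$\mathrm{Precision} = \frac{TP}{FP + TP} \times 100 \tag{2}$$
TP, defined as true positive, occurs when the image pixel is labeled as foreground and the ground truth is also labeled as foreground. FP, defined as false positive, occurs when the image pixel is labeled as foreground but the ground truth is labeled as background. FN, defined as false negative, occurs when the image pixel is labeled as background but the ground truth is labeled as foreground (Equations (1) and (2)).
$$FM = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \tag{3}$$
A higher F-measure indicates a better match.
Peak SNR (PSNR): PSNR is calculated from the Mean Square Error (MSE) between the two M × N images $I_1$ and $I_2$ being compared.
$$MSE = \frac{\sum_{x=1}^{M} \sum_{y=1}^{N} \left( I_1(x, y) - I_2(x, y) \right)^2}{M N} \tag{4}$$
$$PSNR = 10 \log_{10} \left( \frac{C^2}{MSE} \right) \tag{5}$$
where C is defined as 1, the difference between foreground and background colors in the case of a binary image. A higher PSNR indicates a better match.
Negative Rate Metric (NRM): NRM is defined from the negative rate of false negatives ($NR_{FN}$) and the negative rate of false positives ($NR_{FP}$), where TN, defined as true negative, occurs when both the image pixel and the ground truth are labeled as background.
$$NR_{FN} = \frac{FN}{FN + TP} \tag{6}$$
$$NR_{FP} = \frac{FP}{FP + TN} \tag{7}$$
$$NRM = \frac{NR_{FN} + NR_{FP}}{2} \tag{8}$$
A lower NRM indicates a better match.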
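The three binarization metrics can be computed from the four pixel counts alone. The sketch below follows the usual DIBCO 2009 definitions: FM is the harmonic mean of Recall and Precision, PSNR uses C = 1 so that the mean square error of two binary images is simply the fraction of mismatching pixels, and NRM averages the false negative rate FN/(FN + TP) and the false positive rate FP/(FP + TN). The function name and the list-based image representation are assumptions made for this example.

```python
import math


def binarization_metrics(result, truth):
    """Return (FM, PSNR, NRM) for two binary images (1 = foreground).

    Assumes both images contain foreground and background pixels and
    share at least one true positive, so that no denominator is zero.
    """
    tp = fp = fn = tn = 0
    for result_row, truth_row in zip(result, truth):
        for r, g in zip(result_row, truth_row):
            if r and g:
                tp += 1
            elif r:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    recall = 100.0 * tp / (fn + tp)
    precision = 100.0 * tp / (fp + tp)
    fm = 2 * recall * precision / (recall + precision)
    mse = (fp + fn) / (tp + fp + fn + tn)  # binary images, C = 1
    psnr = 10 * math.log10(1.0 / mse) if mse else math.inf
    nrm = (fn / (fn + tp) + fp / (fp + tn)) / 2
    return fm, psnr, nrm


# Hypothetical 2 x 4 images: one TP, one FP, one FN, five TN.
result = [[1, 1, 0, 0], [0, 0, 0, 0]]
truth = [[1, 0, 1, 0], [0, 0, 0, 0]]
fm, psnr, nrm = binarization_metrics(result, truth)
print(round(fm, 2), round(psnr, 2), round(nrm, 4))  # 50.0 6.02 0.3333
```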
The palm leaf manuscript datasets for the isolated character/glyph recognition task are presented in Table 3. For the Balinese character dataset, Balinese philologists manually annotated the segments of connected components that represented a correct character in Balinese script from the word-level binarized images, which were themselves manually annotated [11,17,20] using Aletheia (http://www.primaresearch.org/tools/Aletheia) [62,63] (Figure 14). The Sundanese character dataset was annotated manually [22] (Figure 15). For the Khmer character dataset, a tool has been developed to annotate characters/glyphs on the document page. The polygon boundary of each character is traced manually by marking its vertices one by one. A label is given to each annotated character after its boundary has been constructed [21] (Figure 16).

Evaluation Method
Following the evaluation method from the ICFHR competition [25], the recognition rate, i.e., the percentage of correctly classified samples over the test samples (C/N) is calculated, where C is the number of correctly recognized samples and N is the total number of test samples.

Datasets
The palm leaf manuscript datasets for the word recognition and transliteration task are presented in Table 4. For the Khmer dataset, all characters on the page have been annotated and grouped together into words (Figure 17). More than one label may be given to the created word. The order in which the characters of a word were selected is also preserved [21]. The Balinese (Figure 18) and Sundanese (Figure 19) word datasets were manually annotated using Aletheia [63].

Experimental Results and Discussion
In this section, the performance of each method for the DIA tasks on palm leaf manuscript collections is presented.

Binarization
The experimental results for the binarization task are presented in Table 5. These results show that the performance of all methods on each dataset is still quite low. Most of the methods achieve an FM score of less than 50%. This means that palm leaf manuscripts are still an open challenge for the binarization task. Varying the parameter values of the local adaptive binarization methods significantly improves performance, but the results remain unsatisfactory. In these experiments, the ICFHR G1 method was evaluated for the Khmer and Sundanese datasets using the model pre-trained on the Balinese training set. Based on these experiments, Niblack's method gives the highest FM score for Sundanese manuscripts (Figure 20), the ICFHR G1 method gives the highest FM score for Khmer manuscripts (Figure 21), and ICFHR G2 gives the highest FM score for Balinese manuscripts (Figure 22). However, visually, there are still many broken and unrecognizable characters/glyphs, and noise is still present in the binarized images.

Text Line Segmentation
The experimental results for the text line segmentation task are presented in Table 6. According to these results, both methods perform sufficiently well for most datasets, except Khmer 1 (Figures 23-25). This is because all images in this set are of low quality, having been digitized from microfilms. Nevertheless, the adaptive path finding method achieves better results than the seam carving method on all datasets of palm leaf manuscripts in our experiment. The main difference between these two approaches is that instead of finding an optimal separating path within an area constrained by the medial seam locations of two adjacent lines (as in the seam carving method), the adaptive path finding approach tries to find a path close to an estimated straight seam line section. These line sections already represent the seam borders between two neighboring lines well, so they can be considered a better guide for finding good paths, hence producing better results.
One common error that we encounter for both methods occurs in the medial position computation stage. Detecting the correct medial positions of text lines is crucial for the path-finding stage of the methods. In our experiment, we noticed that some parameters play an important role, for instance the number of columns/slices r of the seam carving method and the high and low thresholding values of the edge detection algorithm in the adaptive path finding approach. In order to select these parameters, a validation set consisting of five random pages is used. The optimal values of the parameters are then empirically selected based on the results from this validation set.
Table 6. Experimental results for text line segmentation task: the count of ground truth elements (N), the count of result elements (M), the one-to-one (o2o) match score computed for a region pair based on a 90% acceptance threshold, detection rate (DR), recognition accuracy (RA), and performance metric (FM).
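The scores named in the Table 6 caption are related in a simple way. Under the usual definition for this kind of evaluation, the detection rate is DR = o2o/N, the recognition accuracy is RA = o2o/M, and FM is their harmonic mean. The sketch below assumes these standard definitions, since the formulas are not spelled out in this section; the counts in the example are hypothetical.

```python
def line_segmentation_scores(o2o, n_ground_truth, m_result):
    """Return (DR, RA, FM) from the one-to-one match count.

    o2o: number of result lines matching a ground truth line one-to-one
    n_ground_truth: number of ground truth lines (N)
    m_result: number of detected lines (M)
    """
    dr = o2o / n_ground_truth
    ra = o2o / m_result
    fm = 2 * dr * ra / (dr + ra) if (dr + ra) else 0.0
    return dr, ra, fm


# Hypothetical counts: 100 ground truth lines, 95 detected, 90 matched.
dr, ra, fm = line_segmentation_scores(o2o=90, n_ground_truth=100, m_result=95)
print(f"DR={dr:.4f} RA={ra:.4f} FM={fm:.4f}")  # DR=0.9000 RA=0.9474 FM=0.9231
```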

Isolated Character/Glyph Recognition
The experimental results for the isolated character/glyph recognition task are presented in Table 7. For the handcrafted features with k-NN, the Khmer set, with 113,206 training images and 90,669 test images, would require a considerable amount of time for the one-to-one k-NN comparison, so we did not consider it reasonable to run this configuration. For CNN 1, previous work only reported results for the Balinese set. The ICFHR competition was organized only for the Balinese set, so results for the competition methods are available only for that set. According to these results, the handcrafted feature combination HoG-NPW-Kirsch-Zoning is a suitable choice, giving a good recognition rate for Balinese and Khmer characters/glyphs. The CNN methods also show satisfactory results, but their recognition rates do not differ much from those of the handcrafted feature combinations. The unbalanced number of image samples per character class means that the CNN method did not perform optimally. For the Sundanese dataset, the handcrafted features with NN slightly outperformed the CNN method. The UFL method slightly increased the recognition rate of the pure NN method for the Khmer and Balinese datasets.
Table 7 (excerpt). Recognition rate (%) on the Balinese set; no results are available for the other sets: … [25], 87.44; ICFHR G1: VMQDF [25], 88.39; ICFHR G3 [25], 77.83; ICFHR G5 [25], 77.70.

Word Recognition and Transliteration
The experimental results for the word recognition and transliteration task are presented in Table 8. The error rates on the word recognition and transliteration test sets at each training model iteration are shown in Figures 26-28. The LSTM-based architecture of OCRopy seems very promising for recognizing and directly transliterating Balinese words. For the Khmer and Sundanese datasets, the LSTM architecture seems to struggle to learn the training data. More synthetic training data containing more frequent words should be generated in order to support the training process. For the Balinese dataset, a sequence depth of 100 pixels with a neuron size of 200 gives a better result for both the LSTM and the BLSTM architecture. Most of the Southeast Asian scripts are syllabic scripts. One character/glyph in these scripts represents a syllable, which corresponds to a sequence of letters in Latin script. In this case, word transliteration is not just word recognition with a one-to-one glyph-to-letter association. This makes word transliteration more challenging than character/glyph recognition.
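An error rate for transliterated words is commonly computed as the character-level edit distance between the predicted and the reference Latin strings, normalized by the total reference length. The sketch below assumes this common definition; the exact formula behind the reported error rates is not given in this section, and the example words are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(predictions, references):
    """Character error rate in percent over a list of word pairs."""
    errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total


# Hypothetical transliteration outputs against their references.
print(error_rate(["raja", "bali"], ["raja", "bala"]))  # 12.5
```

Because one glyph maps to several Latin letters, a single misrecognized glyph can produce several character errors under this measure, which is one way the added difficulty of transliteration shows up in the scores.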

Conclusions and Future Work
A comprehensive experimental test of the principal tasks in a DIA system, starting with binarization, text line segmentation, and isolated character/glyph recognition, and continuing on to word recognition and transliteration for a new collection of palm leaf manuscripts from Southeast Asia, is presented. The results from all experiments provide the latest findings and a quantitative benchmark of palm leaf manuscript analysis for researchers in the DIA community. Binarizing the palm leaf manuscript images seems very challenging. With many broken and unrecognizable characters/glyphs and much noise still detected in the images, binarization should be reconsidered as the first step in the DIA process for palm leaf manuscripts. On the other hand, although there are already training-based DIA methods that do not require this binarization process, they usually require adequate training data. The problem of inadequate training data also influences glyph recognition and word transliteration. The unbalanced number of image samples for each character class means the CNN methods did not perform optimally in glyph recognition. The recognition rates of the CNN methods differ only slightly from those of the handcrafted feature combinations. For future work, more synthetic training data for palm leaf manuscript images should be generated to support the training process. Especially for the word transliteration task, more synthetic training data covering more frequent words should be generated to improve the training process. Many examples of glyph-to-syllable association should be synthetically generated to transliterate syllabic scripts from Southeast Asia. The special characteristics and challenges posed by the palm leaf manuscript collections will require a thorough adaptation of the DIA system. Some specific adjustments need to be applied to the DIA methods designed for other types of documents. The adaptation of a DIA system for palm leaf manuscripts is not unique and is not universal for all types of problems from different collections. However, among the DIA system's non-unique solutions, one specific solution can still be designed to deliver the best DIA system performance while still taking into account the conditions of that collection.

Ahmad Darsa, the philologists from Sundanese Centre Studies of Universitas Padjadjaran, the Situs Kabuyutan Ciburuy Garut, all families in Bali, Indonesia, the EFEO team, the Buddhist Institute, and the National Library in Cambodia for providing us with samples of palm leaf manuscripts. We also thank the students from the

Figure 4. An example of an optimal path going from start state $S_1$ to goal state $S_n$.

Figure 5. The representation of the array of cells in HoG [28].

Figure 9. Schema of character recognizer with feature extraction method, unsupervised learning feature, and neural network [29].

Figure 10. Architecture of the CNN.

Figure 23. Text line segmentation of Balinese manuscript with the Seam Carving method (green) and Adaptive Path Finding (red).

Figure 24. Text line segmentation of Khmer manuscript with the Seam Carving method (green) and Adaptive Path Finding (red).

Figure 25. Text line segmentation of Sundanese manuscript with the Seam Carving method (green) and Adaptive Path Finding (red).

J. Imaging 2017, 4, 26

Figure 26. Error rate for Balinese word recognition and transliteration test set.

Figure 27. Error rate for Khmer word recognition and transliteration test set.

Figure 28. Error rate for Sundanese word recognition and transliteration test set.

Department of Informatics Education and the Department of Balinese Literature, University of Pendidikan Ganesha, the Institute of Technology of Cambodia, and the National Institute of Post, Telecommunication and ICT for helping us with the ground truthing process for this research project. This work is supported by the DIKTI BPPLN Indonesian Scholarship Program, the STIC Asia Program implemented by the French Ministry of Foreign Affairs and International Development (MAEDI), and ARES-CCD (program AI 2014-2019) under the funding of Belgian university cooperation, and DRPMI Universitas Padjadjaran, DIKTI International Collaboration and Publication grant 2017.

Table 1. Palm leaf manuscript datasets for binarization task.

Table 3. Palm leaf manuscript datasets for isolated character/glyph recognition task.

Table 4. Palm leaf manuscript datasets for word recognition and transliteration tasks.

Table 5. Experimental results for binarization task in F-Measure (FM), Peak SNR (PSNR), and Negative Rate Metric (NRM). A higher F-measure and PSNR, and a lower NRM, indicate a better result.
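As a hedged illustration of the three measures named in the caption of Table 5, the sketch below computes FM, PSNR, and NRM from a predicted binary image and its ground truth. It assumes the definitions commonly used in document binarization contests (foreground pixels as the positive class, pixel values in {0, 1} so that the peak signal is 1). The function name and the toy images are hypothetical, and the scaling used in the paper's table may differ.

```python
import math


def binarization_metrics(pred, truth):
    """Return (FM in %, PSNR in dB, NRM) for two 2D lists of 0/1 pixels,
    where 1 marks foreground (text) and 0 marks background."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fm = (100.0 * 2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # For binary images every wrong pixel contributes a squared error of 1.
    mse = (fp + fn) / (tp + fp + fn + tn)
    psnr = 10.0 * math.log10(1.0 / mse) if mse else math.inf

    # NRM averages the false-negative rate and the false-positive rate.
    nr_fn = fn / (fn + tp) if fn + tp else 0.0
    nr_fp = fp / (fp + tn) if fp + tn else 0.0
    nrm = (nr_fn + nr_fp) / 2.0
    return fm, psnr, nrm


# Toy 2x4 images (hypothetical data).
truth = [[1, 1, 0, 0],
         [0, 0, 0, 0]]
pred = [[1, 0, 0, 0],
        [0, 1, 0, 0]]
fm, psnr, nrm = binarization_metrics(pred, truth)
print(f"FM={fm:.2f}  PSNR={psnr:.2f} dB  NRM={nrm:.4f}")
```

FM and NRM are driven by the foreground class, so they are more sensitive than PSNR to broken or missing strokes when text covers only a small part of the leaf.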

Table 7. Experimental results for isolated character/glyph recognition tasks (in % recognition rate).

Table 8. Experimental results for word recognition and transliteration tasks (in % error rate for test).