Article

A Benchmark Study of Classical and U-Net ResNet34 Methods for Binarization of Balinese Palm Leaf Manuscripts

1 Department of Information and Library Science, Airlangga University, Surabaya 60115, East Java, Indonesia
2 Department of Natural Science Education, Hasyim Asy’ari University, Jombang 61471, East Java, Indonesia
3 Department of Data Science, Monash University (Indonesia Campus), Tangerang 15345, Banten, Indonesia
4 Department of Business and Information Management, Tatung University, Zhongshan District, Taipei City 10491, Taiwan
* Author to whom correspondence should be addressed.
Heritage 2025, 8(8), 337; https://doi.org/10.3390/heritage8080337
Submission received: 31 May 2025 / Revised: 6 August 2025 / Accepted: 13 August 2025 / Published: 18 August 2025

Abstract

Ancient documents that have undergone physical and visual degradation pose significant challenges for the digital recognition and preservation of information. This research evaluates the effectiveness of ten classical binarization methods, including Otsu, Niblack, Sauvola, and ISODATA, as well as other adaptive methods, in comparison to a U-Net ResNet34 model trained on 256 × 256 image blocks for extracting textual content and separating it from the degraded parts and background of palm leaf manuscripts. We focused on two significant collections: Lontar Terumbalan, with 19 images of Balinese manuscripts from the National Library of Indonesia collection, and AMADI Lontarset, with 100 images from ICFHR 2016. Results show that the deep learning approach outperforms the classical methods on the overall evaluation metrics. The U-Net ResNet34 model reached the highest Dice score of 0.986, accuracy of 0.983, and SSIM of 0.938, along with the lowest RMSE of 0.143 and the highest PSNR of 17.059. Among the classical methods, ISODATA achieved the best results, with a Dice score of 0.957 and accuracy of 0.933, but still fell short of the deep learning model across most evaluation metrics.

1. Introduction

Palm leaf manuscripts are an important cultural heritage of Southeast and South Asia that recorded a wide range of knowledge, religious beliefs, and artistic and cultural expressions for centuries before the existence of printing systems [1,2]. Owing to the large number of speakers of Indian languages in South Asia, preservation and research efforts on palm leaf manuscripts continue to grow rapidly, supported by cultural and academic institutions that actively digitize and study their contents [3,4]. In contrast, in Southeast Asia, although palm leaf manuscripts are found in Sanskrit and Kawi, which have historical links to Indian traditions, active users of these languages are now very limited. Among the communities in this region, the Balinese still actively use and preserve the palm leaf manuscript tradition as part of their religious and cultural practices [5].
The preservation of palm leaf manuscripts is crucial due to the organic composition of the material and its vulnerability to degradation from age, environmental conditions, and incorrect handling, which make traditional conservation methods inadequate [6,7]. The complexity of handwritten script and the interaction between text and image in manuscripts add to the difficulty of interpretation [8,9]. Consequently, an improved strategy to address these challenges and facilitate the preservation of these unique cultural artifacts is urgently required [10,11,12]. One of the strategies is image binarization, a fundamental preprocessing step in document image analysis (DIA) studies that transforms grayscale images into binary form, separating foreground elements (e.g., text or illustration) from the background [13,14,15].
Numerous binarization techniques have been proposed to address manuscript degradation, including global and adaptive thresholding methods. These include Otsu’s method [16], Sauvola’s method [17], and Li’s minimum cross-entropy algorithms [18], which have been developed to accommodate various image characteristics and degradation levels [19]. The Niblack threshold [20], ISODATA [21], and K-means are also examined for binarizing some degraded documents and other ancient collections, such as inscriptions or temple reliefs. These classical methods have been used in several studies to improve the readability of historical collections and simplify automated analysis [22,23,24,25,26]. However, recent advancements in machine learning have introduced more flexible and data-driven approaches to binarization, especially in the context of degraded manuscripts.
In recent years, machine learning-based binarization techniques have gained significant attention in document image analysis, especially for handling complex degradations. Unlike handcrafted thresholding rules, these methods learn pixel-wise classification directly from data, enabling more robust adaptation to various noise patterns and degradations. One of the most prominent convolutional neural network-based architectures is U-Net [27], which uses an encoder–decoder structure to perform semantic binarization tasks such as foreground–background separation. Variants and enhancements, including Fully Convolutional Networks (FCNs) [28], SauvolaNet [29], and DeepOtsu [30], have demonstrated high accuracy in analyzing manuscript datasets such as DIBCO and ICFHR [31,32]. These models often outperform classical methods, especially when trained on domain-specific data or augmented with synthetic degradation.
In light of these developments, this study compares the U-Net architecture with a ResNet34 encoder, as a representative deep learning-based binarization model, against ten classical approaches. This study tests the Lontar Terumbalan, a Balinese palm leaf manuscript that has never been analyzed before. For comparison, this study also tested the AMADI Lontarset, which has been widely studied previously [29,33,34,35]. The manuscript’s complex visual structure and signs of degradation present unique challenges for digital analysis. To ensure objectivity in comparing these methods, this study uses quantitative evaluation metrics such as Intersection over Union (IoU) and Dice Coefficient to assess the overlap between the binarized output and the ground truth [36,37], while SSIM evaluates the perceived quality of the binarization [38]. These metrics allow for a robust assessment of binarization accuracy and perceptual quality, particularly when dealing with the unique challenges posed by historical manuscripts. By systematically applying and evaluating binarization methods, this study aims to determine the most effective approaches for binarizing and enhancing the manuscript’s content. The insights gained from this study not only enhance our understanding of Balinese manuscripts but also establish a framework for the analysis of other historical artifacts facing similar challenges.
To provide a comprehensive understanding of this study, each section of this paper is structured to guide the reader through the research process and findings systematically. The next section describes the datasets and the methodology used in this study, including the preprocessing stages, the ten thresholding approaches that were tested, and the evaluation metrics utilized. Section 3 presents the experimental results and a comparative analysis of the thresholding outputs. Section 4 discusses their value in revealing the manuscript’s content. Finally, Section 5 summarizes the major findings, discusses limitations, and suggests future research directions in the digital preservation and forensic examination of ancient manuscripts.

2. Materials and Methods

The primary objects of this research are two datasets of Balinese palm leaf manuscripts. The first is Lontar Terumbalan, a manuscript housed at the National Library of Indonesia [39]. These manuscripts serve not only as linguistic artifacts but also as visual records of cultural practices carried out by the community. This research used 19 images with a maximum dimension of 4450 × 403 pixels, as shown in Figure 1a. This dataset presents various visual challenges, including ink bleed, noise from leaf fibers, and some degraded areas, thus offering a realistic context for evaluating the robustness of binarization methods. The second dataset is AMADI Lontarset, a standard palm leaf manuscript dataset compiled for the 2016 International Conference on Frontiers in Handwriting Recognition (ICFHR) competition [31,33]. This dataset also presents several visual challenges, such as ink bleed, as shown in Figure 1b.
To analyze the Balinese manuscript, this study focuses on the binarization of historical manuscript images as a crucial step for separating foreground content from complex and degraded backgrounds. In this context, binarization aims to convert the original color manuscript images into binary form, allowing meaningful visual components—such as symbolic illustrations and handwritten Balinese script—to be clearly distinguished from the background. As shown in Figure 1, the manuscript image features two main types of foreground content: symbolic illustrations and handwritten Balinese script. These elements are the primary focus of the binarization process, as they carry the meaningful visual and textual information embedded in the manuscript. Surrounding these components is a complex background composed of palm leaf textures, discoloration, and signs of physical aging. Therefore, accurate binarization is essential for enhancing the legibility of key elements while minimizing visual distractions from irrelevant background noise.
The proposed method consists of two main stages: image binarization using both classical and deep learning-based techniques and evaluation of the binarization results. As illustrated in Figure 2, the process begins with original manuscript images, which are directly subjected to various binarization techniques. The binarization methods applied include classical thresholding approaches such as Adaptive Mean, Sauvola, Global Otsu, Niblack, and others. Meanwhile, the deep learning-based model, U-Net with a ResNet34 encoder, was applied to represent the modern data-driven approach to foreground–background separation.
To evaluate the accuracy of each binarization output, a set of ground truth images was prepared using different annotation strategies for the two datasets. As shown in Figure 2, the dashed lines connecting the ground truth images to the binarization evaluation module illustrate the supportive role of ground truth data in the overall workflow as a reference standard for comparing the binarized outputs during evaluation. This visual distinction emphasizes the function of ground truth images as an external benchmark, crucial for validating the binarization performance without influencing the core processing pipeline.
For the Lontar Terumbalan, ground truth masks were generated through a semi-automated process that combined several enhancement techniques. As shown in Figure 3, the preparation of ground truth involved grayscale conversion, median filtering, CLAHE, and Sauvola thresholding, followed by manual correction of the binarized regions. In contrast, the AMADI dataset came with pre-existing ground truth annotations, which were created manually and provided by the original dataset contributors.
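To make the ground truth workflow concrete, the following is a minimal sketch of this semi-automated pipeline, assuming OpenCV and scikit-image. The kernel size, CLAHE settings, and Sauvola window shown are illustrative choices rather than the exact parameters used by the authors, and the output still requires the manual correction described above.

```python
import cv2
import numpy as np
from skimage.filters import threshold_sauvola

def prepare_ground_truth_draft(image_path: str) -> np.ndarray:
    """Semi-automated ground truth draft: grayscale -> median -> CLAHE -> Sauvola.
    The result is a first-pass mask that is then corrected manually."""
    bgr = cv2.imread(image_path)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # Median filtering suppresses impulse noise from leaf fibers.
    denoised = cv2.medianBlur(gray, 5)
    # CLAHE boosts local contrast between faded ink and the leaf background.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(denoised)
    # Sauvola adaptive thresholding; ink is assumed darker than the background.
    thresh = threshold_sauvola(enhanced, window_size=15)
    mask = (enhanced < thresh).astype(np.uint8) * 255
    return mask
```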

2.1. Binarization

Binarization is widely used in document image analysis as a preprocessing method that transforms grayscale or color images into two groups of pixels, separating the foreground from the background; here, it separates the crafted text and objects from the palm leaf manuscript background to facilitate further analysis, such as object identification, line segmentation, character recognition, and classification, and to support digital preservation [40,41]. In this study, ten thresholding techniques were applied as classical binarization methods: Otsu, Otsu with Morphological Processing (Otsu Morph), Adaptive Mean, Adaptive Gaussian, Sauvola, Niblack, Li’s minimum cross-entropy, ISODATA, K-means Clustering, and Global Otsu. Furthermore, a deep learning-based binarization approach using U-Net with a ResNet34 encoder was also employed for comparison. The diversity of these methods enables a robust comparison of their performance on visually degraded palm leaf manuscripts. Each method is described below.

2.1.1. Classical Binarization Techniques

1. Niblack
The Niblack threshold calculates the threshold for each pixel based on the mean ($\mu$) and standard deviation ($\sigma$) of the pixel values within a local window around it [20,42]. The threshold $t$ at pixel $(x, y)$ is defined in Equation (1) [43].
$t(x, y) = \mu(x, y) + k \cdot \sigma(x, y)$
In this method, the bias parameter $k$, which controls the boundary pixels of the object, is set to −0.2, and a 15 × 15 window is a good neighborhood choice [14]. Niblack is more prone to noise amplification, particularly in areas with high background variation. Despite this limitation, it is still capable of extracting detailed character structures when used on well-contrasted sections of the manuscript.
2. Li’s Minimum Cross-Entropy
Li’s minimum cross-entropy method, originally proposed by Li et al., seeks to minimize the cross-entropy between the original grayscale image and its binarized version through an iterative update process [18,44]. This method leverages the minimization of Kullback–Leibler divergence to provide practical solutions under constraints, enhancing accuracy and efficiency in different contexts [45,46]. The method begins by dividing the image histogram into two classes: background (below the threshold) and foreground (above the threshold). For each candidate threshold $t$, the cross-entropy function $H(t)$ is computed based on the zero-order and first-order moments of the histogram, as shown in (2).
$H(t) = \sum_{i=0}^{t} h_i \log \mu_a(t) + \sum_{i=t+1}^{L} h_i \log \mu_b(t)$
Here, $h_i$ denotes the histogram count at intensity level $i$, $\mu_a(t)$ and $\mu_b(t)$ are the mean intensities of the background and foreground classes, and $L$ is the maximum gray level (typically 255 for 8-bit images). The goal is to find the threshold that minimizes this entropy function, which can be formally written as (3).
$t^{*} = \arg\min_{t} H(t)$
To enhance computational efficiency, the algorithm employs an iterative formulation that eliminates the need for evaluating all possible intensity levels [44]. The threshold is progressively updated based on (4).
$t_{n+1} = \mathrm{round}\left( \dfrac{\mu_b(t_n) - \mu_a(t_n)}{\log \mu_b(t_n) - \log \mu_a(t_n)} \right)$
This process is repeated until convergence, typically when the difference between successive thresholds is negligible. The variables $\mu_a(t_n)$ and $\mu_b(t_n)$ represent the mean intensities of the background and foreground classes at iteration $n$. The use of logarithmic terms in the denominator ensures that the threshold is sensitive to entropy variations across the two classes, making it particularly effective for handling visual degradations and soft intensity transitions found in ancient manuscript images.
3. Iterative Self-Organizing Data Analysis Technique (ISODATA)
The ISODATA threshold is calculated iteratively by partitioning the histogram into two classes and computing the average intensity of each [21]. Although it lacks a formal closed-form equation, it involves repeating mean calculations until the threshold value stabilizes [47]. In this method, the threshold output is obtained from (5), where $\mu_0$ and $\mu_1$ represent the means of the two histogram classes separated by the threshold [47].
$t_{n+1} = \dfrac{\mu_0 + \mu_1}{2}$
4. K-Means
The K-means Clustering threshold uses an unsupervised learning algorithm to assign pixels into two intensity-based clusters. The algorithm minimizes the intra-cluster variance through centroid updates and is robust to a range of intensity distributions. However, it is sensitive to initial cluster centers and may misclassify pixels in homogeneously textured areas. The K-means objective is shown in (6) [48].
$L = \sum_{i=1}^{n} \sum_{j=1}^{k} \| x_i - c_j \|^2$
where $L$ represents the total loss or compactness of the clustering, $n$ is the number of data points, and $k$ is the number of clusters. In this equation, $x_i$ represents the $i$-th data point in the dataset, while $c_j$ denotes the centroid of the $j$-th cluster. The term $\| x_i - c_j \|^2$ calculates the squared Euclidean distance between the data point and the cluster center. The double summation indicates that the algorithm considers the distance from every point to every cluster center, although in practice, each point is only assigned to its closest center.
5. Adaptive Mean
The Adaptive Mean method calculates the threshold value for each pixel based on the mean intensity of its local neighborhood. The threshold is computed using Equation (7):
$T(x, y) = \mu_N(x, y) - C$
In the Adaptive Mean thresholding method, the threshold value for each pixel is calculated based on the average intensity of the pixels within a local neighborhood window surrounding that pixel. $\mu_N(x, y)$ refers to the mean grayscale value of all pixels in the local window $N$ centered at position $(x, y)$. To fine-tune the threshold and increase its robustness against slight background variations or noise, a constant value $C$ is subtracted from the local mean. This constant acts as a bias and can be adjusted depending on the characteristics of the image.
As a result, the final threshold T x , y dynamically adapts to local intensity variations rather than relying on a single global value. This method is more flexible than global thresholding because it adapts to local image conditions, especially in images with uneven lighting or complex backgrounds [49]. It is particularly useful in documents with uneven illumination because it provides better visual quality and lower misclassification errors but may still be sensitive to local noise or structural inconsistencies in the manuscript surface [50].
6. Adaptive Gaussian
Adaptive Gaussian operates similarly to Adaptive Mean but uses a weighted Gaussian kernel instead of a uniform one [51]. Its threshold computation is defined in Equation (8):
$T(x, y) = (G_\sigma * I)(x, y) - C$
Instead of treating all pixels in the neighborhood equally, this approach assigns higher importance to pixels closer to the center of the window. The Gaussian kernel is denoted by $G_\sigma$, where $\sigma$ controls the spread or standard deviation of the Gaussian distribution. The convolution of this kernel with the input image $I$, expressed as $(G_\sigma * I)(x, y)$, yields a weighted local mean that better preserves gradual intensity changes, such as those caused by lighting gradients or ink fading. A constant $C$ is then subtracted to adjust the sensitivity of the thresholding. The Gaussian weighting improves the method’s robustness to gradual intensity changes, making it effective for palm leaf manuscripts where ink fading and lighting gradients are present. However, it may still struggle with high-contrast edges or extremely faint text regions.
7. Otsu
Otsu here is a reimplementation of the Global Otsu method using floating-point arithmetic, improving precision in datasets with subtle grayscale variations. While its conceptual foundation is identical to the original Otsu, it enables more nuanced binarization in high-resolution images such as those in this study. The Otsu criterion is defined in (9):
$t^{*} = \arg\max_{t} \dfrac{\sigma_B^2(t)}{\sigma_T^2}$
8. Global Otsu
Global Otsu determines a single global threshold that minimizes the intra-class variance in the intensity distribution [21]. It computes the optimal threshold by maximizing the between-class variance, which is equivalent to minimizing the weighted sum of within-class variances, as shown in Equation (10).
$\sigma_B^2(t) = \omega_0(t)\, \omega_1(t)\, [\mu_0(t) - \mu_1(t)]^2$
where $\omega_0$ and $\omega_1$ are the probabilities of the two classes separated by threshold $t$, and $\mu_0$ and $\mu_1$ are the means of those classes. The threshold $t$ that maximizes $\sigma_B^2(t)$ is selected. This method is efficient and well suited for images with bimodal histograms where foreground and background intensities are well separated. Here, $\omega_0$ is the probability that a pixel falls below the threshold, and $\omega_1 = 1 - \omega_0$ is the probability that it falls above the threshold [21].
9. Otsu Morph
Otsu Morph enhances the output of the basic Otsu method by applying morphological operations. Morphological opening followed by closing removes small artifacts and bridges fragmented text strokes, improving readability. The process to obtain the Otsu Morph output $I_{morph}$ is defined in Equation (11):
$I_{morph} = (I_{otsu} \circ J) \bullet J$
In this process, $I_{otsu}$ is the binary image resulting from multilevel Otsu thresholding, and $J$ is the structuring element that defines the shape and size of the local neighborhood. The opening operator $\circ$ (erosion followed by dilation) and the closing operator $\bullet$ (dilation followed by erosion) are built from two primitives: dilation, which expands the foreground region by adding pixels to the object boundaries, and erosion, which trims away noise and sharpens the object’s shape. This combination helps to close small gaps, connect broken lines, and reduce noise in the background [52].
10. Sauvola
Sauvola’s method adapts the threshold using both the local mean ($\mu$) and standard deviation ($\sigma$) to account for illumination variability [17]. It is formulated in Equation (12):
$T(x, y) = \mu_N(x, y) \left[ 1 + k \left( \dfrac{\sigma_N(x, y)}{R} - 1 \right) \right]$
In this equation, $\mu_N(x, y)$ is the local mean, $\sigma_N(x, y)$ is the local standard deviation, $k$ is a tunable parameter, typically set between 0.2 and 0.5 (commonly 0.5), and $R$ represents the dynamic range of the standard deviation, usually set to 128 or 255. The default window size $N$ for the local calculation is typically 15 × 15 pixels, which balances the need to capture sufficient local context without over-smoothing fine text structures. The method was originally developed for degraded document images and has proven highly effective in dealing with textured or stained backgrounds. In this study, it showed particular strength in preserving fine character details, even under inconsistent contrast conditions.
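As a practical reference, the sketch below shows how a representative subset of these classical thresholds can be applied with OpenCV and scikit-image, using the parameter values stated above for Niblack (k = −0.2, 15 × 15 window) and Sauvola (k = 0.5, R = 128). The bias C = 4 for the adaptive methods is an illustrative assumption, and this is a sketch rather than the authors’ exact implementation.

```python
import cv2
import numpy as np
from skimage.filters import (threshold_otsu, threshold_li, threshold_isodata,
                             threshold_niblack, threshold_sauvola)

def classical_binarizations(gray: np.ndarray) -> dict:
    """Apply several of the classical methods described above to a
    grayscale manuscript image (ink assumed darker than background)."""
    results = {}
    # Global Otsu (integer implementation) via OpenCV.
    _, results["global_otsu"] = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Floating-point Otsu, Li's minimum cross-entropy, and ISODATA via scikit-image.
    for name, fn in [("otsu", threshold_otsu), ("li", threshold_li),
                     ("isodata", threshold_isodata)]:
        results[name] = ((gray > fn(gray)) * 255).astype(np.uint8)
    # Local methods: Niblack (k = -0.2) and Sauvola (k = 0.5, R = 128), 15x15 window.
    results["niblack"] = ((gray > threshold_niblack(gray, window_size=15, k=-0.2))
                          * 255).astype(np.uint8)
    results["sauvola"] = ((gray > threshold_sauvola(gray, window_size=15, k=0.5, r=128))
                          * 255).astype(np.uint8)
    # Adaptive Mean and Adaptive Gaussian; 15x15 neighborhood, illustrative bias C = 4.
    for name, mode in [("adaptive_mean", cv2.ADAPTIVE_THRESH_MEAN_C),
                       ("adaptive_gaussian", cv2.ADAPTIVE_THRESH_GAUSSIAN_C)]:
        results[name] = cv2.adaptiveThreshold(gray, 255, mode,
                                              cv2.THRESH_BINARY, 15, 4)
    return results
```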

2.1.2. Deep Learning Binarization Techniques

In this study, a deep learning-based binarization approach was implemented using a modified U-Net architecture with a ResNet34 encoder as the backbone. U-Net, originally designed for biomedical image segmentation, is particularly effective in historical document binarization due to its encoder–decoder structure and skip connections that allow the transfer of detailed spatial information across the network [53]. This architecture is essential for preserving the thin and degraded strokes typical of palm leaf manuscripts. Unlike standard U-Net models that employ a simple convolutional encoder, the implementation in this study substitutes the encoder with a pretrained ResNet34, a deeper and more expressive architecture. ResNet34 utilizes residual learning through skip connections between layers to maintain gradient flow during training, allowing deeper and more stable learning.
The proposed methodology employs a systematic preprocessing stage followed by deep learning-based binarization. Initially, each input image is padded so that its dimensions become multiples of 256 and is then cropped into non-overlapping 256 × 256 processing blocks. This preprocessing stage ensures that all generated image blocks maintain the correct dimensions for our network architecture, as visually shown in Figure 4. The numbers of blocks generated from this process are detailed in Table 1. The dataset is then divided into training, validation, and test sets in an 80:10:10 ratio.
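A minimal sketch of this padding-and-cropping step is shown below, assuming NumPy; the reflect padding mode is an assumption, since the paper does not state how the added border pixels are filled.

```python
import numpy as np

def pad_and_crop(image: np.ndarray, block: int = 256) -> list:
    """Pad an H x W x 3 image so both sides are multiples of `block`,
    then cut it into non-overlapping block x block patches."""
    h, w = image.shape[:2]
    pad_h = (-h) % block  # pixels needed to reach the next multiple of 256
    pad_w = (-w) % block
    # Reflect padding is an assumption; the paper only states that pixels are added.
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
    patches = []
    for y in range(0, padded.shape[0], block):
        for x in range(0, padded.shape[1], block):
            patches.append(padded[y:y + block, x:x + block])
    return patches
```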
Following preprocessing, each standardized 256 × 256 × 3 image block undergoes binarization through a U-Net model with a ResNet34 encoder, as shown in Table 2. Training followed the configuration and parameters summarized in Table 3. The model was trained on a computer with an AMD Ryzen 7 3700X 8-core processor and 32 GB of RAM, using a batch size of 4. It was trained for 25 epochs and produced binary masks of size 256 × 256 × 1. The final layer uses a sigmoid activation function to classify each pixel as foreground (text) or background.
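The paper does not name the training framework; the following sketch assumes PyTorch with the segmentation_models_pytorch package, which provides a U-Net with a pretrained ResNet34 encoder and a sigmoid output matching the architecture described above. The Dice loss and Adam optimizer shown here are illustrative assumptions, not the configuration from Table 3.

```python
import torch
import segmentation_models_pytorch as smp

# U-Net with a pretrained ResNet34 encoder; 3-channel input blocks,
# 1-channel sigmoid output mask (foreground text vs. background).
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,
    activation="sigmoid",
)

loss_fn = smp.losses.DiceLoss(mode="binary", from_logits=False)  # assumed loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)        # assumed optimizer

# One illustrative training step on a batch of 256 x 256 blocks (batch size 4).
images = torch.rand(4, 3, 256, 256)                     # placeholder input batch
masks = torch.randint(0, 2, (4, 1, 256, 256)).float()   # placeholder ground truth
optimizer.zero_grad()
pred = model(images)                                    # -> (4, 1, 256, 256) in [0, 1]
loss = loss_fn(pred, masks)
loss.backward()
optimizer.step()
```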

2.2. Quantitative Evaluation Metrics

This study employed ten evaluation metrics to comprehensively assess each thresholding method’s performance in binarizing Balinese palm leaf manuscripts. These metrics are categorized into three groups based on their analytical orientation: first, the pixel-wise classification metrics derived from the confusion matrix; second, structure-based metrics that evaluate perceptual similarity; and third, pixel-wise error metrics that measure numerical differences between prediction and ground truth. Each of the following metrics provides a different perspective on binarization quality, making them suitable for a well-rounded evaluation.

2.2.1. Confusion Matrix-Based Metrics

Confusion matrix-based metrics quantify how accurately the binarization algorithm classifies each pixel in the image as either foreground (text) or background (non-text). These include Intersection over Union (IoU), Dice Coefficient, accuracy, and Recall. Each of these metrics is calculated based on four fundamental classification outcomes. True positives (TPs) refer to correctly identified text pixels, while true negatives (TNs) denote correctly identified background pixels. False positives (FPs) occur when background pixels are incorrectly labeled as text, and false negatives (FNs) represent missed text pixels [54]. These components form the basis for computing pixel-level performance and allow for detailed evaluation of both over-binarization and under-binarization in document image analysis.
1. Intersection over Union
The Intersection over Union (IoU), also known as the Jaccard Index, determines the amount of overlap between the binarized output and the ground truth. IoU is defined as the ratio of the intersection of the predicted and actual foregrounds to their union. A high IoU implies that the binarization result closely matches the real character regions. Mathematically, this is expressed in (13) [36,37].
$IoU = \dfrac{TP}{TP + FP + FN}$
2. Dice Coefficient
The Dice Coefficient offers another perspective on region similarity. It also quantifies the overlap but gives more weight to true positives. It is considered more sensitive than IoU when dealing with small or thin foreground regions, such as handwritten characters. Dice is especially important when evaluating binarization performance on delicate character strokes, where even small mismatches can result in a loss of semantic meaning. The formula is shown in (14).
$Dice = \dfrac{2\,TP}{2\,TP + FP + FN}$
3. Accuracy
Accuracy measures the overall proportion of correctly classified pixels. It accounts for both foreground (text) and background (non-text) regions and is useful for understanding general binarization correctness. While intuitive and easy to compute, accuracy can be misleading in datasets with class imbalance, such as manuscripts where background pixels dominate. It is defined in Equation (15).
$Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$
4. Recall
Recall, also referred to as sensitivity or the true positive rate, evaluates how many of the actual text pixels were correctly identified by the binarization algorithm. In forensic contexts, high Recall is desirable because missing even faint or partially degraded characters can result in a loss of critical historical or linguistic information. It is defined in Equation (16).
$Recall = \dfrac{TP}{TP + FN}$
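The four confusion matrix-based metrics above reduce to simple counting over boolean masks; a minimal NumPy sketch is shown below, with ground truth and prediction given as boolean arrays where True marks text pixels.

```python
import numpy as np

def confusion_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Compute IoU, Dice, accuracy, and Recall from two boolean masks
    of identical shape, where True marks foreground (text) pixels."""
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return {
        "iou": tp / (tp + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn),
    }
```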

2.2.2. Structure-Based Similarity Metrics

In addition to pixel-wise evaluation, this study also applied a set of structure-based similarity metrics to assess the perceptual and structural similarity between the binarized output and the ground truth. These metrics provide insights beyond simple classification accuracy by considering how well the visual and spatial properties of the original manuscript are preserved. In digital forensics, such structure-aware evaluation is crucial for ensuring that recovered texts retain their shape, continuity, and readability, especially in historical documents like palm leaf manuscripts.
1. Structural Similarity Index Measure
The Structural Similarity Index Measure (SSIM) is used to assess the perceptual similarity between the binarized output and the ground truth. Unlike classification metrics, the SSIM considers local luminance, contrast, and structure to model human visual perception. This is critical in digital forensics, where accurate recovery of text shape, stroke continuity, and layout is essential. The SSIM is calculated over local windows in the image and is defined in (17) [55].
$SSIM(x, y) = \dfrac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$
Here, $\mu_x$ and $\mu_y$ represent the means, $\sigma_x^2$ and $\sigma_y^2$ the variances, and $\sigma_{xy}$ the covariance of the predicted and ground truth images. The constants $C_1$ and $C_2$ are used to stabilize the computation in the presence of low denominators. SSIM values range from 0 to 1, with values closer to 1 indicating higher structural similarity [56,57].
2. Feature Similarity Index Measure
The Feature Similarity Index Measure (FSIM) evaluates perceptual similarity using two human-sensitive features: phase congruency (PC) and gradient magnitude (GM). FSIM assesses how well important visual structures like edges, strokes, and lines are preserved in the binarized image. It is defined in (18):
$FSIM = \dfrac{\sum_{x \in \Omega} PC_m(x) \cdot S_L(x)}{\sum_{x \in \Omega} PC_m(x)}$
In this formula, $PC_m(x)$ represents the maximum phase congruency between the reference and test images at pixel $x$, and $S_L(x)$ is the combined similarity of phase congruency and gradient magnitude at that pixel. FSIM values also range from 0 to 1, with scores above 0.70 considered excellent for structural preservation, particularly for handwritten characters with fine strokes and curves [58].
3. Multiscale Structural Similarity Index Measure
The Multiscale Structural Similarity Index Measure (MS-SSIM) extends the original SSIM by evaluating image similarity at multiple scales, which better reflects the way humans perceive structural information across different resolutions. At each scale, contrast and structure similarities are measured, while luminance is only assessed at the coarsest scale [59]. The MS-SSIM is described in (19).
$MS\text{-}SSIM(x, y) = [l_M(x, y)]^{\alpha_M} \prod_{j=1}^{M} [c_j(x, y)]^{\beta_j} [s_j(x, y)]^{\gamma_j}$
In the MS-SSIM formula, the variables play distinct roles in assessing the structural similarity between two images at multiple scales. The terms $x$ and $y$ represent the two image signals being compared, typically the reference (original) image and the test (distorted or processed) image. The total number of scales used in the analysis is denoted by $M$, with the image progressively downsampled at each scale. At the coarsest scale, $M$, luminance similarity is evaluated using the function $l_M(x, y)$, which measures how closely the average brightness levels of the two images match. For each scale $j$ (where $j = 1, 2, \ldots, M$), the contrast similarity is represented by $c_j(x, y)$, which compares the standard deviations (or local contrast) of the images, while $s_j(x, y)$ denotes the structural similarity that captures the correlation between local patterns or textures.
Each component, including luminance, contrast, and structure, is raised to an exponent $\alpha_M$, $\beta_j$, or $\gamma_j$, which acts as a weight controlling its contribution to the final MS-SSIM score. These weights are empirically chosen to reflect the perceptual importance of each component at different scales. The contrast and structure terms are aggregated multiplicatively across all scales, providing a comprehensive and hierarchical measure of visual similarity that aligns with human perception.
4. Gradient Magnitude Similarity Deviation
The gradient magnitude similarity deviation (GMSD) is a dissimilarity-based metric that captures structural distortion by comparing the gradient magnitudes of two images. It is particularly sensitive to edge and contour inconsistencies, which are key elements in character binarization. GMSD is computed as the standard deviation of the gradient magnitude similarity map, defined in (20):
$GMSD = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (GMS_i - \mu_{GMS})^2}$
where $GMS_i$ is the gradient magnitude similarity at pixel $i$, and $\mu_{GMS}$ is the mean gradient magnitude similarity. The value of GMSD increases as structural distortion grows, meaning lower values are better. Typically, values below 0.35 are interpreted as structurally accurate, while values above 0.45 indicate significant deviations in edge alignment or stroke consistency [60].
Together, SSIM, FSIM, MS-SSIM, and GMSD offer a comprehensive evaluation framework that captures both human perceptual quality and structural correctness. Their combined use ensures a well-rounded assessment of the binarization performance, especially important in the forensic processing of ancient scripts, where maintaining visual authenticity is critical.
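Of these metrics, SSIM is available off the shelf in scikit-image, while GMSD typically has to be implemented from its definition in (20). The sketch below does so using Prewitt gradients and a small stability constant c; both choices are illustrative assumptions rather than the authors’ exact settings.

```python
import numpy as np
from scipy.ndimage import prewitt
from skimage.metrics import structural_similarity

def gmsd(pred: np.ndarray, gt: np.ndarray, c: float = 0.0026) -> float:
    """GMSD per Equation (20): the standard deviation of the per-pixel
    gradient magnitude similarity map. Inputs are assumed in [0, 1];
    the constant c stabilizes the ratio and is an assumed value."""
    def grad_mag(img):
        img = img.astype(float)
        return np.hypot(prewitt(img, axis=0), prewitt(img, axis=1))
    g1, g2 = grad_mag(pred), grad_mag(gt)
    gms = (2 * g1 * g2 + c) / (g1 ** 2 + g2 ** 2 + c)
    return float(gms.std())

# SSIM over local windows, as in Equation (17), for images in [0, 1]:
# ssim_value = structural_similarity(pred, gt, data_range=1.0)
```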

2.2.3. Pixel-Wise Error-Based Metrics

Pixel-wise error metrics evaluate the quantitative difference between the binarized image and the ground truth on a per-pixel basis. Unlike classification-based or structure-based metrics, these measurements do not rely on semantic structure or perceptual modeling. Instead, they directly assess the numerical fidelity of each pixel, making them suitable for identifying fine-grained discrepancies, especially in low-level vision tasks such as binarization, denoising, and image restoration. In this study, two commonly used metrics, the Root Mean Square Error and Peak Signal-to-Noise Ratio, were utilized to measure the pixel-wise quality of binarization results.
1. Root Mean Square Error
The Root Mean Square Error (RMSE) quantifies the average magnitude of pixel intensity differences between the predicted (binarized) image and the ground truth image. It is sensitive to large errors due to the squaring operation and is widely used for assessing reconstruction accuracy in grayscale image processing tasks. The RMSE is defined in (21).
$RMSE = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (P_i - G_i)^2}$
In this formula, $N$ denotes the total number of pixels in the image. The value $P_i$ refers to the pixel intensity at position $i$ in the predicted (binarized) image, while $G_i$ represents the corresponding pixel value at the same position in the ground truth image. The difference between $P_i$ and $G_i$ is squared to compute the error for each pixel. These squared errors are then averaged, and the square root of this mean yields the RMSE value.
2. Peak Signal-to-Noise Ratio
The Peak Signal-to-Noise Ratio (PSNR) evaluates the quality of a reconstructed or processed image by comparing the maximum possible pixel intensity value with the error introduced by the prediction. It is expressed in decibels (dB), and higher values indicate better image fidelity. The PSNR is defined in (22).
$PSNR = 20 \cdot \log_{10}(MAX_I) - 10 \cdot \log_{10}(MSE)$
where $MAX_I$ is the maximum possible pixel intensity value in the image; for 8-bit grayscale images, this value is typically 255. The Mean Square Error ($MSE$) is defined in (23).
$MSE = \dfrac{1}{N} \sum_{i=1}^{N} (P_i - G_i)^2$
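A minimal sketch of the RMSE and PSNR computations in (21)–(23) is given below, assuming NumPy and pixel values normalized to [0, 1], which would be consistent with the RMSE magnitudes reported in Section 3.

```python
import numpy as np

def rmse_psnr(pred: np.ndarray, gt: np.ndarray, max_i: float = 1.0):
    """RMSE (21) and PSNR (22) for images normalized to [0, max_i]."""
    mse = np.mean((pred.astype(float) - gt.astype(float)) ** 2)  # Equation (23)
    rmse = np.sqrt(mse)                                          # Equation (21)
    psnr = 20 * np.log10(max_i) - 10 * np.log10(mse)             # Equation (22)
    return rmse, psnr
```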
To summarize the evaluation metrics, Table 4 presents the value ranges and interpretations used for assessing binarization performance. Metrics such as IoU, Dice, Recall, and accuracy measure the overall binarization accuracy, while RMSE and PSNR evaluate pixel-wise reconstruction precision. Moreover, structure-aware metrics, including SSIM, FSIM, MS-SSIM, and GMSD, assess the preservation of textual details and gradient consistency. By outlining the expected value ranges and their corresponding performance implications, this table provides a comprehensive reference for evaluating both classical and deep learning binarization models, forming the basis for the comparative analysis conducted in this study.

3. Results

This section presents the results of the binarization process applied to Balinese palm leaf manuscript images. The results are divided into three main parts. The first part (Experiment 1) presents a visual analysis of ten classical thresholding methods. The second part (Experiment 2) shows the binarization results obtained using a deep learning-based method, specifically a U-Net model with a ResNet34 backbone. Finally, the third part provides a comparative evaluation of all methods using three groups of performance metrics: confusion matrix-based metrics, structure-based similarity metrics, and pixel-wise error metrics.

3.1. Experiment 1: Classical Binarization

In Experiment 1, ten different classical binarization methods were applied to Balinese palm leaf manuscripts using two datasets: Lontar Terumbalan and AMADI Lontarset. Figure 5 illustrates the visual results of each method for both datasets. These visualizations allow for a qualitative comparison of how each method processes the unique characteristics of the palm leaf manuscripts, such as degradation due to aging, damaged or eroded surfaces, and the presence of faint or broken character strokes.
Among the results, Global Otsu, ISODATA, Otsu, and K-means methods tend to deliver visually balanced outputs. These techniques preserve the shape of the characters with relatively uniform binarization, making the main strokes appear clear and well defined across various manuscript conditions. These methods generally handle background variations quite well and maintain the structure of both text and illustrations, although some noise or slight stroke thinning is still occasionally noticeable.

3.2. Experiment 2: Deep Learning-Based Binarization

Experiment 2 focuses on binarization results using a deep learning approach. Specifically, a U-Net architecture with a ResNet34 encoder was trained on a dataset of annotated palm leaf manuscript images. The model was designed to learn complex spatial features that distinguish text from background, including variations in stroke thickness, faded ink, and uneven lighting. The visual output of the U-Net model is presented in Figure 6. Compared to the classical methods, the deep learning-based binarization shows a significant improvement in preserving fine character details while minimizing background noise. The model effectively restores discontinuous strokes and maintains the overall consistency of the characters’ shapes, even in challenging regions where traditional thresholding methods fail. In areas with damaged surfaces or non-uniform illumination, the U-Net model demonstrates a higher degree of adaptability, making it a promising tool for automated character extraction in historical manuscripts.

3.3. Evaluation of Binarization Result

A quantitative evaluation of binarization performance was carried out using three kinds of evaluation metrics. The first group comprises confusion matrix-based metrics, which include Intersection over Union (IoU), Dice Coefficient, Recall, and accuracy. The evaluation was conducted by comparing the binarized outputs of each method against their corresponding ground truth masks. The assessment values for Lontar Terumbalan in Table 5 and AMADI Lontarset in Table 6 reveal critical insights into the methods’ effectiveness across multiple evaluation metrics. Comparing the classical and deep learning methods, the U-Net model with a ResNet34 encoder demonstrates exceptional performance, achieving a near-perfect IoU of 0.972 on Lontar Terumbalan and 0.971 on AMADI Lontarset, both indicating a very high degree of overlap with the annotated ground truth. These results are further reinforced by Dice Coefficients of 0.986 and 0.985, suggesting excellent spatial agreement and robustness of the deep learning approach, even across datasets with different characteristics.
In contrast, the classical methods exhibit greater performance variability, both between datasets and within the same dataset across metrics. For instance, on Lontar Terumbalan, Global Otsu, ISODATA, and K-means perform relatively well, each surpassing an IoU of 0.900. However, methods like Adaptive Gaussian and Niblack perform poorly, achieving IoU scores of only 0.575 and 0.516, respectively, suggesting that these methods are more sensitive to noise and local contrast variations commonly found in palm leaf manuscripts. Interestingly, the results on the AMADI Lontarset dataset in Table 6 highlight how dataset-specific features can influence binarization efficacy. For example, ISODATA continues to perform strongly (IoU: 0.925), and Sauvola also improves (IoU: 0.879), outperforming Global Otsu (IoU: 0.749), despite its good performance on the previous dataset. These shifts suggest that some classical methods may generalize poorly when faced with manuscripts of different scanning qualities, degradation levels, or texture complexity.
Beyond IoU, Recall and accuracy metrics provide additional insights into each method’s ability to correctly identify foreground pixels. The U-Net model excels with Recall values of 0.983 and 0.986, demonstrating minimal false negatives. Among classic methods, Sauvola achieves a high Recall of 0.978 on AMADI Lontarset, suggesting strong foreground detection. Conversely, Niblack’s low Recall (0.730 and 0.678) highlights its tendency to miss true foreground pixels, particularly in noisy regions. Accuracy further reinforces these trends, with the U-Net model outperforming all others (0.983 and 0.975), while Adaptive Gaussian and Niblack struggle (accuracy: 0.604 and 0.648), underscoring their limitations in complex document binarization.
The training performance of the U-Net model with the ResNet34 encoder is clearly demonstrated through the IoU and loss curves in Figure 7 for the Lontar Terumbalan and AMADI Lontarset datasets. Both IoU curves in Figure 7a,b show a rapid increase in binarization accuracy, each reaching near-perfect values above 0.97, demonstrating the model’s excellent ability to capture manuscript details. Meanwhile, the loss curves in Figure 7c,d complement this finding, showing a sharp initial decline and resulting in a minimum value below 0.1, reflecting effective optimization. Although both datasets exhibit similar trends, Lontar Terumbalan achieves slightly more stable and faster convergence without fluctuations, likely due to differences in image quality or complexity between the datasets. Meanwhile, the training process for the AMADI Lontarset experiences significant fluctuations between epochs 5 and 10. These results collectively validate the robustness of the U-Net model for document binarization tasks.
In addition to pixel-level accuracy, this study also employed structural similarity metrics to assess the perceptual and visual fidelity of the binarized outputs. Table 7 summarizes the results of SSIM, FSIM, MS-SSIM, and GMSD in the Lontar Terumbalan dataset, while Table 8 represents AMADI Lontarset results. These metrics assess not only pixel accuracy but also the preservation of visual structure, texture, and edge consistency, which are critical for script legibility. As shown in Table 7 and Table 8, the U-Net model achieves outstanding scores in all metrics, such as an SSIM with values of 0.938 and 0.883, indicating near-perfect structural retention. Classic methods like Otsu and ISODATA perform moderately, while Adaptive Gaussian and Niblack yield poor structural preservation. Notably, GMSD (gradient magnitude similarity deviation), which measures distortion severity, shows the U-Net model’s lowest values of 0.195 and 0.206, confirming minimal structural degradation. In contrast, Niblack exhibits the highest GMSD in both datasets, suggesting noticeable artifacts in the binarized output.
Complementing the previous evaluation metrics, pixel-wise error metrics (RMSE, PSNR) are widely used in document image analysis (DIA) studies to quantitatively measure binarization accuracy at the pixel level. In this evaluation (Table 9), the U-Net model with the ResNet34 encoder recorded the best performance, with the lowest RMSE (0.143 and 0.161) and the highest PSNR (17.059 and 16.400), outperforming all ten classical methods. Methods such as K-means, ISODATA, and Global Otsu showed competitive RMSE values of 0.257 to 0.258 on Lontar Terumbalan, but their consistency decreased on the AMADI Lontarset, with RMSE values of 0.244 to 0.396. Meanwhile, Niblack produced the highest RMSE on both datasets and the lowest PSNR, with a value of 3.613, indicating significant pixel classification errors, especially in areas with uneven illumination or noise. These findings reinforce the superiority of deep learning approaches in handling complex variations in ancient manuscripts, while also revealing the limitations of classical methods in maintaining pixel fidelity under challenging document conditions.
To facilitate a clearer understanding of how each method performs across multiple evaluation metrics, a comparative visualization is provided to support the quantitative findings. As illustrated in Figure 8, a line chart summarizes the binarization performance of all evaluated methods using various metrics. In Figure 8a, which represents results on the Lontar Terumbalan dataset, the U-Net-based ResNet34 model consistently outperforms all classical methods, with the highest scores in IoU, Dice, Recall, accuracy, and structural similarity metrics such as SSIM and FSIM. This performance is further supported by the lowest RMSE and high PSNR, indicating both structural and pixel-level accuracy. Among the traditional techniques, Global Otsu, ISODATA, and K-means demonstrate competitive results in core metrics (e.g., IoU, Dice), though they fall short in SSIM and GMSD. On the other hand, local thresholding methods such as Adaptive Gaussian, Adaptive Mean, and Niblack show noticeable underperformance, especially in SSIM and RMSE, suggesting a lack of robustness in handling the specific degradation characteristics found in Lontar Terumbalan.
To further validate these findings across a different dataset, Figure 8b presents the same set of evaluation metrics applied to the AMADI Lontarset. This comparison not only reinforces the superiority of the U-Net model but also highlights how dataset-specific characteristics may affect the relative performance of classical methods. Once again, U-Net dominates in nearly all metrics, reinforcing its superiority in both spatial alignment and perceptual similarity. Interestingly, performance rankings of the classical methods show some variation compared to the previous dataset. For instance, ISODATA and Sauvola outperform Global Otsu in IoU and Dice scores, indicating that the underlying image characteristics of AMADI Lontarset may be more suited to these methods. The consistency of U-Net’s high performance across both datasets highlights the model’s adaptability and reliability, while the shifting effectiveness of classical methods emphasizes the influence of dataset-specific noise, contrast, and texture properties on binarization outcomes.

4. Discussion

The experimental results demonstrate that our U-Net with ResNet34 encoder establishes new state-of-the-art performance for palm leaf manuscript binarization, achieving superior metrics across all evaluation criteria. While some classical binarization techniques, such as ISODATA, Otsu, K-means, and Sauvola, have demonstrated effectiveness under certain conditions, their performance remains inconsistent and less robust when compared to the consistently high-performing U-Net architecture. The classical methods in this study showed their lack of robustness in handling complex degradation, such as mold stains, ink bleeds, and textured backgrounds of palm manuscripts. In contrast, the combination of U-Net’s strengths in spatial localization and ResNet34’s strengths in deep feature extraction makes this architecture well suited for separating text from inhomogeneous backgrounds. This finding is consistent with a study by Bipin N.B.J. et al., which showed that U-Net outperformed Sauvola Net and a Semantic Deep Binarization Network in preserving fine text details in ancient documents [29]. Furthermore, a study by Wang et al. (2023), which introduced a U-Net-based PLM-SegNet for segmenting palm leaf manuscripts, achieved pixel-wise accuracy of up to 99.73% and demonstrated a 19.33% improvement in the IoU metric for damage detection after segmentation, demonstrating the effectiveness of deep learning approaches in this context [53].
The reliability of ResNet as an encoder in document binarization models was also confirmed in research that modified the ResNet architecture by adding batch normalization. This study demonstrated that this modification not only increased training speed but also improved accuracy in removing complex noise such as ink bleed-through and uneven lighting, achieving binarization accuracy of up to 95.38% on palm leaf manuscripts [61]. Furthermore, a comparison between ResNet and U-Net on palm manuscript documents also demonstrated that both were effective in handling severe degradation, with high scores on F-Measure, PSNR, and accuracy metrics [62]. These studies reinforce the belief that the ResNet architecture is capable of modeling both the local and global structure of text in historical documents, making it an ideal choice as part of a deep learning-based binarization pipeline. In this research, ResNet34 provided a robust foundation for U-Net to extract meaningful features from complex and textured backgrounds.
Moreover, the combination of U-Net and ResNet was further validated by Rani et al., who proposed a lightweight model called PLM-Res-U-Net [63]. This model, specifically designed for palm leaf manuscripts, consists of encoder–decoder blocks with skip connections and demonstrates excellent performance in preserving text edge strokes and addressing severe degradation, such as background discoloration, brittleness, and aging artifacts. PLM-Res-U-Net achieved a Dice score of 0.986 on two benchmark datasets and outperformed several other state-of-the-art models, such as U-Net++, PSPNet, and SegNet. Notably, this Dice score is identical to the one achieved in our Lontar Terumbalan test. However, our model reached these high scores with only 25 epochs, making it more practical and efficient for real-world applications. This is particularly beneficial for cultural heritage institutions such as museums and preservation communities in Bali, where computational resources are often limited. These findings highlight that combining U-Net’s spatial capabilities with ResNet’s residual learning can yield binarization models that are both accurate and computationally efficient.
Despite these advantages, the ResNet34 U-Net architecture certainly has some shortcomings in its implementation. One critical challenge is the potential loss of fine spatial details, especially in degraded or textured regions, which can result in blurry or inaccurate segmentation of thin or faded characters. This limitation is particularly evident in patch-wise binarization, where there is a tradeoff between effective noise reduction and preservation of edge details, as also noted by Rani et al., who observed that despite their model’s strong denoising capabilities, it could not eliminate all types of noise equally across manuscripts due to the variation in texture and quality of palm leaf images [63]. Furthermore, the model is also computationally intensive, requiring substantial memory and processing power during training and inference, which can be a barrier to implementation in resource-constrained environments, such as small cultural heritage institutions or on-site digitization efforts.
These limitations underscore the continued relevance of classical binarization methods, which cannot be entirely disregarded. Techniques like Niblack and Sauvola remain valuable due to their simplicity, ease of application, and independence from training data. In some cases, these methods can even highlight fine, important textual details that may be overlooked by deep models. This is because classical approaches operate locally and adjust thresholds based on surrounding pixel intensity. Therefore, in the context of digital preservation or early-stage document restoration, classical methods are still relevant, both as an initial step and as a complement to deep learning models. Combining classical and deep learning methods can also be an effective strategy, for example, in preprocessing or result validation. By considering the strengths and limitations of each, a hybrid approach may produce more optimal results under varying document degradation conditions. Thus, although the U-Net ResNet34 model shows superior overall performance, classical methods still play a valuable role in historical document processing.

5. Conclusions

This study has explored and benchmarked classical binarization methods for Balinese palm leaf manuscripts alongside a deep learning approach, particularly U-Net with a ResNet34 encoder. The results demonstrate that the deep learning approach offers significant advantages for preserving historical manuscripts compared to traditional binarization techniques. The model effectively addresses key challenges in document preservation, including faded ink, stains, and uneven lighting, while maintaining the integrity of delicate character strokes. Unlike traditional binarization methods, which often struggle with inconsistent degradation patterns, the U-Net with ResNet34 approach offers superior performance across various manuscript conditions and script types. Moreover, our model achieved the highest performance scores with fewer training epochs, demonstrating both accuracy and computational efficiency, which is an important consideration for practical use in heritage preservation contexts with limited resources. These advancements open new possibilities for analyzing and interpreting ancient texts that were previously difficult to process.
Moving forward, the binarization results from this study could serve as a foundation for several important research directions. The cleaned manuscript images could be used in further research, such as in automated text recognition systems to transcribe historical documents, especially in Balinese character recognition. The cleaned manuscript could also support advanced document analysis tasks such as damaged text reconstruction, script style classification, or even chronological estimation based on degradation patterns, which are related to digital image forensic studies. Such developments would transform binarization from merely a preprocessing step into an integral component of comprehensive digital preservation systems, ultimately enhancing our ability to study and protect cultural heritage.

Author Contributions

Conceptualization, I.Y., K.N. and A.T.A.; methodology, K.N.; validation, K.N., N.U.N. and Y.A.H.; resources, I.Y.; data curation, K.N.; writing—original draft preparation, K.N.; writing—review and editing, I.Y., N.U.N., A.T.A. and C.-C.H.; visualization, K.N.; supervision, I.Y., A.T.A., C.-C.H. and Y.A.H.; project administration, I.Y.; funding acquisition, I.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Airlangga University through the Airlangga Research Fund 2024, under the Scheme of International Research Collaboration Top #100, according to The Rector of UNAIR’s Decree No. 672/UN3/2024, for the research project titled “Digital Forensics for Identification and Recognition of Ancient Collection”.

Data Availability Statement

The Lontar Terumbalan dataset used in this study is available and can be freely accessed in the digital collection of the National Library of Indonesia via https://khastara.perpusnas.go.id/koleksi-digital/detail/?catId=1290335 (accessed on 17 May 2025). The AMADI LontarSet dataset is from Challenge 1 of the International Conference on Frontiers in Handwriting Recognition (ICFHR) competition and can be accessed at http://amadi.univ-lr.fr/ICFHR2016_Contest/ (accessed on 21 July 2025).

Acknowledgments

The authors acknowledge the use of ChatGPT (version GPT-4o) as a tool to assist in dataset handling and result interpretation throughout the development of this study. All outputs generated through the platform were carefully reviewed, refined, and integrated by the authors, who assume full responsibility for the accuracy and integrity of the final content presented in this publication.

Conflicts of Interest

The authors declare that there are no financial, professional, or other conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IoU: Intersection over Union
SSIM: Structural Similarity Index Measure
MS-SSIM: Multiscale Structural Similarity Index Measure
FSIM: Feature Similarity Index Measure
ISODATA: Iterative Self-Organizing Data Analysis Technique
GMSD: Gradient Magnitude Similarity Deviation
PSNR: Peak Signal-to-Noise Ratio
RMSE: Root Mean Squared Error
MSE: Mean Squared Error
PLM: Palm Leaf Manuscript
ICFHR: International Conference on Frontiers in Handwriting Recognition
DIA: Document Image Analysis

References

  1. Wilson, E.B.; Rice, J.M. Palm Leaf Manuscripts in South Asia [Post-doc and Student Scholarship]. Syracuse University. 2019. Available online: https://surface.syr.edu/cgi/viewcontent.cgi?article=1013&context=ischoolstudents (accessed on 15 May 2025).
  2. Khadijah, U.L.S.; Winoto, Y.; Shuhidan, S.M.; Anwar, R.K.; Lusiana, E. Community Participation in Preserving the History of Heritage Tourism Sites. J. Law Sustain. Dev. 2024, 12, e2504. [Google Scholar] [CrossRef]
  3. Sudarsan, D.; Sankar, D. An Overview of Character Recognition from Palm Leaf Manuscripts. In Proceedings of the 2023 3rd International Conference on Smart Data Intelligence (ICSMDI), Trichy, India, 30–31 March 2023; pp. 265–272. [Google Scholar]
  4. Krithiga, R.; Varsini, S.R.; Joshua, R.G.; Om Kumar, C.U. Ancient Character Recognition: A Comprehensive Review. IEEE Access 2025, 13, 88847–88857. [Google Scholar] [CrossRef]
  5. Dewi, D.A.S.; Arsa, D.M.S.; Putri, G.A.A.; Setiawati, N.L.P.L.S. Ensembling Deep Convolutional Neural Networks for Balinese Handwritten Character Recognition. ASEAN Eng. J. 2023, 13, 133–139. [Google Scholar] [CrossRef]
  6. Khafidlin, K. Ancient Manuscript Preservation of Museum Ranggawarsita Library Collection Semarang Central Java. Daluang J. Libr. Inf. Sci. 2021, 1, 52. [Google Scholar] [CrossRef]
  7. Bannigidad, P.; Sajjan, S.P. Restoration of Ancient Kannada Handwritten Palm Leaf Manuscripts Using Image Enhancement Techniques. In Proceedings of the International Conference on Big Data Innovation for Sustainable Cognitive Computing, Coimbatore, India, 16–17 December 2022; Springer: Cham, Switzerland, 2023; pp. 101–109. [Google Scholar]
  8. Maheswari, S.U.; Maheswari, P.U.; Aakaash, G.R.S. An intelligent character segmentation system coupled with deep learning based recognition for the digitization of ancient Tamil palm leaf manuscripts. Herit. Sci. 2024, 12, 342. [Google Scholar] [CrossRef]
  9. Lian, X.; Yu, C.; Han, W.; Li, B.; Zhang, M.; Wang, Y.; Li, L. Revealing the Mechanism of Ink Flaking from Surfaces of Palm Leaves (Corypha umbraculifera). Langmuir 2024, 40, 6375–6383. [Google Scholar] [CrossRef]
  10. Yuadi, I.; Halim, Y.A.; Asyhari, A.T.; Nisa’, K.; Nazikhah, N.U.; Nihaya, U. Image Enhancement and Thresholding for Ancient Inscriptions in Trowulan Museum’s Collection Mojokerto, Indonesia. In Proceedings of the 2024 7th International Conference of Computer and Informatics Engineering (IC2IE), Mojokerto, Indonesia, 12–13 September 2024; pp. 1–6. [Google Scholar]
  11. Sudarsan, D.; Sankar, D. Enhancing Malayalam Palm Leaf Character Segmentation: An Improved Simplified Approach. SN Comput. Sci. 2024, 5, 577. [Google Scholar] [CrossRef]
  12. Sudarsan, D.; Sankar, D. A Novel Complete Denoising Solution for Old Malayalam Palm Leaf Manuscripts. Pattern Recognit. Image Anal. 2022, 32, 187–204. [Google Scholar] [CrossRef]
  13. Chamchong, R.; Jareanpon, C.; Fung, C.C. Generation of optimal binarisation output from ancient Thai manuscripts on palm leaves. In Proceedings of the 2013 International Conference on Machine Learning and Cybernetics, Tianjin, China, 14–17 July 2013; pp. 1643–1648. [Google Scholar]
  14. Trier, O.D.; Jain, A.K. Goal-directed evaluation of binarization methods. IEEE Trans. Pattern Anal. Mach. Intell. 1995, 17, 1191–1201. [Google Scholar] [CrossRef]
  15. Jyothi, R.L.; Rahiman, M.A. Document image binarization using difference of concatenated convolutions. J. Intell. Fuzzy Syst. 2021, 41, 2939–2952. [Google Scholar] [CrossRef]
  16. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man. Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
  17. Sauvola, J.; Pietikäinen, M. Adaptive document image binarization. Pattern Recognit. 2000, 33, 225–236. [Google Scholar] [CrossRef]
  18. Li, C.H.; Lee, C.K. Minimum cross entropy thresholding. Pattern Recognit. 1993, 26, 617–625. [Google Scholar] [CrossRef]
  19. Antara Kesiman, M.W.; Burie, J.C.; Ogier, J.M.; Grangé, P. Knowledge Representation and Phonological Rules for the Automatic Transliteration of Balinese Script on Palm Leaf Manuscript. Comput. Sist. 2018, 21, 739–747. [Google Scholar] [CrossRef]
  20. Niblack, W. An Introduction to Digital Image Processing; Birkeroed Strandberg Publishing Company: Copenhagen, Denmark, 1985. [Google Scholar]
  21. Torres-Monsalve, A.F.; Velasco-Medina, J. Hardware implementation of ISODATA and Otsu thresholding algorithms. In Proceedings of the 2016 XXI Symposium on Signal Processing, Images and Artificial Vision (STSIVA), Bucaramanga, Colombia, 31 August–2 September 2016; pp. 1–5. [Google Scholar]
  22. Jailingeswari, I.; Gopinathan, S. Tamil handwritten palm leaf manuscript dataset (THPLMD). Data Brief 2024, 53, 110100. [Google Scholar] [CrossRef]
  23. Sudarsan, D.; Sankar, D. A Novel approach for Denoising palm leaf manuscripts using Image Gradient approximations. In Proceedings of the 2019 3rd International conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 12–14 June 2019; pp. 506–511. [Google Scholar]
  24. Fred, A.L.; Kumar, S.N.; Kumar, H.A.; Daniel, A.V.; Abisha, W. Evaluation of local thresholding techniques in Palm-leaf Manuscript images. Int. J. Comput. Sci. Eng. 2018, 6, 124–131. [Google Scholar] [CrossRef]
  25. Jayanthi, N.; Indu, S. Application of Gaussian as Edge Detector for Image Enhancement of Ancient Manuscripts. IOP Conf. Ser. Mater. Sci. Eng. 2017, 225, 012149. [Google Scholar] [CrossRef]
  26. Jayanthi, J.; Maheswari, P.U. Comparative Study: Enhancing Legibility of Ancient Indian Script Images from Diverse Stone Background Structures Using 34 Different Pre-Processing Methods. Herit. Sci. 2024, 12, 63. [Google Scholar] [CrossRef]
  27. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  28. Boyina, K.; Swetchana, D.; Reddy, G.M.; Sushrutha, K.; Pati, P.B.; Sivan, R. Layout Analysis of Malayalam Palm Leaf Images with Fully Convolutional Networks. In Proceedings of the 2023 IEEE Fifth International Conference on Advances in Electronics, Computers and Communications (ICAECC), Bengaluru, India, 7–8 September 2023; pp. 1–7. [Google Scholar]
  29. Bipin Nair, B.J.; Rani, N.S. A modified deep semantic binarization network for degradation removal in palm leaf manuscripts. Multimed. Tools Appl. 2024, 83, 62937–62969. [Google Scholar]
  30. He, S.; Schomaker, L. DeepOtsu: Document enhancement and binarization using iterative deep learning. Pattern Recognit. 2019, 91, 379–390. [Google Scholar] [CrossRef]
  31. ICFHR2016 Competition on the Analysis of Handwritten Text in Images of Balinese Palm Leaf Manuscripts. 2016. Available online: http://amadi.univ-lr.fr/ICFHR2016_Contest/index.php (accessed on 21 July 2025).
  32. Pratikakis, I.; Gatos, B.; Ntirogiannis, K. ICDAR 2013 Document Image Binarization Contest (DIBCO 2013). In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013; pp. 1471–1476. [Google Scholar]
  33. Kesiman, M.W.A.; Burie, J.C.; Wibawantara, G.N.M.A.; Sunarya, I.M.G.; Ogier, J.M. AMADI_LontarSet: The First Handwritten Balinese Palm Leaf Manuscripts Dataset. In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 23–26 October 2016; pp. 168–173. [Google Scholar]
  34. Kesiman, M.W.A.; Maysanjaya, I.M.D. A Model for Posttransliteration Suggestion for Balinese Palm Leaf Manuscript with Text Generation and LSTM Model. J. Phys. Conf. Ser. 2021, 1810, 012011. [Google Scholar] [CrossRef]
  35. Burie, J.C.; Coustaty, M.; Hadi, S.; Kesiman, M.W.A.; Ogier, J.M.; Paulus, E.; Sok, K.; Sunarya, I.M.G.; Valy, D. ICFHR2016 Competition on the Analysis of Handwritten Text in Images of Balinese Palm Leaf Manuscripts. In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 23–26 October 2016; pp. 596–601. [Google Scholar]
  36. Siountri, K.; Anagnostopoulos, C.N. The Classification of Cultural Heritage Buildings in Athens Using Deep Learning Techniques. Heritage 2023, 6, 3673–3705. [Google Scholar] [CrossRef]
  37. Xu, H.; Huang, Q.; Liao, H.; Nong, G.; Wei, W. MFFP-Net: Building Segmentation in Remote Sensing Images via Multi-Scale Feature Fusion and Foreground Perception Enhancement. Remote Sens. 2025, 17, 1875. [Google Scholar] [CrossRef]
  38. Kaneko, H.; Ishibashi, R.; Meng, L. Deteriorated Characters Restoration for Early Japanese Books Using Enhanced CycleGAN. Heritage 2023, 6, 4345–4361. [Google Scholar] [CrossRef]
  39. Lontar Terumbalan. National Library of Indonesia. Available online: https://khastara.perpusnas.go.id/koleksi-digital/detail/?catId=1290335 (accessed on 21 July 2025).
  40. Davies, E.R. The Role of Thresholding. In Computer Vision; Academic Press: Cambridge, MA, USA, 2018; pp. 93–118. [Google Scholar]
  41. Kesiman, M.W.A.; Valy, D.; Burie, J.C.; Paulus, E.; Suryani, M.; Hadi, S.; Verleysen, M.; Chhun, S.; Ogier, J.M. Benchmarking of Document Image Analysis Tasks for Palm Leaf Manuscripts from Southeast Asia. J. Imaging 2018, 4, 43. [Google Scholar] [CrossRef]
  42. Khurshid, K.; Siddiqi, I.; Faure, C.; Vincent, N. Comparison of Niblack Inspired Binarization Methods for Ancient Documents. In Document Recognition and Retrieval XVI; Berkner, K., Likforman-Sulem, L., Eds.; SPIE: Bellingham, WA, USA, 2009; p. 72470U. [Google Scholar]
  43. Farid, S.; Ahmed, F. Application of Niblack’s method on images. In Proceedings of the 2009 International Conference on Emerging Technologies, Islamabad, Pakistan, 19–20 October 2009; pp. 280–286. [Google Scholar]
  44. Li, C.H.; Tam, P.K.S. An iterative algorithm for minimum cross entropy thresholding. Pattern Recognit. Lett. 1998, 19, 771–776. [Google Scholar] [CrossRef]
  45. Abbas, A.E.; Cadenbach, A.H.; Salimi, E. A Kullback–Leibler View of Maximum Entropy and Maximum Log-Probability Methods. Entropy 2017, 19, 232. [Google Scholar] [CrossRef]
  46. Feng, X.; Lv, J. Minimum Cross-Entropy Transform of Risk Analysis. In Proceedings of the 2011 International Conference on Management and Service Science, Wuhan, China, 12–14 August 2011; pp. 1–4. [Google Scholar]
  47. Magid, A.; Rotman, S.R.; Weiss, A.M. Comments on “Picture thresholding using an iterative selection method”. IEEE Trans. Syst. Man. Cybern. 1990, 20, 1238–1239. [Google Scholar] [CrossRef]
  48. Song, J.; Li, F.; Li, R. Improved K-means Algorithm Based on Threshold Value Radius. IOP Conf. Ser. Earth Environ. Sci. 2020, 428, 012001. [Google Scholar] [CrossRef]
  49. Roy, P.; Dutta, S.; Dey, N.; Dey, G.; Chakraborty, S.; Ray, R. Adaptive thresholding: A comparative study. In Proceedings of the 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), Kanyakumari, India, 10–11 July 2014; pp. 1182–1186. [Google Scholar]
  50. Yazid, H.; Arof, H. Gradient based adaptive thresholding. J. Vis. Commun. Image Represent. 2013, 24, 926–936. [Google Scholar] [CrossRef]
  51. Rehman, N.A.; Haroon, F. Adaptive Gaussian and Double Thresholding for Contour Detection and Character Recognition of Two-Dimensional Area Using Computer Vision. Eng. Proc. 2023, 32, 23. [Google Scholar]
  52. Noorfizir, A.; Rachmatullah, M.N.; Sulong, G. Hybrid Multilevel Thresholding-Otsu and Morphology Operation for Retinal Blood Vessel Segmentation. Eng. Lett. 2020, 28, 180–191. [Google Scholar]
  53. Wang, Y.; Tian, S.; Wen, M.; Ruan, Y.; Tao, Q.; Zhou, X.; Gao, F.; Lu, H.; Zhang, Z. An end-to-end method for palm-leaf manuscript segmentation based on U-Net. J. Cult. Herit. 2023, 63, 169–178. [Google Scholar] [CrossRef]
  54. Seidenthal, K.; Panjvani, K.; Chandnani, R.; Kochian, L.; Eramian, M. Iterative image segmentation of plant roots for high-throughput phenotyping. Sci. Rep. 2022, 12, 16563. [Google Scholar] [CrossRef]
  55. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  56. Lin, L.; Chen, H.; Kuruoglu, E.E.; Zhou, W. Robust structural similarity index measure for images with non-Gaussian distortions. Pattern Recognit. Lett. 2022, 163, 10–16. [Google Scholar] [CrossRef]
  57. Guo, Y.; Wang, Y.; Meng, K.; Zhu, Z. Otsu Multi-Threshold Image Segmentation Based on Adaptive Double-Mutation Differential Evolution. Biomimetics 2023, 8, 418. [Google Scholar] [CrossRef]
  58. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef]
  59. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multi-scale structural similarity for image quality assessment. In Proceedings of the Conference Record of the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 9–12 November 2003; pp. 1398–1402. [Google Scholar]
  60. Xue, W.; Zhang, L.; Mou, X.; Bovik, A.C. Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index. IEEE Trans. Image Process. 2014, 23, 684–695. [Google Scholar] [CrossRef] [PubMed]
  61. Nair, B.J.B.; Nair, A.S. Ancient Horoscopic Palm Leaf Binarization Using A Deep Binarization Model—RESNET. In Proceedings of the 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 8–10 April 2021; pp. 1524–1529. [Google Scholar]
  62. Bipin Nair, B.J.; Govind, N. A Comparative Deep Learning Approaches to Binarize South Indian Manuscripts. In Proceedings of the 2022 IEEE North Karnataka Subsection Flagship International Conference (NKCon), Vijaypur, India, 20–21 November 2022; pp. 1–6. [Google Scholar]
  63. Rani, N.S.; Akhilesh, T.M.; Bipin Nair, B.J.; Koushik, K.S.; Smith, E.B. PLM-Res-U-Net: A light weight binarization model for enhancement of multi-textured palm leaf manuscript images. Digit. Appl. Archaeol. Cult. Herit. 2024, 34, e00360. [Google Scholar] [CrossRef]
Figure 1. Samples of the original palm leaf manuscript images, showing degraded areas and blurred text. (a) Lontar Terumbalan [39]; (b) AMADI Lontarset [31,33].
Figure 2. Proposed pipeline for Balinese palm leaf manuscript binarization, covering the classical methods and the U-Net with a ResNet34 encoder.
Figure 3. Workflow of the ground truth creation process for Lontar Terumbalan.
Figure 4. Illustration of the image preprocessing steps: (top) original lontar manuscript image, (middle) padded image with red grid lines indicating cropping boundaries, and (bottom) cropped image blocks sized 256 × 256 pixels.
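As a minimal illustration of this padding-and-cropping step, the Python sketch below pads an RGB manuscript image on the bottom and right so both sides reach the next multiple of 256 pixels, then cuts non-overlapping 256 × 256 blocks. The function name and the edge-replication padding mode are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def pad_and_crop(image: np.ndarray, block: int = 256) -> list:
    """Pad an H x W x 3 image so both dimensions become multiples of
    `block`, then split it into non-overlapping block x block tiles."""
    h, w = image.shape[:2]
    pad_h = (-h) % block  # rows needed to reach the next multiple of `block`
    pad_w = (-w) % block  # columns needed
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    return [
        padded[y:y + block, x:x + block]
        for y in range(0, padded.shape[0], block)
        for x in range(0, padded.shape[1], block)
    ]
```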
Figure 5. Results of the ten classical binarization methods on Lontar Terumbalan (left) and AMADI Lontarset (right). (a) Adaptive Gaussian; (b) Adaptive Mean; (c) Global Otsu; (d) ISODATA; (e) K-means; (f) Li’s minimum cross-entropy; (g) Niblack; (h) Otsu; (i) Otsu Morph; (j) Sauvola.
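Readers wishing to reproduce a few of these classical baselines can do so with scikit-image, as in the sketch below, which applies one global (Otsu) and two local (Niblack, Sauvola) thresholding methods. The window size and k values are illustrative assumptions, not the parameter settings used in this study.

```python
import numpy as np
from skimage.filters import threshold_otsu, threshold_niblack, threshold_sauvola

def classical_binarize(gray: np.ndarray) -> dict:
    """Apply three representative classical thresholding methods to a
    grayscale image and return ink masks (True where a pixel is text,
    assuming dark ink on a lighter leaf)."""
    masks = {}
    # Global method: one threshold computed from the intensity histogram.
    masks["otsu"] = gray < threshold_otsu(gray)
    # Local methods: a per-pixel threshold from a sliding window;
    # window_size and k here are illustrative defaults.
    masks["niblack"] = gray < threshold_niblack(gray, window_size=25, k=0.8)
    masks["sauvola"] = gray < threshold_sauvola(gray, window_size=25)
    return masks
```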
Figure 6. Results of the deep learning binarization method using the U-Net model with a ResNet34 encoder. (a) Lontar Terumbalan; (b) AMADI Lontarset.
Figure 7. IoU and loss curves of the U-Net model with a ResNet34 encoder. (a) Lontar Terumbalan IoU curve; (b) AMADI Lontarset IoU curve; (c) Lontar Terumbalan loss curve; (d) AMADI Lontarset loss curve.
Figure 8. Line chart comparison of binarization performance across all methods using multiple evaluation metrics. (a) Lontar Terumbalan; (b) AMADI Lontarset.
Table 1. Dataset details, including total samples, cropped blocks, and the distribution into training, validation, and test sets.
Dataset | Number of Samples | Number of Cropped Blocks | Number of Train Samples | Number of Validation Samples | Number of Test Samples
Lontar_Terumbalan | 19 | 608 | 486 | 61 | 61
AMADI_Lontarset | 100 | 4120 | 3296 | 412 | 412
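The splits in Table 1 correspond to an 80/10/10 partition of the cropped blocks (608 → 486/61/61; 4120 → 3296/412/412). A minimal sketch of such a partition is shown below; the random shuffle and seed are assumptions, since the splitting procedure itself is not described here.

```python
import random

def split_blocks(blocks, seed=42):
    """Shuffle and split cropped blocks 80/10/10, matching the
    proportions in Table 1 (608 -> 486/61/61, 4120 -> 3296/412/412)."""
    indices = list(range(len(blocks)))
    random.Random(seed).shuffle(indices)
    n_train = int(0.8 * len(blocks))
    n_val = (len(blocks) - n_train) // 2
    train = [blocks[i] for i in indices[:n_train]]
    val = [blocks[i] for i in indices[n_train:n_train + n_val]]
    test = [blocks[i] for i in indices[n_train + n_val:]]
    return train, val, test
```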
Table 2. Layer configuration and operations in U-Net ResNet34 architecture.
Layer | Output Size | Filters | Key Operations
Input | 256 × 256 × 3 | - | -
Conv0 | 128 × 128 × 64 | 64 | Conv2D + Batch Normalization + ReLU
MaxPool | 64 × 64 × 64 | - | -
Encoder1 | 64 × 64 × 64 | 64 | 3× Residual Blocks
Encoder2 | 32 × 32 × 128 | 128 | 4× Residual Blocks
Encoder3 | 16 × 16 × 256 | 256 | 6× Residual Blocks
Encoder4 | 8 × 8 × 512 | 512 | 3× Residual Blocks
Decoder1 | 16 × 16 × 256 | 256 | UpSample + [Encoder4, Encoder3]
Decoder2 | 32 × 32 × 128 | 128 | UpSample + [Decoder1, Encoder2]
Decoder3 | 64 × 64 × 64 | 64 | UpSample + [Decoder2, Encoder1]
Decoder4 | 128 × 128 × 32 | 32 | UpSample + [Decoder3, Conv0]
Decoder5 | 256 × 256 × 16 | 16 | UpSample
Output | 256 × 256 × 1 | 1 | Sigmoid
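One common way to instantiate a U-Net with a ResNet34 encoder whose tensor shapes match Table 2 is the segmentation_models_pytorch library; treating this as the tooling is an assumption on our part, since the implementation framework is not specified here. Notably, the library's default decoder channel widths (256, 128, 64, 32, 16) match the decoder rows above.

```python
import segmentation_models_pytorch as smp

# A U-Net with a ResNet34 encoder: 3-channel 256 x 256 input and a
# 1-channel sigmoid mask output, as in Table 2. Pretrained ImageNet
# encoder weights are an assumption, not stated in the table.
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,
    activation="sigmoid",
)
```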
Table 3. Model configuration and training parameters of U-Net with ResNet34.
Parameter | Description
Model Type | U-Net with ResNet34 Encoder
Input Shape | 256 × 256 × 3 (RGB)
Output Shape | 256 × 256 × 1 (Binary Mask)
Total Parameters | 24,456,154
Trainable Parameters | 24,438,804
Batch Size | 4
Epochs | 25
Training Device | CPU, AMD Ryzen 7 3700X 8-Core Processor
Output Activation | Sigmoid
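A minimal PyTorch training skeleton consistent with Table 3 (batch size 4, 25 epochs, sigmoid output) might look as follows. The binary cross-entropy loss, Adam optimizer, and learning rate are illustrative assumptions, as they are not listed in the table.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_dataset, epochs=25, batch_size=4, lr=1e-4):
    """Minimal training loop using the Table 3 settings (batch size 4,
    25 epochs). Loss, optimizer, and lr are illustrative assumptions."""
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    criterion = torch.nn.BCELoss()  # model output already passes a sigmoid
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for epoch in range(epochs):
        for images, masks in loader:  # (B, 3, 256, 256) and (B, 1, 256, 256)
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()
```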
Table 4. Summary of the evaluation metrics used for binarization performance assessment.
Metric | Value Range | Interpretation
IoU | 0–1 | Higher values indicate more accurate binarization
Dice | 0–1 | Higher values indicate greater overlap
Recall | 0–1 | Higher values indicate fewer false negatives
Accuracy | 0–1 | Higher values indicate better pixel classification
RMSE | 0–∞ (typically < 1) | Lower is better; indicates fewer pixel-wise errors
PSNR | 0–∞ (in dB) | Higher values indicate less noise/better quality
SSIM | −1 to 1 (typically 0–1) | Higher values indicate better structural similarity
FSIM | 0–1 | Higher values reflect better preservation of features
MS-SSIM | 0–1 | Higher values reflect better multiscale structural similarity
GMSD | 0–∞ (typically < 1) | Lower is better; indicates higher gradient similarity
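Several of the Table 4 metrics can be computed directly from a predicted mask and its ground truth. The sketch below covers the confusion-matrix and pixel-error metrics for boolean masks; the structure-based metrics (SSIM, FSIM, MS-SSIM, GMSD) require dedicated implementations. Function and variable names are ours, and the PSNR assumes a peak value of 1.0 for binary images.

```python
import numpy as np

def binarization_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Confusion-matrix and pixel-error metrics from Table 4 for
    boolean masks (True = foreground ink)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    mse = np.mean((pred.astype(float) - gt.astype(float)) ** 2)
    return {
        "IoU": inter / union,
        "Dice": 2 * inter / (pred.sum() + gt.sum()),
        "Recall": inter / gt.sum(),
        "Accuracy": np.mean(pred == gt),
        "RMSE": np.sqrt(mse),
        "PSNR": 10 * np.log10(1.0 / mse) if mse > 0 else float("inf"),
    }
```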
Table 5. Binarization results for Lontar Terumbalan based on confusion-matrix metrics.
Category | Method | IoU | Dice | Recall | Accuracy
Classic | Adaptive Gaussian | 0.575 | 0.730 | 0.839 | 0.604
Classic | Adaptive Mean | 0.581 | 0.735 | 0.835 | 0.616
Classic | Global Otsu | 0.900 | 0.947 | 0.944 | 0.933
Classic | ISODATA | 0.900 | 0.947 | 0.946 | 0.933
Classic | K-means | 0.900 | 0.947 | 0.944 | 0.933
Classic | Li’s Minimum Cross Entropy | 0.898 | 0.946 | 0.975 | 0.930
Classic | Niblack | 0.516 | 0.681 | 0.730 | 0.563
Classic | Otsu | 0.900 | 0.947 | 0.945 | 0.933
Classic | Otsu Morph | 0.897 | 0.945 | 0.949 | 0.930
Classic | Sauvola | 0.653 | 0.790 | 0.924 | 0.687
Deep Learning | U-Net-based ResNet34 | 0.972 | 0.986 | 0.983 | 0.983
Table 6. Binarization results for AMADI LontarSet based on confusion-matrix metrics.
Category | Method | IoU | Dice | Recall | Accuracy
Classic | Adaptive Gaussian | 0.862 | 0.925 | 0.968 | 0.867
Classic | Adaptive Mean | 0.853 | 0.920 | 0.943 | 0.861
Classic | Global Otsu | 0.749 | 0.802 | 0.792 | 0.770
Classic | ISODATA | 0.925 | 0.957 | 0.977 | 0.932
Classic | K-means | 0.750 | 0.801 | 0.794 | 0.771
Classic | Li’s Minimum Cross Entropy | 0.799 | 0.851 | 0.851 | 0.814
Classic | Niblack | 0.623 | 0.767 | 0.678 | 0.648
Classic | Otsu | 0.750 | 0.802 | 0.793 | 0.771
Classic | Otsu Morph | 0.747 | 0.800 | 0.791 | 0.768
Classic | Sauvola | 0.879 | 0.935 | 0.978 | 0.885
Deep Learning | U-Net-based ResNet34 | 0.971 | 0.985 | 0.986 | 0.975
Table 7. Evaluation values of structure-based similarity metrics for Lontar Terumbalan.
Category | Method | SSIM | FSIM | MS-SSIM | GMSD
Classic | Adaptive Gaussian | 0.180 | 0.479 | 0.380 | 0.457
Classic | Adaptive Mean | 0.210 | 0.502 | 0.419 | 0.460
Classic | Global Otsu | 0.745 | 0.777 | 0.726 | 0.336
Classic | ISODATA | 0.746 | 0.779 | 0.726 | 0.335
Classic | K-means | 0.745 | 0.778 | 0.726 | 0.336
Classic | Li’s Minimum Cross Entropy | 0.761 | 0.769 | 0.682 | 0.341
Classic | Niblack | 0.174 | 0.447 | 0.349 | 0.464
Classic | Otsu | 0.746 | 0.778 | 0.726 | 0.335
Classic | Otsu Morph | 0.767 | 0.776 | 0.708 | 0.343
Classic | Sauvola | 0.408 | 0.651 | 0.577 | 0.399
Deep Learning | U-Net-based ResNet34 | 0.938 | 0.941 | 0.998 | 0.195
Table 8. Evaluation values of structure-based similarity metrics for AMADI Lontarset.
Category | Method | SSIM | FSIM | MS-SSIM | GMSD
Classic | Adaptive Gaussian | 0.696 | 0.710 | 0.660 | 0.330
Classic | Adaptive Mean | 0.662 | 0.689 | 0.626 | 0.356
Classic | Global Otsu | 0.661 | 0.749 | 0.515 | 0.327
Classic | ISODATA | 0.823 | 0.770 | 0.605 | 0.308
Classic | K-means | 0.663 | 0.749 | 0.513 | 0.327
Classic | Li’s Minimum Cross Entropy | 0.692 | 0.697 | 0.470 | 0.349
Classic | Niblack | 0.143 | 0.340 | 0.229 | 0.419
Classic | Otsu | 0.662 | 0.748 | 0.513 | 0.328
Classic | Otsu Morph | 0.661 | 0.730 | 0.497 | 0.337
Classic | Sauvola | 0.765 | 0.761 | 0.705 | 0.299
Deep Learning | U-Net-based ResNet34 | 0.883 | 0.930 | 0.996 | 0.206
Table 9. Evaluation values of pixel-wise error metrics on both datasets.
Category | Method | RMSE (Lontar Terumbalan) | PSNR (Lontar Terumbalan) | RMSE (AMADI Lontarset) | PSNR (AMADI Lontarset)
Classic | Adaptive Gaussian | 0.629 | 4.042 | 0.356 | 9.169
Classic | Adaptive Mean | 0.619 | 4.171 | 0.367 | 8.849
Classic | Global Otsu | 0.258 | 11.848 | 0.396 | 9.741
Classic | ISODATA | 0.257 | 11.868 | 0.244 | 12.723
Classic | K-means | 0.258 | 11.850 | 0.394 | 9.783
Classic | Li’s Minimum Cross Entropy | 0.264 | 11.654 | 0.361 | 10.313
Classic | Niblack | 0.660 | 3.613 | 0.593 | 4.554
Classic | Otsu | 0.257 | 11.864 | 0.395 | 9.756
Classic | Otsu Morph | 0.263 | 11.653 | 0.400 | 9.581
Classic | Sauvola | 0.559 | 5.066 | 0.332 | 9.799
Deep Learning | U-Net-based ResNet34 | 0.143 | 17.059 | 0.161 | 16.400
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
