An Approach to a Comprehensive Test Framework for Analysis and Evaluation of Text Line Segmentation Algorithms

The paper introduces a testing framework for the evaluation and validation of text line segmentation algorithms. Text line segmentation is a key step toward correct optical character recognition. Many of the tests for the evaluation of text line segmentation algorithms rely on text databases as reference templates. Because such databases rarely match the target application, a reliable testing framework is required. Hence, a new approach to a comprehensive experimental framework for the evaluation of text line segmentation algorithms is proposed. It consists of synthetic multi-line text samples as well as real handwritten text. Although the tests are mutually independent, their results are cross-linked. The proposed method can be used for different types of scripts and languages. Furthermore, two different procedures for the evaluation of algorithm efficiency, based on the obtained error type classification, are proposed. The first is based on the segmentation line error description, while the second incorporates the well-known signal detection theory. Each has different capabilities and conveniences, but they can be used as supplements to make the evaluation process efficient. Overall, the procedure based on the segmentation line error description has certain advantages, since it characterizes the measurement process with five measures.


Introduction
Text line segmentation is a key step in off-line optical character recognition systems [1]. Any disturbance in this document image processing step leads to inaccurately segmented text lines and, furthermore, to optical character recognition failure [1].
Text documentation is mainly made up of printed text. It is characterized by a well-formed text type with strong regularity in shape and regular interword and line spacing [2]. Due to these facts, text line segmentation of printed documents is a simpler task. Accordingly, techniques for the detection of text lines in printed documents are largely successful [3]. On the contrary, text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting; consequently, processing of handwritten documents has remained a leading challenge in document image processing [4].
According to many studies related to the evaluation of algorithms for text parameter extraction, testing is an unavoidable process. Until now, test methods were based mainly on testing algorithms using handwritten or printed text samples obtained from text databases. These testing methods were often accommodated to specific types of scripts and types of algorithms. In addition, the results obtained by different test types were difficult to compare, due to their relative inter-relationships [5].
A new approach to performance evaluation is based on comparing the detected segmentation results with an already annotated ground truth [6]. This approach is called the pixel-based method. Consequently, if a ground-truth line and the corresponding detected line share 90% of their pixels, the line is claimed to be correctly detected [7]. However, this is an empirical guideline and cannot distinguish some specific circumstances.
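As a minimal sketch (not taken from the cited works), the 90% pixel-overlap rule can be written over pixel sets; the function name and the choice of checking the overlap against both lines, rather than against the ground truth only, are illustrative assumptions:

```python
# Hedged sketch of the pixel-based matching rule: a detected line is accepted
# when it shares at least 90% of pixels with the ground-truth line. Pixel sets
# are modelled as Python sets of (x, y) tuples. The symmetric-overlap check is
# an assumption, not necessarily the exact criterion of [7].

def lines_match(gt_pixels, det_pixels, threshold=0.90):
    """Return True if the two pixel sets overlap by at least `threshold`."""
    if not gt_pixels or not det_pixels:
        return False
    shared = len(gt_pixels & det_pixels)
    # Require the shared pixels to cover `threshold` of both lines.
    return (shared / len(gt_pixels) >= threshold and
            shared / len(det_pixels) >= threshold)

gt = {(x, 0) for x in range(100)}
det = {(x, 0) for x in range(95)}    # 95 of 100 ground-truth pixels shared
print(lines_match(gt, det))          # → True (95% overlap on both sides)
```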
Nevertheless, performance evaluation is a goal-oriented task. This is particularly true for text line segmentation. Few methodologies have been established based on this attitude [8][9][10]. Hence, a similar methodology for the evaluation of text segmentation algorithms is proposed. This paper introduces a testing framework for the evaluation of text segmentation algorithms. Some aspects of the testing methodology are given in [9]. However, it is based on three synthetic-like tests that emulate some of the characteristics of handwritten text. In [10], a handwritten text database was added as an extension to the previous three tests. It consists of text elements that incorporate mixed text lines, touching components, etc., which represent the main challenges in text line segmentation. Furthermore, the proposed experimental framework consists of different types of customizable text patterns as well as handwritten text examples. Each of the given experiments represents a separate entity. In addition, all of the tests can be linked by a bottom-up principle. The method is suitable for different types of letters and languages; its adaptability is its main advantage.
Furthermore, the evaluation method in [9] relies completely on the RMSE methodology. It is extended by the incorporation of the methodology given in [11], which added a new measurement criterion, SLHR (Segmentation Line Hit Rate). In this paper, this methodology is redesigned: it introduces a text segmentation error type classification based on five measures. Furthermore, it is compared with a binary classification based on three measures [10]. The proposed technique is tested on the examples of the water flow algorithm and an algorithm based on the anisotropic Gaussian kernel, and both algorithms are compared. Hence, the paper presents an efficient method for the evaluation of text segmentation algorithms.
The paper is organized as follows: in Section 2 the experimental framework for the text line segmentation is presented. Section 3 contains the test evaluation procedure, that involves classification of text objects and text segmentation errors as well as their division according to a binary classification. Section 4 offers a brief introduction to the principle of testing algorithms. Section 5 includes testing results and their evaluation by the proposed methods. Conclusions are given in Section 6.

Experimental Framework
The evaluation of any text line segmentation algorithm is related to its ability to properly perform text line segmentation. Text line segmentation is performed over different reference samples of text closely related to handwritten text elements, as well as over real ones. The experimental framework for the evaluation of the algorithm's text line segmentation consists of the following text experiments [9]: • Multi-line straight text segmentation test, • Multi-line waved text segmentation test, • Multi-line fractured text segmentation test, and • Handwritten text segmentation test [10].
The overall block diagram of the experimental framework is shown in Figure 1. The evaluation of the algorithm's ability to correctly segment text lines is the primary testing role. It is a prerequisite for obtaining other text parameters. Consequently, if the segmentation experiment fails, then further process examination will be meaningless. Hence, its importance is critical.
After the testing process, the obtained results are, in some way, cross-linked. Based on these results, the decision-making process will be achieved. The result of the decision-making procedure is a set of algorithm parameter values. This set is the starting point for the procedure of choosing the algorithm's optimal parameters.

Multi-Line Straight Text Segmentation Test
The multi-line straight text segmentation test is based on a straight text reference line. Straight text is defined by the skew angle β. Typical values of β that correspond to handwritten text are those up to 20°. Hence, it takes values from the set {5°, 10°, 15°, 20°} [9]. Furthermore, the between-line spacing is set to a standard value of 20% of the standard character height [12]. This corresponds to single line spacing. Multi-line straight text samples are shown in Figure 2.

Multi-Line Waved Text Segmentation Test
The multi-line waved text segmentation test is based on a waved text reference line. Waved text is defined by the parameter ε, given by ε = h/l, where h is the height and l is the half-width of the waved reference line (see Figure 3). Typical values of ε that correspond to the previously chosen values of the skew angle β are from the set {1/12, 1/6, 1/4, 1/3} [9]. The between-line spacing is set to 20% of the standard character height [12]. The resolution of the text samples is 150 and 300 dpi. Multi-line waved text samples are shown in Figure 3.
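The relation ε = h/l can be illustrated by generating such a reference line. The exact waveform is not fixed by the test definition, so a sinusoidal arc of height h over half-width l is assumed here purely for sketching:

```python
import math

# Illustrative generation of a waved reference line with waviness eps = h / l,
# where h is the height and l the half-width of the wave. The sinusoidal
# waveform is an assumption; only the eps = h / l relation comes from the text.

def waved_baseline(width, eps, l):
    h = eps * l                      # wave height recovered from eps = h / l
    return [h * math.sin(math.pi * x / l) for x in range(width)]

ys = waved_baseline(width=400, eps=1/6, l=100)
print(round(max(ys), 3))             # peak equals h = eps * l ≈ 16.667
```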

Multi-Line Fractured Text Segmentation Test
The multi-line fractured text segmentation test is based on a fractured text reference line. Fractured text is defined by the fractured skew angle φ. Typical values of φ that correspond to handwritten text are those up to 20°. Hence, its values are picked from the set {5°, 10°, 15°, 20°} [9]. Furthermore, the between-line spacing is set to 20% of the standard character height [12]. The resolution of the text samples is 150 and 300 dpi. Multi-line fractured text samples are shown in Figure 4.

Handwritten Text Segmentation Test
The multi-line handwritten text segmentation test is based on freestyle handwritten text samples in Serbian Latin and Cyrillic as well as in English script [10]. This is a small document text database, comprising a total of 220 handwritten text lines. These text samples contain variable skew lines, multi-oriented text, and mutually inserted words from different text lines. For the sake of conformity, only the document body is considered in the analysis of the text line segmentation. The resolution of the text samples is 150 and 300 dpi. A few handwritten text fragments from the text database are shown in Figure 5.

Test Results Evaluation
Testing the algorithm means applying it to the proposed text samples. As a result of the test, a new growing region is raised around the text. The major assignment of the test is the evaluation of the efficiency of the algorithm's text line segmentation process.

Classification of the Text Objects
It is assumed that during text segmentation a reference sample text containing text objects, called connected components, is processed by the algorithm. This process leads to a new text object configuration. In ideal circumstances, the number of newly arranged objects corresponds to the correct number of text lines. To make a valid algorithm evaluation, the following text elements should be defined [10]: • Initial objects number O_init, • Detected objects number O_det, and • Reference objects number O_ref.
The initial objects number O_init is the starting number of objects in the reference sample text, calculated by counting the text objects in the starting sample text. After applying the algorithm to the sample text, the number of text objects changes, since many text objects are mutually merged under the influence of the text segmentation algorithm. This new number of text objects is given as the detected objects number O_det. The task of the text segmentation algorithm is to segment the text lines, hitting or missing the number of real text lines. Hence, this number of real text lines is the target number in the reference sample text, called the reference objects number O_ref. The algorithm's efficiency is evaluated by comparing the reference and detected numbers of objects per text line.
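Counting O_init and O_det amounts to counting connected components in a binary raster before and after the smearing step. A minimal 4-connected component counter (the grid encoding with 1 = text pixel is illustrative) can sketch this:

```python
from collections import deque

# Minimal 4-connected component counter over a binary raster, sketching how
# O_init and O_det can be obtained by counting text objects before and after
# the segmentation step. The grid encoding (1 = text pixel) is illustrative.

def count_objects(grid):
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                count += 1                       # new connected component
                q = deque([(r, c)])
                seen[r][c] = True
                while q:                         # flood fill the component
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return count

sample = [[1, 1, 0, 1],
          [0, 0, 0, 1],
          [1, 0, 0, 0]]
print(count_objects(sample))    # → 3 separate objects
```

After smearing merges objects of the same line, the same counter applied to the smeared image yields O_det.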

Classification of the Text Line Segmentation Errors
Text pixels belonging to the initial objects O_init that represent the same text line i form the reference object O_ref for line i. If the detected object O_det for line i is integral and contains the objects O_init from the reference object O_ref for line i as a subset, then the number of text objects in that text line equals one, which corresponds to a correctly segmented text line. The number of correctly detected text lines in the sample text is marked as O_clindet. However, all other lines are defined as errors. These circumstances are illustrated in Figure 6 (following [7]).
Split lines errors represent the text lines which are wrongly divided by the algorithm into two or more components, i.e., text objects. This circumstance is known as over-segmentation. The joined lines error corresponds to the situation where the sequence of n consecutive lines is considered by the algorithm as a unique line. In that case, and if no other error happens, it is considered that one line in the sequence is correct and the other n-1 lines of the group are erroneous [7]. This phenomenon is called under-segmentation. Lines including outlier words correspond to lines containing words that are incorrectly assigned to two adjacent lines.

Evaluation of the Algorithm's Efficiency Based on Error Type
The algorithm's efficiency refers to the quality of the text line segmentation process performed by the investigated algorithm: the closer the number of detected objects is to the number of reference objects, the more efficient the algorithm. To evaluate the algorithm's efficiency, the following elements are introduced: • Segmentation line hit rate, i.e., SLHR, • Over-segmentation line hit rate, i.e., OSLHR, • Under-segmentation line hit rate, i.e., USLHR, • Mixed line hit rate, i.e., MLHR, and • Segmentation root mean square error, i.e., RMSE_seg.
SLHR represents the ratio of the number of correctly segmented text lines to the total number of text lines in the reference sample text. It is defined as:

SLHR = O_clindet / N.

The over-segmentation phenomenon leads to an increased number of objects per text line. Hence, the boundary growing area created by the algorithm has not succeeded in merging all objects of the text line into one. As previously stated, the number of over-segmented lines is marked as O_ovlindet. OSLHR represents the ratio of the number of over-segmented text lines to the total number of text lines in the reference sample text. It is defined as:

OSLHR = O_ovlindet / N.

The under-segmentation process leads to a smaller number of objects than the number of text lines. Hence, two or more consecutive text lines are considered as a unique one. USLHR represents the ratio of the number of under-segmented text lines to the total number of text lines in the reference sample text. It is defined as:

USLHR = (number of under-segmented lines) / N.

The process of mutually injected objects from different text lines leads to mixed text lines. MLHR represents the ratio of the number of mixed text lines to the total number of text lines in the reference sample text. It is defined as:

MLHR = (number of mixed lines) / N.

At the end, the numbers of detected and reference text objects (per text line) are compared. The number of reference text objects per line is equal to 1. The variance evaluation is given by the RMSE [9]:

RMSE_seg = sqrt( (1/N) · Σ_{i=1..N} (O_{i,ref} − O_{i,est})² ),

where N is the total number of lines in the reference sample text, O_{i,ref} is the number of reference objects in text line i (equal to one for each line), and O_{i,est} is the number of detected objects in text line i.
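The five measures above can be sketched in a few lines. The per-line labels and the variable names are illustrative; each reference line carries a label ('correct', 'over', 'under' or 'mixed') and a detected object count, while the reference count per line is 1:

```python
import math

# Sketch of the five measures SLHR, OSLHR, USLHR, MLHR and RMSE_seg, computed
# from an illustrative per-line error classification. `labels` holds one label
# per reference line; `obj_counts` holds the number of detected objects per
# line (the reference count per line is 1).

def evaluate(labels, obj_counts):
    n = len(labels)
    slhr = labels.count('correct') / n
    oslhr = labels.count('over') / n
    uslhr = labels.count('under') / n
    mlhr = labels.count('mixed') / n
    rmse = math.sqrt(sum((1 - o) ** 2 for o in obj_counts) / n)
    return slhr, oslhr, uslhr, mlhr, rmse

labels = ['correct', 'correct', 'over', 'under']
obj_counts = [1, 1, 3, 1]    # the over-segmented line split into 3 objects
print(evaluate(labels, obj_counts))    # → (0.5, 0.25, 0.25, 0.0, 1.0)
```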

Evaluation of the Algorithm's Efficiency based on Binary Classification
Binary classification is based on the signal detection theory (SDT) postulate [13]. Its task is to classify the members of a given set of objects into two groups, based on whether they have some property or not. Suppose that we test the set of objects for the presence of a property. If some objects have the property and the test confirms it, those objects are true positives (TP) [14]. Some objects do not have the property, but the test mistakenly confirms it; these are false positives (FP) [14]. Some objects have the property, but the test mistakenly does not confirm it; these are false negatives (FN) [14]. Finally, some objects do not have the property, and the test correctly does not confirm it; these are true negatives (TN) [14]. In the context of classification tasks, the terms true positives, true negatives, false positives and false negatives are used to compare the given classification of an item. This is systematized in Table 1 in the so-called confusion matrix (CM) [14]. From these elements the common evaluation measures can be extracted [14]. Precision is a measure of the ability of a system to present only relevant items. It is defined as [14]:

Precision = TP / (TP + FP),

and it measures the exactness of a classification. Higher precision means fewer false positives, while lower precision means more false positives. This is often at odds with recall, as an easy way to improve precision is to decrease recall.
Recall is a measure of the ability of a system to present all relevant items. It is defined as [14]:

Recall = TP / (TP + FN).

Recall measures the completeness, or sensitivity, of a classifier. Higher recall means fewer false negatives, while lower recall means more false negatives. Improving recall can often decrease precision, because it becomes increasingly harder to be precise as the sample space increases.
Precision and recall can be combined into a single metric known as the f-measure, which is the weighted harmonic mean of precision and recall. With equal weights, it is defined as [14]:

F = 2 · Precision · Recall / (Precision + Recall).

These elements can be used as common evaluation measures; they have been related to text line segmentation in [15,16].
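The three measures follow directly from the confusion-matrix counts; a minimal sketch, with equal weights in the f-measure as defined above:

```python
# Precision, recall and f-measure from confusion-matrix counts, as defined
# above. The example counts are illustrative.

def precision(tp, fp):
    return tp / (tp + fp)      # exactness: fewer FP means higher precision

def recall(tp, fn):
    return tp / (tp + fn)      # completeness: fewer FN means higher recall

def f_measure(p, r):
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall

tp, fp, fn = 80, 10, 20
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 3), round(r, 3), round(f_measure(p, r), 3))
```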

Principle of the Testing Algorithm
A smearing method is used as the basis for text line segmentation testing. It represents the group of boundary-growing algorithms. In smearing methods, consecutive black pixels along the horizontal direction are smeared [17]. Seed points that fulfill a predefined criterion activate the process. Consequently, the white space between black pixels is filled with black pixels, but only if their distance is within a predefined threshold. In this way, enlarged areas of black pixels, the so-called boundary growing areas, are formed around the text. These areas of the smeared image enclose separated text lines. Hence, the obtained areas are mandatory for text line segmentation. In the following text, two testing algorithms are introduced: • the water flow algorithm, and • the algorithm based on the anisotropic Gaussian kernel.
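The core smearing step on one image row can be sketched as follows; the row encoding (1 = black) and the function name are illustrative, and this is a generic run-length smearing sketch rather than either of the two specific algorithms:

```python
# Minimal horizontal run-length smearing sketch: white runs shorter than a
# threshold between two black pixels are filled with black, producing the
# boundary-growing areas described above. Row encoding (1 = black) is
# illustrative.

def smear_row(row, threshold):
    out = row[:]
    last_black = None
    for i, v in enumerate(row):
        if v == 1:
            # fill the white gap only if it is short enough
            if last_black is not None and 0 < i - last_black - 1 <= threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out

row = [1, 0, 0, 1, 0, 0, 0, 0, 1]
print(smear_row(row, threshold=2))   # → [1, 1, 1, 1, 0, 0, 0, 0, 1]
```

Only the 2-pixel gap is filled; the 4-pixel gap exceeds the threshold and is kept, so separate text lines are not merged.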

Water Flow Algorithm
The water flow algorithm proposed in [18] is used; it is only briefly explained here. The algorithm assumes a hypothetical flow of water in a particular direction across an image frame, in such a way that it faces obstruction from the characters of the text lines. As a result of the water flow algorithm, unwetted areas of the image frame are extracted. These areas represent triangular shadows that form the so-called unwetted regions. The seed points that activate the algorithm are the isolated corner points of the text objects. Further, this hypothetical water flow is expected to fill up the gaps between consecutive text lines. Hence, the unwetted areas are of major importance for text line segmentation. The circumstance where hypothetical water flows from left to right is shown in Figure 7. Furthermore, the parameter water flow angle α is introduced. It strongly affects the shape of the unwetted regions, influencing the text line segmentation process. Hence, the process of selecting the water flow angle value is crucial to the quality of the text line segmentation. The complete process of the water flow algorithm applied to a text sample formed of three letters I is shown in Figure 8. Gray regions represent the unwetted areas incorporating the initial text objects. The stripes of unwetted areas are labeled for the extraction of text lines. Once the labeling is completed, the image is divided into two different types of stripes: the first contains text lines, while the other contains line spacing. This is shown in Figure 9.

Algorithm Based on Anisotropic Gaussian Kernel
An algorithm based on the anisotropic Gaussian kernel is also used for testing; it is explained briefly. Its main principle is expanding the black pixel areas of text by scattering every black pixel into its neighborhood. In this way, distinct areas that mutually separate text lines are established. Hence, the primary purpose is joining only text elements from the same text line into the same distinct continuous area. The Gaussian probability function is taken as a template that gives the probability of hypothetical expansion around every black pixel representing a text element. Around every black pixel, new pixels are non-uniformly dispersed. These new pixels have lower black intensity: because the probability of expansion decreases with the distance from the black pixel, their intensity depends completely on the distance from the original black pixel. After applying the anisotropic Gaussian kernel, equal to 2K + 1 in the x-direction and 2L + 1 in the y-direction, the text is scattered, forming an enlarged area around it. The newly created pixels are grayscale; hence, the document text image becomes grayscale. Inside the kernel, a "probability" sub-area is formed using the radii 3σ_x and 3σ_y of an ellipse in the x and y directions, where σ represents the standard deviation defining the curve spread. Converting all these pixels into black pixels and inverting the image forms the new expanded black pixel areas [7]. These areas are named boundary growing areas. The algorithm's application to a text sample is given in Figure 10.
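The boundary-growing step can be sketched directly from the description above. The choices σ_x = K/3 and σ_y = L/3 (matching the 3σ radius mentioned above) and the gray threshold are illustrative assumptions, not parameters taken from [7]:

```python
import math

# Sketch of the boundary-growing step: every black pixel scatters a
# (2K+1) x (2L+1) anisotropic Gaussian neighbourhood; thresholding the
# accumulated gray image yields the expanded black areas. The sigma choices
# (sigma_x = K/3, sigma_y = L/3) and the threshold are assumptions.

def grow_areas(black_pixels, width, height, K, L, thr=0.05):
    sx, sy = K / 3.0, L / 3.0
    acc = [[0.0] * width for _ in range(height)]
    for (px, py) in black_pixels:
        for dy in range(-L, L + 1):          # scatter over the kernel support
            for dx in range(-K, K + 1):
                x, y = px + dx, py + dy
                if 0 <= x < width and 0 <= y < height:
                    g = math.exp(-0.5 * ((dx / sx) ** 2 + (dy / sy) ** 2))
                    acc[y][x] += g
    # convert sufficiently gray pixels back to black
    return {(x, y) for y in range(height) for x in range(width)
            if acc[y][x] >= thr}

area = grow_areas({(10, 5)}, width=21, height=11, K=6, L=2)
print(len(area) > 1)   # the single pixel has grown into an elongated area
```

With K > L the grown area is wider than it is tall, which is exactly what favors merging elements of the same line over merging adjacent lines.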

Water Flow Algorithm
For the purpose of testing the algorithm, the parameter water flow angle α is taken from the reduced set {10°, 12°, 14°} [19,20]. Text samples are converted to 300 dpi resolution. Testing of the algorithm is performed on the examples of 96 lines of multi-line straight, waved, and fractured text, as well as 220 lines of diverse handwritten text in a variety of different scripts (over 500 lines of text in total).

Test Results
The results after applying the algorithm to the four proposed reference text sample groups are presented in Tables 2-5.

Evaluation Based on Error Type
The first evaluation process is based on the text line segmentation error type. The results (from Tables 2-5) are rearranged in the appropriate form and validated by the measures SLHR, OSLHR, USLHR, MLHR, and RMSE. These results are given in Tables 6-9. In the multi-line waved text segmentation test, the phenomenon of under-segmentation appeared. It is raised by decreasing the water flow angle α. However, the segmentation line hit rate is improved by reducing α. The small value of RMSE confirms the advantage of choosing a water flow angle equal to 10°. In the multi-line fractured text segmentation test, decreasing the water flow angle α leads to mixed results. Although the segmentation results are slightly better, an increased number of lines is mistakenly identified as under-segmented. Hence, there is no difference between choosing 10° or 12° for the water flow angle; the similar RMSE values reaffirm this. In the multi-line handwritten text segmentation test, the use of a small water flow angle below 12° noticeably improves the quality of the segmentation process. The RMSE value identifies this as well.

Evaluation Based on Binary Classification
The evaluation process is based on the binary classification. The results (from Tables 2-5) are rearranged accordingly. In the multi-line straight text segmentation test, due to the lack of under-segmentation, precision is the only relevant measure. Hence, the water flow angle of 10° gives the best results; the f-measure confirms this. In the multi-line waved text segmentation test, decreasing the water flow angle leads to higher precision. However, the occurrence of under-segmentation leads to lower recall values. The f-measure, as a combination of precision and recall, illustrates this. Hence, there is no significant advantage in choosing 10° over 12° for the water flow angle. In the multi-line fractured text segmentation test, the results described by precision and recall are similar for water flow angles of 10° and 12°; the f-measure values confirm it. In the multi-line handwritten text segmentation test, the advantage of decreasing the water flow angle is important: the precision is highly improved. Because under-segmentation elements are missing, the precision is the only relevant measure, and the f-measure simply follows it.

Algorithm Based on Anisotropic Gaussian Kernel
For the purpose of testing the algorithm based on the anisotropic Gaussian kernel, its parameters K and L are under consideration; the main purpose of the testing is the optimization of these parameters. Because of the size of the letters, K is picked from the reduced set {5, 8, 10} [12,21]. Furthermore, the corresponding parameter λ is used instead of L. It is defined as λ = K/L and is selected from the reduced set {3, 4, 5} [21,22]. All text samples are converted to 300 dpi resolution. Testing of the algorithm is performed on the examples of 96 lines of multi-line straight, waved, and fractured text, as well as 220 lines of diverse handwritten text in a variety of different scripts (over 500 lines of text in total).

Evaluation Based on Binary Classification
The evaluation process is based on the binary classification. The results (from Tables 14-17) are rearranged accordingly. In the multi-line straight text segmentation test, due to under-segmentation, recall is meaningful. Hence, enlarging K and λ leads to under-segmentation, and lower recall as well as a lower f-measure follow. In the multi-line waved text segmentation test, good values of precision and recall are connected with higher K and λ pairs. Similarly to the previous test, in the multi-line fractured text segmentation test, enlarging the K and λ pair yields better precision and recall values. In the multi-line handwritten text segmentation test, the advantage of increasing the K and λ pair is obvious. However, further enlargement of this pair does not afford any improvement of precision and recall.

Comparative Analysis and Interpretation of the Evaluation Process
The evaluation based on error type contains five distinct measures: SLHR, OSLHR, USLHR, MLHR, and RMSE. Their interpretation is clear and unambiguous. The fifth measure, RMSE, is particularly useful for fine tuning of the segmentation results (see Examples #1 and #2 in the Appendix). Hence, the evaluation based on error type is clearer and more informative. In contrast, the evaluation based on the binary classification has only three distinct measures: precision, recall, and f-measure, the third being the harmonic mean of the other two. Nevertheless, this evaluation process includes more statistical measures. In [10], the evaluation based on binary classification is improved by an additional measurement extension. Both methods have different capabilities and conveniences, and they can be used together as well. Still, the method with five measures has certain advantages; hence, it is chosen in the decision-making procedure.

Decision-Making Procedure
From the obtained results, the decision-making procedure is performed. It results in a set of algorithm parameter values, which is the starting point for choosing the algorithm's optimal parameters. Each test, according to the obtained results, gives an optimal subset of parameter values. These values offer the best response of the algorithm to the specific text samples. Each test experiment is referred to by an index i = 1, ..., N, where N represents the total number of tests; in our case N = 4. For each test i, the best parameter subset is given as P_i. Finally, the final set of parameters P_f is obtained by intersecting the per-test subsets:

P_f = P_1 ∩ P_2 ∩ ... ∩ P_N.
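This combination step can be sketched as a set intersection: parameters that belong to the best subset of every test survive into the final set. The intersection rule and the example subsets are assumptions for illustration; the excerpt itself does not spell out the combination rule, though the final choice of a single α across all tests is consistent with it:

```python
# Sketch of the decision-making step: each test i yields its best parameter
# subset P_i, and a final set P_f is formed from them. An intersection
# (parameters optimal in every test) is assumed here; the subsets below are
# illustrative, loosely mirroring the water flow angle results.

P = [
    {10, 12},   # straight text test
    {10, 12},   # waved text test
    {10, 12},   # fractured text test
    {10},       # handwritten text test
]

P_f = set.intersection(*P)
print(P_f)      # → {10}: the parameter optimal across all four tests
```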

Water Flow Algorithm
For the water flow algorithm, comparative results for the five measures (SLHR, OSLHR, USLHR, MLHR and RMSE) across the different tests are joined in integral tables. From Tables 6-9, the following Tables 26-29 are derived. The results from Tables 26-29 are the key for the decision-making procedure; they represent the real picture of the algorithm's evaluation for text line segmentation. In addition, Table 30 gives the comparative results of SLHR as a function of the algorithm parameter α.
It is clear from the tested values of the parameter α that the best response of the algorithm to the various types of text is obtained for α = 10° [20]. The evaluation of RMSE confirms this as well. However, a careful examination of the USLHR should be taken into consideration for further fine-tuning of the parameter α.

Algorithm Based on the Anisotropic Gaussian Kernel
For the algorithm based on the anisotropic Gaussian kernel, integral comparative results (see Tables 18-21) concerning SLHR, OSLHR, USLHR, MLHR and RMSE are shown in Tables 31-34. The results from Tables 31-34 are the basis for the procedure of choosing the optimal algorithm parameters. Furthermore, Table 35 gives the comparative results of SLHR as a function of the algorithm parameters K and λ.

Comparison between Algorithms
The final word in testing efficiency is the comparison of the results obtained with the optimal parameter values of both algorithms. For the water flow algorithm (WF algorithm), the optimal parameter α is equal to 10° [20]. For the algorithm based on the anisotropic Gaussian kernel (AGK algorithm), the optimal parameter pair is (10, 4). A comparative analysis based on the error type classification is given in Tables 36-39. From Table 37, the AGK algorithm has no problem with the over-segmentation phenomenon; on the contrary, the WF algorithm has to be improved. However, these circumstances can be overcome by additional morphological post-processing. In addition, in a real situation such as handwritten text, both algorithms are equal. From Table 38, it is obvious that the AGK algorithm has clear problems with under-segmentation. This is the key that leads to better results of the WF algorithm in a complex and diverse test such as the handwritten text. The RMSE measure of the WF and AGK algorithms confirms the previous statements, i.e., the slight advantage of the WF over the AGK algorithm. Figure 11 shows the SLHR (%) comparison between the WF and AGK algorithms. According to Figure 11, the WF algorithm can process the various types of text with an SLHR margin of over 65%, while the AGK algorithm cannot. Hence, the WF algorithm has a clear advantage over the AGK algorithm. Similar evaluations can be used for the comparison of algorithms by the methodology based on binary classification. However, it has only three measures, and some circumstances are not clearly distinguishable [10] (see Appendix). The comparative analysis based on the binary classification of errors is given in Tables 40-42. The f-measure is a criterion that reflects all good and bad results of testing; hence, the evaluation process of the algorithm should be very sensitive to this measure [10]. From Table 42, the WF algorithm is characterized by a more uniform level of f-measure values.
Figure 12 shows the f-measure comparison between the WF and AGK algorithms. According to Figure 12, the WF algorithm can process the various types of text with an f-measure margin of around and over 80%, while the AGK algorithm reaches only up to 75%. Again, the WF algorithm has a clear advantage over the AGK algorithm. However, the interpretation of the binary classification of errors is not as obvious as that of the error type classification.

Conclusions
The paper proposes a comprehensive test framework for the evaluation of algorithms' effectiveness in the process of text line segmentation. Previously, testing procedures were custom-oriented, based on document image databases representing templates. The proposed test framework presents a step towards testing generalization in the domain of document image processing algorithms. It consists of four multi-line text experiments: straight, waved, fractured, and handwritten ones. Further, two suitable validation methods are provided. The first method is based on the text line segmentation error terms; it incorporates five distinct, inter-related measures. The other one, which is well known and more often used, is based on the binary classification linked with signal detection theory; it consists of three distinct, inter-related measures. Both methods have different capabilities and conveniences, but can be used concurrently and as supplements to each other. However, due to the five measures that characterize the measurement process, the method of algorithm evaluation based on error type has certain advantages. This evaluation process is useful for algorithm assessment as well as for drawing conclusions about the algorithm. Finally, the adaptability of the comprehensive test framework to different types of letters and languages represents its main advantage.

Evaluation of the Algorithm's Efficiency Based on Error Type (Example #1)
All test results from algorithms #1 and #2 are reorganized according to the segmentation error type. The results are presented in Table A1.

Evaluation Based on Binary Classification (Example #1)
All test results from algorithms #1 and #2 are reorganized according to the binary classification of segmentation. The results are presented in Table A2. According to the RMSE, algorithm #2 shows slightly better performance than algorithm #1 in the domain of text line segmentation.

Example #2
After the process of text line segmentation by algorithms #1 and #2, the obtained results are shown in Figure A2.

Evaluation of the Algorithm's Efficiency Based on Error Type (Example #2)
All test results from algorithms #1 and #2 are reorganized according to the segmentation error type. The results are presented in Table A3. Accordingly, MLHR represents the most heavily penalized error, due to the difficult process of its identification and correction.

Evaluation Based on Binary Classification (Example #2)
All test results from algorithms #1 and #2 are reorganized according to the binary classification of segmentation. The results are presented in Table A4.