Characterization of Prints Based on Microscale Image Analysis of Dot Patterns

Featured Application: This work was developed within the framework of the PACKMARK project, in collaboration with Sergusa Solutions Pvt. Ltd., which provided all the printed samples necessary for this study. The aim of this project is to identify fake pharmaceutical packaging printed using a CMYK (Cyan Magenta Yellow Black) gravure printing machine from color information and microscale analysis. Most often, fake packaging printed on blister foils using the rotogravure process results from the use of a counterfeit cylinder and rotogravure press.
Abstract: Distinguishing a print document (original) from a reprint document (copy or fake) can be a challenge. The analysis of print documents at the microscopic scale shows random dot shapes which depend on the printing parameters as well as on the printing device used. We can, therefore, assume that the dot shapes can be used as a fingerprint to differentiate a print from a reprint. In this paper, we explore several shape indexes that have not been investigated until now for the microscopic-scale analysis of documents printed on aluminium foils using the rotogravure printing process. This paper presents a statistical analysis based on a pattern recognition process defined by three steps. First, a new image processing pipeline is used to automatically segment disconnected dots. Next, new dot pattern features are used to automatically characterize dot patterns. Six types of dot patterns (including four types of doughnut patterns) are introduced. Lastly, a new statistical analysis method is used to characterize a printed sample from the set of dots printed on it. The experiments demonstrate the relevance of the proposed analytical method. The results show the potential of this method to distinguish a reprint from a print.


Introduction
The objective of this paper is to explore, statistically, geometrical shape indexes that have not been investigated until now for the microscopic-scale analysis of documents printed on aluminium foils using the rotogravure printing process.
The rotogravure printing process is not a very common printing process except for printing in large quantities, as in the packaging printing industry, especially when it comes to printing on complex surfaces such as the thin aluminium foils used in pharmaceutical packaging. One of the main properties of foils is that they are non-porous substrates; consequently, they do not cause optical dot gain. Optical dot gain happens with porous paper: due to the lateral propagation of light, printed dots appear larger than their physical size [1]. On foils, due to the high reflectance of the surface, any surface irregularity (for example, due to material handling) may have an impact on surface reflectance, even under well-controlled illumination conditions. In our study case this effect was minor, as the printed samples measured were kept flat. On the other hand, some surface irregularities or topographical depressions may be at the origin of randomly occurring small holes in printed surfaces. A similar observation was made by Hamblyn in [2], who reported a very good correlation between flexo-printed uncovered areas in dots and the location of topographical depressions in paper and paperboard surfaces.
The objective of this paper is also to demonstrate that a printed document can be characterized by a set of statistical features, computed from the size, shape and distribution of printed dots, which can be considered as a fingerprint of the print process. To reach this objective we propose a pattern recognition based process which consists of: (1) a new image processing pipeline defined to automatically segment, for each primary color, each individual dot (see Sections 3.1-3.5); (2) new pattern features to automatically characterize dots at microscopic scale (see Section 3.6) and to classify dots into six types of dot patterns (including four types of doughnut patterns); (3) a new rule-based statistical analysis method to characterize a printed sample from the set of dots printed on it (see Section 3.7).
There is a large variety of segmentation methods in the literature. Each method has some advantages and also a few drawbacks. Some perform better than others, depending on the characteristics of the images to process and on the purpose of the segmentation. In our study case, neither edge-based methods nor region-based methods are adapted to our images, as both the contrast between printed dots and the background and the signal-to-noise ratio are too low. On the other hand, threshold methods are well adapted to convert the grey-scale images corresponding to primary color images into binary images. In this study, we aim to discriminate printed dots from the print substrate, so a binarization method was used. A similar strategy was implemented by Nguyen et al. in [3]. We used the Otsu global thresholding method [4] for two main reasons: (1) it is based on an unsupervised histogram thresholding approach minimizing intra-class intensity variance; (2) the performance of this method is among the best to differentiate ink coverage from paper regions [4]. However, this method suffers from a drawback: when background values are not homogeneous, it performs worse than local threshold methods. In the context of our study case, the Otsu method works well, but a postprocessing step based on mathematical morphology tools is nevertheless necessary to increase the robustness of this method. This consists of filtering noise and outliers, such as fuzzy pixels on the dot edges, to locally correct the shape of dots and to perform more accurate measurements (to reduce the impact of physical dot gain on dot shape and dot size). A similar postprocessing approach is reported in [5].
One of the most important phenomena that affect dot segmentation and, consequently, the dot structure analysis is the physical dot gain, meaning that printed dots appear larger than the dots in the digital bitmap [1]. This phenomenon is due to the ink spreading on the surface, which results in an enlargement of the physical dot size.
There is a huge variety of shape indexes in the literature, but only a few are used to characterize dot patterns, such as dot area (in µm²), convex area (in µm²), perimeter (in µm), diameter (in µm), angle (in degrees), circularity, roundness, aspect ratio, convexity, etc. [3,5,6]. Several are well designed to characterize simple geometric shapes, such as circles or ellipses. On the other hand, they are not well adapted to characterize complex geometric shapes, such as doughnut shapes. In this study, we investigate several other pattern features to discriminate doughnut shapes from simple geometric shapes and to classify printed dots into six types of dot patterns. A few other features can also be used to characterize dot patterns, such as probabilistic modeling [7]; however, probabilistic features are better suited to describing non-spherical dots with some degree of surface roughness than dot patterns with doughnuts. Likewise, the probabilistic model investigated by Vallat in [5] is not adapted to doughnut dots or non-spherical patterns. The paper [2] was one of the first to investigate dot patterns with concave geometry and local deformation. Doughnuts correspond to ring-shaped printed halftone dots with a lack of ink in their centre [2]. This effect is caused by the ink being displaced outward and away from the top of the printing feature during substrate contact. This effect is also due to the rotogravure printing process, which is a pressure-sensitive process. According to Hamblyn et al. [2], doughnuts are more prominent at high impression pressure; however, the origin of the doughnut printing defect is still uncertain and the influence of the dot top geometry on the defect is still unknown [2]. The problem of doughnuts on the boundary of dots was reported in [8] but, to the best of our knowledge, has never been investigated in the context of dot pattern characterization.
In [8], a study performed with two cylinders engraved using chemical etching and two rotogravure presses, we also observed many doughnuts. According to Mathes [9], the most likely cause of doughnuts is cylinder or plate swelling. Aggressive inks and solvents can also be at the origin of swelling and distortion. Ink with too much draw and too low a viscosity can also be at the origin of doughnuts. Several studies have shown that printing parameters can have a significant impact on the quality of a print, for example, on printed dot size or on the doughnut effect. For example, Sosa reported in [10] that the temperature of the surroundings and/or of the operating equipment has an impact on the viscosity of the ink and consequently on the spreading of the ink on the substrate. The viscosity of inks changes by about 3 to 4% per degree Fahrenheit. Sosa showed in [10] that, when the ink temperature increases, dots are deformed, the doughnut effect increases and the dot perimeter increases. Kader reported in [11] that, when the ink temperature increases, the ink filling the gravure cylinder cells takes a concave shape, leaving an air bubble trapped between the ink and the substrate. Kader also reported in [11] that the lightness of the dots can be explained by the ample quantity of solvent in the ink, reducing the solid ink constituent. The dot structure analysis performed by Kader is limited to a visual analysis; no quantitative evaluation of doughnuts was performed.
Tkachenko et al. [8] and Joshi et al. [12] demonstrated the potential of local deformations of printed patterns at macroscopic scale, such as the letter "e", to identify a unique print signature (see example in Figure 1). On the other hand, Oliver et al. [13] demonstrated the potential, at the microscopic scale, of the roundness and the perimeter of printed dots in discriminating one printer model from another. From the experimental results reported in this paper (see Section 4), we can hypothesize that the deformations at the microscopic scale of the printed dots induced by the printing process and modeled by our shape indexes can be used as a fingerprint to authenticate a print document (see Section 4.1).
Figure 1. Examples of sample letters "e" printed (or reprinted) with: (a) 1st sample: an engraved cylinder and a rotogravure press; (b) 2nd sample: the same rotogravure press but another engraved cylinder; (c) 3rd sample: the same engraved cylinder as in (a) but another rotogravure press; (d) 4th sample: the same engraved cylinder as in (b) and the same rotogravure press as in (c).
The main technical information related to this study is summarized in Section 2. Then, the different methods proposed are described in detail in Section 3. Next, in Section 4 we present experimental results and in Section 5 we discuss the robustness and the efficiency of these methods. Lastly, in Section 6, we discuss the outcomes of the proposed work and draw some conclusions.

Technical Information
According to Nguyen et al. [3], at the microscopic scale, a printed dot is a random pattern, the shape of which depends on the technology, the settings of the printer, the ink quality and/or the paper properties. From this statement, Nguyen et al. claimed in [3] that the shapes of printed dots can be considered, at microscopic scale, as an intrinsic feature of the printing process. Cells engraved on the gravure cylinder can have an elongated shape (cell angle of 60°), a normal shape (cell angle of 45°) or a compressed shape (cell angle of 30°). The cell angle has an impact on ink transfer to the printed substrate. In our case study, the cell angle was 30° (using a 150 LPI screen ruling and a 130° engraving needle angle) and the ink coverage was 20%. We will see in this paper that, even with a theoretical ink coverage of 20%, the physical dot gain already makes it difficult, in some cases, to separate printed dots that are spatially connected. Printed samples were captured by a camera (resolution 8.889 × 6.667 inches) mounted on a microscope (magnification 4×).
Three gravure printing presses with foil inks were used to print the samples used in this study on a blister foil substrate. The gravure cylinders used were engraved by an electromechanical engraving process with a dot coverage of 20%. The engraving was done at a 150 LPI screen ruling and a 130° stylus angle. Solvent-based magenta, cyan and yellow gravure inks, specially recommended for blister foils, were used to print the color dots at a speed of 60 m/min. The thickness of the foils was 25 µm. The printing speed, printing pressure and doctor blade pressure were kept constant during the printing of the samples. The drying temperature was set at 50 °C to 60 °C for the printing. The ambient temperature and humidity inside the press were 17 ± 3 °C and 35 ± 5%, respectively. Several color print and reprint samples were collected for analysis: monochromatic (e.g., C, M or Y), bi-chromatic (e.g., CM), tri-chromatic (CMY) or quadri-chromatic (CMYK) samples. These color samples are related to selected samples of the IT8.7/3 color chart, which is commonly used to characterize a CMYK printer (see example in Figure 2).

Measurement, Modelling and Analysis at the Microscale of Printed Dots
The image processing pipeline that we propose is illustrated in Figure 3 and detailed in the following sections.


Separation of Ink Colors (Step 1)
The separation of ink colors is not an easy task as there is often some misregistration between the printed color channels. Figure 4 shows examples of printed cyan, magenta and yellow dots shifted in position.

The problem of halftone misregistration was studied in [1]. Figures 4a and 5a show examples of a halftoning scheme with different screen angles for the cyan, magenta, yellow and black channels. According to Namedanian [1], black, the color with the strongest contrast, is halftoned and placed at 45°, due to the lower sensitivity of the eye at this angle; meanwhile, yellow, the weakest color, is halftoned at 0°, where the human eye is most sensitive. Using a different angle for each channel reduces the effect of misregistration but, on the other hand, introduces a type of pattern, named a rosette pattern, which is quite visible at microscale [1].
Namedanian proposed, in [1], using different wavelengths to separate colors, but the proposed approach for separating ink colors in color prints was limited to two colors. Moreover, this solution is limited to the available wavelengths, and sometimes it is hard to find good wavelengths to separate more than two color inks. The main issue with the separation of ink colors in color prints when the dots are printed on top of each other (dot-on-dot) is that the effect of one ink cannot be completely removed when sending light in the absorbing wavelength band of the other ink; as a consequence, a shadow effect appears. The term shadow effect of one ink means the contribution of that ink to the reflectance spectra of the other ink [1].
In our study case, even though cyan dots were printed on top of the magenta layer, the misregistration between these two channels causes no dot detection problem in either the Cyan or the Magenta channel, so we process these two channels independently of the others. On the other hand, the yellow dots were printed on top of the cyan layer, and the misregistration between these two channels does cause a dot detection problem in the Yellow channel. We therefore mask, in the yellow image, the dots detected in the cyan image, so that the Yellow channel can be processed independently of the others in the next step. Images corresponding to samples number 7, 8, 9, 10, 14, 15, 16 and 18 in Tables 1-3 show a similar misregistration problem. In our experiments, we only deal with CMY images, and not CMYK images.
From Table 2, we can draw several observations:
• Samples number 11, 12 and 13 (and also sample number 17, see Table 3) are very similar in terms of dot shape: most of the dots have an "elliptic equivalent" shape oriented in the direction of the print (see Figure 14), which is coherent with the values computed for the parameter "orientation".

• The proportion of doughnuts is lower than for samples number 1, 2, 3, 4, 5 and 6 (only 34% for samples number 11 and 12, and less than 8% for samples number 13 and 17). These doughnuts are less visible to the naked eye but are noticeable at microscopic scale. As there are fewer doughnuts in these samples, the "circularity" parameter is more accurate (in the range 0.81-0.86), which is coherent with an "elliptic equivalent" shape. On the other hand, the standard deviation of the "circularity" values is higher (in the range 0.35-0.48), as the dot patterns are more heterogeneous. This is due to a higher standard deviation of the "perimeter" and of the "area" of the dots.
Unfortunately, the amount of data and the number of rotogravure printers used in this preliminary study (4467 dot samples vs. two printers) are insufficient to precisely evaluate the accuracy and robustness of all the dot pattern parameters proposed; nevertheless, the amount of data was sufficient to demonstrate the relevance of these parameters.
In the next section, we compare the robustness of these parameters versus different

Binarization of Color Images (Step 2)
The knowledge that we have acquired about the printing process used enables us to propose an ad hoc solution to separate color dots from color channels. In future work, we will study whether this approach could be extended to CMYK images. Figure 6b illustrates the low quality of the printed images in our study case, which is due to a low signal-to-noise ratio. Consequently, neither edge-based methods nor region-based methods are adapted to this type of image. On the other hand, the global thresholding method proposed by Otsu [4] is well adapted. This is for two main reasons: (1) it is based on an unsupervised histogram thresholding approach, minimizing intra-class intensity variance; (2) the performance of this method is among the best to differentiate ink coverage from paper regions [5]. However, this method suffers from a drawback: when the background is not homogeneous, it performs worse than local threshold methods. To compensate for the local variations related to the background and to improve the efficiency of the Otsu segmentation method, we propose to filter the segmented image a posteriori.
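As an illustration of the histogram thresholding principle described above, the following minimal sketch computes an Otsu threshold in plain Python. It is not the implementation used in this study: the function names `otsu_threshold` and `binarize`, the 8-bit grey-level range, and the convention that dots are darker than the foil are all assumptions made for the example. The sketch maximizes the between-class variance, which is equivalent to minimizing the intra-class variance.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance.

    Pixels with value <= t form the lower class, the others the upper class.
    `pixels` is a flat sequence of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of grey levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break     # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, threshold):
    """Assumption: ink dots are darker than the substrate (foreground = 1)."""
    return [1 if p <= threshold else 0 for p in pixels]
```

With a clearly bimodal histogram the returned threshold falls between the two modes; with an inhomogeneous background a single global threshold can misclassify pixels, which is why a posteriori filtering is applied in the next step.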

Filtering of Color Channels (Step 3)
Several morphological treatments were used in order to reduce the noise contained in the image. These treatments consist of:
1. A removal of dots connected to the image border.
2. An erosion with a disk-shaped structuring element of one-pixel radius. This erosion removes isolated pixels (punctual noise) and reduces the border effects due to the physical dot gain on the size of the dots (see Figure 6).
3. An opening (erosion followed by a dilation) with a disk-shaped structuring element of one-pixel radius. This opening removes noise and removes small objects from the foreground.
4. A removal of small holes in foreground areas with a disk-shaped structuring element of one-pixel radius.
5. A watershed transform based on the Euclidean distance. This transform segments contiguous areas into distinct objects, which enables a better separation of connected dots.
Figures 6c, 7e, 8d and 9e illustrate some results obtained with these operations.
Step 7 (figure panels): extraction of dots with a concavity (∈ ID1/ID2/ID3) or an irregular shape (∈ ID4), defined by a solidity greater than 0.75; extraction of dots with a convex shape (∈ ID5) with a circular equivalent shape, an elliptic equivalent shape, or another pattern shape.
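The erosion and opening treatments listed above can be sketched on a small binary image as follows. This is a hedged illustration, not the code used in this study: the image is represented as a list of rows of 0/1 values, pixels outside the image are treated as background, and the one-pixel-radius disk is taken to be the 4-connected cross (an assumption; the exact structuring element is not specified here). Border clearing, hole removal and the watershed transform are not covered by this sketch.

```python
# Assumed one-pixel-radius "disk": the centre pixel and its 4 neighbours.
DISK1 = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def erode(img, se=DISK1):
    """Keep a pixel only if every structuring-element offset hits foreground."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy, dx in se))
    return out


def dilate(img, se=DISK1):
    """Set a pixel if any structuring-element offset hits foreground."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                0 <= y - dy < h and 0 <= x - dx < w and img[y - dy][x - dx]
                for dy, dx in se))
    return out


def opening(img, se=DISK1):
    """Erosion followed by dilation with the same structuring element."""
    return dilate(erode(img, se), se)
```

In this sketch an isolated foreground pixel disappears after the erosion, while a larger dot loses a one-pixel rim, which mimics the reduction of the border effect attributed to physical dot gain.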

Labelling of Regions of Interest (Step 4)
Connected pixels belonging to a region of interest of the foreground are labeled as dots. If the area of a region of interest is too large in comparison with the average area of the other dots, then this region is considered an outlier. If two dots overlap, it is very challenging to disconnect them; as this statistically concerns only a small number of dots, we propose removing these regions from our analysis process. When two dots are adjacent, it is possible to disconnect them using the watershed transform (see step 3.5). In that case, pixels belonging to the frontier of the two dots (defined by the isosurface of their distance transform) are removed from the foreground in order to label each of these two dots separately.
On the other hand, if the area of a region of interest is too small in comparison with the average area of the other dots, then this region is also considered an outlier.
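The labelling step can be sketched as a breadth-first flood fill over the binary foreground. This is only an illustration: the use of 4-connectivity and the name `label_regions` are assumptions, as the connectivity is not specified here.

```python
from collections import deque


def label_regions(img):
    """Label 4-connected foreground regions of a binary image.

    Returns (labels, areas): a label image (0 = background) and a dict
    mapping each label to its area in pixels.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    areas = {}
    next_label = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not labels[y][x]:
                # New, unvisited region: flood-fill it with a fresh label.
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                area = 0
                while queue:
                    cy, cx = queue.popleft()
                    area += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                areas[next_label] = area
    return labels, areas


toy = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
labels, areas = label_regions(toy)
```

With 4-connectivity, the pixel in the bottom-right corner is not merged with its diagonal neighbour, so three regions are found.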

Filtering of Outliers (Step 5)
If the area of a region of interest is larger than 1.5 × the average area of all dots or smaller than 0.5 × the average area of all dots, then it is removed from the foreground (see Figures 7g, 8f and 9f for examples). This step can be split into two sub-steps: first, the removal of dots that are too small and the updating of the average area of the remaining objects in the foreground, followed by the removal of dots that are too big.
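The two sub-steps can be sketched as follows. The function name `filter_outliers` and the area values are made up for illustration; only the 0.5 and 1.5 factors come from the text.

```python
from statistics import mean


def filter_outliers(areas, low=0.5, high=1.5):
    """Two-pass area filter on a dict mapping region label -> area.

    Pass 1 removes regions smaller than `low` x the mean area, then the
    mean is recomputed on the remaining regions. Pass 2 removes regions
    larger than `high` x that updated mean.
    """
    mean_all = mean(areas.values())
    kept = {k: a for k, a in areas.items() if a >= low * mean_all}
    mean_kept = mean(kept.values())
    return {k: a for k, a in kept.items() if a <= high * mean_kept}


# Hypothetical areas in pixels: region 4 is a speck, region 5 a merged blob.
areas = {1: 100, 2: 110, 3: 95, 4: 10, 5: 250, 6: 105}
dots = filter_outliers(areas)
kept_labels = sorted(dots)
```

Updating the mean between the two passes prevents very small specks from lowering the upper threshold.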

Computation of Dot Descriptors (Step 6)
We defined six categories of dot patterns (see Figure 10):
• ID1: narrow doughnut (with a hole inside the convex hull of the dot) with a symmetric shape;
• ID2: concave shape (with a large hole) with a symmetric shape;
• ID3: concave shape (with a hole inside the convex hull of the dot) with an asymmetric shape;
• ID4: elliptic/circular shape with a hole inside the dot;
• ID5: elliptic/circular shape;
• ID6: random shape.
The ID6 pattern is similar to the ID23 pattern defined in [5].
To analyze each individual dot, we propose computing shape descriptors that include the following:
• Centroid (X, Y coordinates in μm);
• Ae: area of the elliptic shape bounding the dot, in μm²;
• Solidity = A/Ac;
• Elongation along the X axis = width of the elliptic bounding box/width of the largest inscribed ellipse;
• Symmetry relative to the X axis and to the Y axis = Ax1/Ax2 and Ay1/Ay2, respectively.
Only a few of these descriptors were also used by Valla and Olson [5,6].
To categorize a dot into one of the six categories of dot patterns, we defined the following rules:
• Concave shapes belonging to category ID1 are defined by an elongation greater than 1.75, a symmetry value Ax1/Ax2 greater than 0.85 and lower than 1.15, and a solidity greater than 0.8.
• Concave shapes belonging to category ID2 are defined by a solidity greater than 0.75 and a symmetry value Ax1/Ax2 greater than 0.85 and lower than 1.15.
• Concave shapes belonging to category ID3 are defined by a solidity greater than 0.75 and a ratio Aij/A greater than 0.15 (assuming that the concavity is in the region Rij).
• Dots belonging to category ID4 have a ratio Ah/A greater than 0.15. Below this value, holes are insignificant.
• Circular equivalent dots (see example shown in Figure 11) belonging to category ID5 are defined by a roundness value higher than 0.90 (and an eccentricity greater than 0.85).
• Elliptic equivalent dots (see example shown in Figure 11) belonging to category ID5 are defined by a roundness value between 0.8 and 0.9 (and an eccentricity between 0.6 and 0.85).
• Elongated dots (belonging to category ID6) are defined by an eccentricity lower than 0.6 and a roundness value lower than 0.8 (and an elongation greater than 1.3).
• Any other shape which does not satisfy any of the above rules belongs to category ID6.
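Solidity, which appears in several of the rules above, is defined as A/Ac. The sketch below assumes that Ac denotes the area of the convex hull of the dot, as in the usual definition of solidity, and works on a polygonal contour rather than on pixels. The function names are hypothetical.

```python
def polygon_area(pts):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    total = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def solidity(contour):
    """Solidity = A / Ac, with Ac taken as the convex hull area (assumption)."""
    return polygon_area(contour) / polygon_area(convex_hull(contour))


square = [(0, 0), (4, 0), (4, 4), (0, 4)]           # convex shape
notched = [(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)]  # same square with a concavity
s_square = solidity(square)
s_notched = solidity(notched)
```

A convex contour has a solidity of 1, and the value decreases as the concavity grows. Here the notch removes a quarter of the hull area.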
These rules are based on threshold values set experimentally from a statistical analysis of 4467 dots from 18 print samples. These thresholds are only valid for the test samples considered in this case study. The threshold values and rules must be adapted to each case study considered. For example, ID1, ID2, ID3 and ID4 patterns may occur only in a few case studies, for instance when the viscosity of the ink is low, as shown in the images corresponding to samples number 1, 2, 3, 4 and 5 (see Table 1 and Figures 12 and 13). These thresholds depend on the quality of the print (noise, contrast, physical dot gain, rate of ink coverage, overlap of inks, etc.) and, therefore, on the printing parameters, such as the ink viscosity or the ink temperature. The setting of these descriptors results in a trade-off between the accuracy of the descriptors and their robustness against print noise from one dot to another.
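The categorization rules listed above can be sketched as a single decision function. Several points are assumptions and not part of the source: the dictionary keys, the `is_concave` flag (how concavity is detected is not stated in the rules), and the order in which the rules are tested. ID1 is tested before ID2 because every dot satisfying the ID1 rule also satisfies the ID2 rule.

```python
def classify_dot(d):
    """Assign a dot to a pattern category from its descriptors.

    `d` is a dict with hypothetical keys: is_concave, elongation,
    symmetry_x (Ax1/Ax2), solidity, concavity_ratio (Aij/A),
    hole_ratio (Ah/A), roundness, eccentricity.
    Threshold values are the ones quoted in the text.
    """
    symmetric = 0.85 < d["symmetry_x"] < 1.15
    if d["is_concave"]:
        if d["elongation"] > 1.75 and symmetric and d["solidity"] > 0.8:
            return "ID1"
        if d["solidity"] > 0.75 and symmetric:
            return "ID2"
        if d["solidity"] > 0.75 and d["concavity_ratio"] > 0.15:
            return "ID3"
    if d["hole_ratio"] > 0.15:
        return "ID4"
    if d["roundness"] > 0.90 and d["eccentricity"] > 0.85:
        return "ID5 (circular)"
    if 0.8 <= d["roundness"] <= 0.9 and 0.6 <= d["eccentricity"] <= 0.85:
        return "ID5 (elliptic)"
    if d["eccentricity"] < 0.6 and d["roundness"] < 0.8 and d["elongation"] > 1.3:
        return "ID6 (elongated)"
    return "ID6"  # any shape that satisfies none of the rules


# Made-up descriptor values, for illustration only.
base = dict(is_concave=False, elongation=1.0, symmetry_x=1.0, solidity=0.98,
            concavity_ratio=0.0, hole_ratio=0.0, roundness=0.95, eccentricity=0.9)
circular = classify_dot(base)
doughnut = classify_dot({**base, "is_concave": True, "elongation": 1.9, "solidity": 0.82})
holed = classify_dot({**base, "hole_ratio": 0.3})
elongated = classify_dot({**base, "roundness": 0.7, "eccentricity": 0.5, "elongation": 1.5})
```

Since the thresholds are only valid for the samples of this study, a practical version would take them as parameters rather than hard-coding them.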
Figure caption (beginning missing): There are two missing dots. In total, 92% of dots have a circular shape (see statistical values reported in Table 1). (b) % of dots belonging to category ID1 (less than 1%), ID2 (0%), ID3 (less than 1%), ID4 (40%), ID5 (49%, all have a circular shape), ID6 (6%, most of them are at the top of the figure). There is no missing dot. In total, 94% of dots have a circular shape.
Figure 13. Another example of results for a cyan print with a cell angle of 60°. % of dots belonging to category ID1 (34%), ID2 (0%), ID3 (0%), ID4 (0%), ID5 (55%), ID6 (11%, most of them are at the border of the figure). There is no missing dot. In total, 62% of dots have an elliptic shape, while 25% of dots have an elongated shape (less than 1% have a circular shape).
From this study we claim that: several dot descriptors must be used to properly evaluate the shape of the printed dots; several dot descriptors must be analyzed jointly to properly classify the dots into predefined dot patterns; and the robustness of these thresholds and rules greatly depends on the print parameters.

Global Analysis vs. Individual Analysis (Step 7)
Considering the heterogeneity of dot patterns and the random distribution of these dot patterns on print samples (as illustrated, for example, in Figure 13), we hypothesize that, to properly characterize and identify a print, it is necessary to analyze it at the global level, that is to say, to statistically analyze the distribution of the dots printed on these samples. The experimental results reported in Tables 1-3 show some statistical results for 18 print samples.
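A global analysis of this kind can be sketched as a per-sample summary: the share of each dot category, plus the mean and standard deviation of a descriptor. The function name `summarize_sample`, the choice of area as the descriptor and the input values are hypothetical and do not reproduce the contents of Tables 1-3.

```python
from collections import Counter
from statistics import mean, stdev

CATEGORIES = ["ID1", "ID2", "ID3", "ID4", "ID5", "ID6"]


def summarize_sample(dots):
    """Summarize one print sample from its list of (category, area) pairs."""
    counts = Counter(cat for cat, _ in dots)
    n = len(dots)
    areas = [area for _, area in dots]
    return {
        "n_dots": n,
        # Percentage of dots per category (0.0 for absent categories).
        "share_percent": {c: 100.0 * counts[c] / n for c in CATEGORIES},
        "mean_area": mean(areas),
        "std_area": stdev(areas),  # sample standard deviation
    }


# Made-up dots for one sample: (category, area in square micrometres).
sample = [("ID1", 100.0), ("ID5", 110.0), ("ID5", 90.0), ("ID6", 100.0)]
summary = summarize_sample(sample)
```

Two samples can then be compared through their category shares and descriptor statistics rather than dot by dot.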

Experimental Results
The efficiency of the dot segmentation method was evaluated visually, while the accuracy of some computed values (size of the major and minor axes, area, radius of the circumscribed circle) was evaluated using the software ScopeImage 9.0. The computed values reported in Tables 1-3 are consistent with the measured values. The problem of evaluating and comparing threshold methods is discussed in [5], while the problem of calibrating shape measurements is discussed in [6]. In our study, the objective was only to make a relative comparison of computed values between print samples. The accuracy of the computations was secondary.
From Table 1, we can draw several observations:
• Samples number 1, 2, 3, 4, 5 and 6 are very similar in terms of dot shape. As most of the dots are circular, the parameter "orientation" is irrelevant (its standard deviation is very high).
• Samples number 1, 2, 3, 4 and 5 are very similar (high proportion of doughnuts), while sample number 6 is unique in comparison with these samples, as a significant proportion of its dots have a hole (low proportion of doughnuts).
• The threshold separating "circular equivalent" shapes from "elliptic equivalent" shapes is less robust when it is based on the parameter "circularity" than when it is based on the parameter "roundness", which is more relevant to characterize the circularity of non-circular shapes (such as those ∈ ID1/ID2/ID3). As an illustration, compare the "circularity" of sample number 3 (0.80) with the "circularity" of sample number 11 (0.81, see Table 2): these values are similar, but visually the dots of the first sample are more circular than those of the second one.
• The shape and size of the dots of samples number 1, 2, 3, 4, 5 and 6 are statistically quite homogeneous (standard deviation values are quite low, for example less than 10 µm for the perimeter).
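One possible reason for the lower robustness of "circularity" is its dependence on the perimeter. The formulas of the two parameters are not given in this section, so the sketch below assumes the common definitions circularity = 4πA/P² and roundness = 4A/(πd²). As a further simplification, d is taken as the largest distance between two contour vertices instead of the major axis of a fitted ellipse. All function names are hypothetical.

```python
import math
from itertools import combinations


def polygon_area(pts):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    n = len(pts)
    return abs(sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
                   for i in range(n))) / 2.0


def perimeter(pts):
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))


def circularity(pts):
    """Assumed definition: 4*pi*A / P^2 (1 for a perfect circle)."""
    return 4.0 * math.pi * polygon_area(pts) / perimeter(pts) ** 2


def roundness(pts):
    """Assumed definition: 4*A / (pi*d^2), d = largest vertex-to-vertex distance."""
    d = max(math.dist(p, q) for p, q in combinations(pts, 2))
    return 4.0 * polygon_area(pts) / (math.pi * d ** 2)


def radial_polygon(radii):
    """Polygon whose i-th vertex lies at angle 2*pi*i/n and distance radii[i]."""
    n = len(radii)
    return [(r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
            for i, r in enumerate(radii)]


smooth = radial_polygon([1.0] * 360)                                       # near-perfect circle
jagged = radial_polygon([1.0 if i % 2 == 0 else 0.9 for i in range(360)])  # noisy contour

circ_smooth, round_smooth = circularity(smooth), roundness(smooth)
circ_jagged, round_jagged = circularity(jagged), roundness(jagged)
```

In this synthetic case, contour noise strongly increases the perimeter and therefore lowers the circularity, while the roundness, which does not use the perimeter, changes much less.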
From Table 2, we can draw several observations:
• Samples number 11, 12 and 13 (and also sample number 17, see Table 3) are very similar in terms of dot shape: most of the dots have an "elliptic equivalent" shape oriented in the direction of the print (see Figure 14), which is consistent with the values computed for the parameter "orientation".
• The proportion of doughnuts is lower than for samples number 1, 2, 3, 4, 5 and 6 (only 34% for samples number 11 and 12, and less than 8% for samples number 13 and 17). These doughnuts are less visible to the naked eye but are noticeable at microscopic scale. As there are fewer doughnuts in these samples, the "circularity" parameter is more accurate (in the range 0.81-0.86), which is consistent with an "elliptic equivalent" shape. On the other hand, the standard deviation of "circularity" values is higher (in the range 0.35-0.48), as the dot patterns are more heterogeneous. This is due to a higher standard deviation of the "perimeter" and of the "area" of the dots.
Unfortunately, the amount of data and the number of rotogravure printers used in this preliminary study (4467 dots and two printers) are insufficient to precisely evaluate the accuracy and robustness of all the dot pattern parameters proposed; nevertheless, the amount of data was sufficient to demonstrate the relevance of these parameters.
In the next section, we compare the robustness of these parameters across different printing systems.

Prints vs. Reprints
Values reported in Table 2 for samples number 11, 12 and 13 (see also Figure 15a-c) show that the dot distributions in these samples are statistically very similar (elliptic shape). Nevertheless, we noticed in the previous section that the proportion of doughnuts in sample number 13 (printed with printer P2) is lower than for the two other samples (both printed with printer P1).
Likewise, values reported in Table 1 for samples number 1, 2 and 3 (see also Figure 15d-f) show that the dot distribution in these samples is statistically very similar (circular shape). Nevertheless, we notice that the proportion of doughnuts in sample number 3 (printed with printer P2) is once again lower than for the two other samples (both printed with printer P1). This is consistent with our hypothesis that the deformations at microscopic scale of printed dots, induced by the printing process, can be used as a fingerprint to differentiate two prints.
Let us now compare three reprint samples of the same artwork. Images shown in Figure 16 show that the dot distribution in these samples is very similar (Cyan dots have an elliptic shape, Magenta dots have a circular shape and Yellow dots have a random shape). Results shown in Figure 16d-f demonstrate that the dot pattern detection process is quite robust against noise, non-uniformity of the lighting field, dot connectivity, dot-on-dot misregistration, inks, etc. Here, the shape of Yellow dots was badly estimated due to the overlap of Cyan and Magenta dots on top of Yellow dots. The dot distribution of these reprint samples does not correspond to the dot distribution of the scanned print sample (sample number 11), due to the high reflectance of the substrate, which complicates the color calibration process, and to an inaccurate color measurement of the dot distribution.
If we compare samples number 11, 12 and 13 (printed samples) with samples number 14 and 15 (reprinted samples), we can notice a higher proportion of dots ∈ ID6 for samples number 14 and 15 than for the three other samples (see Table 3). This can be explained by the fact that, for the reprint samples, it is more difficult to properly estimate the shape of the dots due to a higher standard deviation of the "area" and "circularity" parameters (and consequently a higher connectivity rate of dots). We hypothesize that this is due to the printing parameters used for the reprint rather than to the scan process (as the density of Cyan dots was well estimated). We can, therefore, draw the assumption that the deformations at microscopic scale of reprinted dots (induced by the print, scan and reprint process) can be used as a fingerprint to differentiate a print from a reprint. Figure 17 illustrates two examples of dot deformation, one for Magenta print dots (see Figure 17c), another for Magenta reprint dots (see Figure 17f).
The overgrowth of dots due to the printing process is smoothed by the morphological operations applied before computing the shape indexes, but it is not removed, as it is intrinsic to the printing process. The "elongation" parameter enables these local deformations to be quantified (see examples highlighted with an orange arrow). The size of the Magenta dots is larger in Figure 17f

Discussion
In this study, we did not compare the performance (nor the accuracy) of our thresholding method to other thresholding methods. Likewise, we did not compare the performance (nor the accuracy) of our size and shape indexes with other indexes of the state of the art. Our main objective was to demonstrate that the use of size and shape indexes alone is not sufficient to properly characterize the shape of print dots (printed using a rotogravure process), and that the categorization of dots into predefined dot patterns, combined with a statistical analysis of relevant shape indexes, is more relevant to properly characterize a print from its dot distribution. In Section 4, we discussed the limitations of the different methods used for steps 1 to 7 of the framework that we propose and of the threshold values that we used. Our objective was not to optimize the efficiency of each of these steps but to propose a framework relevant to our case study that could be extended to other, similar case studies. Several limitations pointed out in Section 4 call for improvements that could be obtained only with a higher number of print samples.
This preliminary study was done from a "limited" set of prints (with, in total, 4467 dots). To evaluate the accuracy and robustness of the pattern recognition process proposed, more tests should be done with annotated data. More samples should be analyzed, and more printers (and printing parameters) should be compared. This is costly and time consuming. Likewise, to evaluate our hypothesis, according to which print and reprint samples can be differentiated from the deformations of dots at microscopic scale, more prints and reprints (and printing parameters) should be compared. That said, the first results shown in this paper are very promising and demonstrate that the parameters studied and the proposed methodology are relevant enough to support the hypothesis. In future work, we will compute the confusion matrix (in %) of the proposed classification method and will evaluate its robustness using ROC curves and Precision-Recall curves. In the case of overlapping problems between classes, we will investigate machine-learning-based methods to optimize soft decisions.
The experimental analysis performed in Section 4 shows the importance of considering, in the analytical process, the theoretical and practical knowledge and expertise that we have about the rotogravure printing process and about the shape of the printed dots. This justifies the use of a pattern-recognition-based process rather than a machine-learning-based process. Moreover, machine-learning-based methods, such as deep-learning methods, need a lot of data.

Conclusions
In this paper, we investigated a promising solution based on a pattern recognition process to distinguish a print document (original) from a reprint document (copy or fake). We demonstrated that the geometrical shape of printed dots can be analyzed at microscopic scale, as a fingerprint, to differentiate a print from a reprint.
To improve the efficiency of the dot segmentation process, we proposed a new image processing pipeline. To better characterize complex geometrical dot shapes, we defined six types of dot patterns, including four types of doughnut patterns, and defined a new rule-based characterization method which combines several dot pattern features. To better characterize each printed sample from the set of dots printed on it, we proposed a new statistical analysis method based on a pattern recognition process. The preliminary experiments demonstrated the relevance of the criteria, rules and methods proposed and of the statistical analysis method implemented. In future work, we will perform complementary experiments to study the robustness of our solution as a function of the printing parameters, as well as of the printing device used. We will also study the impact of the scan process involved in the reprint process (based on a scan-and-print process). The implemented solution suffers from some limitations that were discussed in the paper. In future work, we will study how to improve some steps of the image processing pipeline, especially the segmentation process when neighboring dots overlap or when color dots (primary colors) overlap.
If one wants to generalize the rules suggested in this paper to new data without overfitting, it could be interesting to investigate an unsupervised machine learning method [14]. Noise learning is one important cause of overfitting. Pruning could be used to eliminate less meaningful or irrelevant data and, finally, to prevent overfitting and to improve the classification accuracy [15]. During the learning process, stopping criteria could be used to determine when to stop adding conditions to a rule, or adding rules to a model description.
Pruning could also be used to test the significance of rules based on significant differences between the distributions of positive and negative examples. It will be interesting to investigate whether other combinations of rules could improve the classification accuracy of dots regardless of their shape. Parameter tuning is another important cause of overfitting; it needs sufficient samples for learning. To expand the training set, a few data augmentation strategies could be used: acquire more training data; add some random noise to the existing data set; reacquire some data from the existing data set through some image processing; produce some new data based on the distribution of an existing data set or from simulated data [15]. Well-tuned parameters strike a good balance between training accuracy and regularity [15]. As the number of dot patterns resulting from the rotogravure printing process is very limited, the number of useful features will not increase with new data. However, the weight of these features could have an influence on the final classification. A regularization process could be used to minimize the weights of the features which have little influence on the final classification.