Quality Assessment of 2.5D Prints Using 2D Image Quality Metrics

Quality assessment is an important aspect in a variety of application areas. In this work, the objective quality assessment of 2.5D prints was performed. The work is done on camera captures under both diffuse (single-shot) and directional (multiple-shot) illumination. Current state-of-the-art 2D full-reference image quality metrics were used to predict the quality of 2.5D prints. The results showed that the selected metrics can detect differences between the prints as well as between a print and its 2D reference image. Moreover, the metrics better detected differences in the multiple-shot set-up captures than in the single-shot set-up ones. Although the results are based on a limited number of images, they show existing metrics’ ability to work with 2.5D prints under limited conditions.


Introduction
Objective image Quality Assessment (QA) is in high demand because it is automatic, consistent, and less resource-demanding than subjective image QA. There are studies that assess the quality of 2D print images objectively [1,2]. However, objective QA of 2.5D reproductions is less studied. Therefore, the goal of the current work is to investigate whether existing 2D Image Quality Metrics (IQMs) are suitable for assessing the perceptual quality of 2.5D reproductions. At this moment, to our knowledge, no IQM is standardized for 2.5D reproduction QA. As a result, existing IQMs should be tested, even though assessing the quality of 2.5D prints is expected to be a difficult scenario for them. Moreover, different options to digitize prints should be tested to determine which data representation is more appropriate for IQMs. Consequently, we also address the question of how to digitize 2.5D prints for QA. The acquisition set-up and light source might influence the captured data, but information regarding the responsiveness of the IQMs to the quality variations that 2.5D prints hold can still be valuable, especially in (serial) production of prints. Thus, first we check the responsiveness of the selected 2D IQMs to the quality variations that 2.5D prints hold. Next, we analyze the quality maps of the IQMs because they can reveal more information about print quality than the IQMs' values alone. Last, we compare which of the two set-ups we tested is more suitable for capturing 2.5D prints based on the IQMs' performance.
This paper is organized as follows. First, we give background information about QA of 2D and 2.5D prints as well as capture approaches. Afterwards, we describe our methodology, followed by the results and discussion. Finally, we provide our conclusions and future work.

Background
There has been work on improving the quality of 2.5D printing, including characterization of relief printers [3], printing gloss effects [4], development of a 2.5D printing machine with accompanying software [5], and a novel approach to 2.5D printing based on semantic information [6]. Presently, 2.5D printing is used widely in many applications such as decoration (e.g., interior design), signage, and maps (e.g., for visually impaired people), to name a few. Therefore, the quality of 2.5D prints should be assessed carefully before they are released into the market for sale. Many studies have explored the quality of images [7][8][9][10][11] and prints [12][13][14]. The quality of 2.5D prints in this work can be related to the definitions given by Keelan [15] and ISO [16], where image quality is an observer's perception of image excellence where the observer was not involved with anything related to the given image. In addition, 2.5D prints tend to have surface elevation. This can affect the overall quality perceived by customers. The importance of 2.5D prints is that the variations in their surface elevation can create many possible options for shapes and texture patterns, which diversifies what a printer can print and enhances the appearance of prints [17]. This is useful in many applications such as signage, decorations, maps, or reproduction of art works (e.g., to reproduce brush strokes). In the following, QA aspects are briefly described first, followed by capture aspects, for both 2D and 2.5D prints.

Quality Assessment
In general, QA of any print can be either subjective or objective, or a combination of both. For example, Pedersen et al. [18] conducted a subjective experiment to identify meaningful image quality attributes for 2D color prints' QA. Their observers' task was to rate the quality of 2D color prints and state every quality attribute they used even if some quality attributes had little impact on the QA. The rating task considered seven scale levels, where a value of 1 meant that observers found the 2D color prints as the most pleasing, whereas a value of 7 meant the least pleasing. Afterwards, they validated the chosen quality attributes (color, sharpness, lightness, contrast, and artifacts) through another subjective experiment [19].
Regarding subjective QA of 2.5D prints, an experiment with observers was conducted to study the most used distinct attributes [20]. The observers judged the quality of 2.5D prints that were fabricated with a 2.5D printer. They were asked to rank the quality of 2.5D prints and describe the reasons for their ranking. The experiment comprised two parts: first, when observers were not provided with the reference images and second, when the reference images were provided. The relevant attributes were proposed to be the top five most used distinct attributes in their experiment. These were color, sharpness, elevation, lightness, and naturalness. Samadzadegan et al. [21] performed a subjective experiment with 2.5D prints and their goal was to find the effect of color on gloss. According to their results, color has no significant effect on gloss. Nevertheless, they suggested that color and gloss need to be taken into account for print QA.
The objective QA aspect can involve a variety of metrics. There are many diverse sets of 2D IQMs depending on the application area, attributes, performance and accuracy level, and availability of reference images. According to the literature [1,22,23,24,25,26], Human Visual System (HVS)-based metrics perform better than metrics that do not use a model of the HVS. Additionally, visual quality can be better predicted by the HVS-based metrics than by simple pixel-based difference metrics [27]. There are Full-Reference (FR), reduced-reference, and No-Reference (NR) IQMs. Most of the IQMs are based on detecting distortions and predicting quality based on that. Due to the unavailability of reference images in practice, sometimes NR metrics are preferable. However, FR metrics can be more straightforward to detect distortions because of reference image availability [28]. Thus, there are pros and cons of each type of metric. Moreover, it is important to mention that these days it is becoming a trend to use deep learning or machine learning methods to create IQMs [29][30][31][32]. For instance, Akyazi et al. [28] created an FR metric that is based on deep neural networks and takes filtered reference and distorted images as input.
There are several frameworks for QA of 2D color prints/images using IQMs [1,33,34,35]. For example, Pedersen and Amirshahi [1] used Spatial CIELAB (SCIELAB) due to its frequent use as a reference metric, the spatial hue angle metric because it combines two state-of-the-art metrics, the adaptive bilateral filter due to the ability of bilateral filtering to simulate the HVS, Structural Similarity (SSIM) due to its common use and its working scheme on local neighborhoods, and other IQMs for QA of 2D color prints. They concluded that their results are both image and metric dependent.
Baar et al. [36] proposed potential approaches with advantages and disadvantages towards an IQM for 2.5D prints based on their reviewed literature for 3D prints' QA. Liu et al. [3] performed objective QA of 2.5D printing process in height dimension in terms of fidelity and surface finish using modulation transfer function, mean absolute difference, and other metrics. The surface finish difference between real and ideal prints that might be perceived by observers was simulated through their created light-reflection model. They concluded that the surface roughness depends on prints' geometry, increase of frequency of fine details reduces reproduction accuracy, and both viewing angle and illumination direction impact the visual experience.

Capture Techniques
The IQMs require a digitized version of the reference and/or the reproduction to be used. In the case of printed images, the physical print needs to be digitized prior to QA. Scanners have been used for 2D print digitization [1,2,33,34,37]. Moreover, cameras have been used for QA of flat surfaces [38]. However, scanners are not suitable in this work because we work with non-flat prints (i.e., 2.5D prints with a surface elevation). The results of the work by Zhao et al. [39] support that image QA of flat surfaces (i.e., projection displays) captured through a camera is a relevant approach that works. Different techniques can be used to capture 2.5D prints. It depends on which attributes one is interested to capture. For example, art paintings are known to have relief surface structure and Zaman et al. [40] were able to capture the topography and color of oil paintings by their proposed hybrid set-up. It consisted of two cameras and a projector. They connected fringe projection with stereo imaging which performed well in color and depth information capturing. The paintings' depth and color information perception with respect to our eyes was mimicked using stereo imaging. The image registration process was avoided by capturing topography and color at the same time. Elkhuizen et al. [41] captured the spatially varying gloss of their hand-painted samples. They used high dynamic range images. Their set-up consisted of a camera and a series of light-emitting diode lights. They concluded that the essential attributes for art paintings are translucency and gloss. High accuracy and precision for painting measurement were acquired using multi-scale optical coherence tomography and 3D digital microscopy [42]. However, these measurement tools were slow due to the small field of view. They also tested 3D scanning based on fringe encoded stereo imaging.
Reflectance Transformation Imaging (RTI) is a common tool to capture appearance under different directional light [43]. The object and camera are fixed perpendicular to each other while the light source moves in RTI. It is common to use RTI for texture visualization purposes of paintings [44] and capturing of low-relief surfaces [45]. For example, Pintus et al. [46] performed RTI of cultural heritage data visualization assessment both subjectively and objectively. Recently, Kitanovski et al. [47] assessed the quality of relighting from images acquired through their proposed multispectral RTI system. They captured 3D objects with various colors and translucencies.
These capture techniques can be applied to different 2.5D prints (maps, signage, etc.). However, some of them might be costly, time consuming, and, most importantly, they might involve post-processing to reconstruct color, depth, gloss, and other attributes that might require (costly) software tools. Overall, capturing the whole appearance features of 2.5D prints or similar objects is challenging. Nonetheless, there are some ongoing works in this direction [48].

Methodology
We used the physical 2.5D prints from Kadyrova et al. [20]. They fabricated 42 2.5D prints, consisting of 12 reference images with three instances each (i.e., in terms of quality aspect variations) and 3 images with two instances each, with a Canon Arizona series 2.5D printer.
The quality variations were divided into five sets in their work. In our work, we focus on three sets because the quality issues were clearly visible/distinguishable for the observers in these sets based on data from Kadyrova et al. [20]. They are as follows (Figure 1):
• Naturalness set with natural elevation, unnatural elevation, and surface roughness prints (tiles, wood, brick images);
• Height set with maximum heights of 1 mm, 0.5 mm, and 0.25 mm prints (scissor, speed sign, running track images);
• Printer mode set with Alto (i.e., when elevation is opaque) and Brila (i.e., when elevation is varnish) modes prints (flower, packaging, snowflake images).
Our workflow is illustrated in Figure 2. We start with acquiring digital data of 2.5D prints followed by preprocessing. Afterwards, relevant IQMs are selected and applied, and data analysis is performed. The expected results are a responsiveness test of the IQMs, insights from the IQMs, and a comparison of the two capture set-ups.
Figure 1. Images from [20]. From top left: tiles, wood, brick, scissor, speed sign, running track, flower, packaging, and snowflake images.
Figure 2. Our workflow. The prints are captured in the single-shot and multiple-shot set-ups, then they are processed, further IQMs are applied, before we do our analysis and report the results.

Data Capture
There are several options to digitize physical 2.5D prints. In this work, we used camera-based set-ups to acquire digital data of physical 2.5D prints because it is relatively fast and affordable. We included diffuse and multiple angle directional illumination setups using a Nikon D610 professional camera alone (single-shot set-up) and RTI with the same camera (multiple-shot set-up), respectively. There was a single viewing angle in both set-ups.

Single-Shot Set-Up
We used a Nikon D610 professional camera with a Sigma 24-105 mm lens placed on a fixed tripod. The distance from the camera lens to the prints was approximately 51 cm. The 2.5D prints and a Macbeth ColorChecker were placed inside the light booth cabinet (VeriVide CAC 60-5, illumination was around 1328 lux) with D65 (diffuse) illumination. We used the following camera settings in manual exposure mode: ISO 125, an aperture of f/4.5, and a shutter speed of 1/80 s. We worked with the camera JPEG images and the color checker was used for white balancing. The single-shot set-up is further referred to as SS.

Multiple-Shot Set-Up
Because 2.5D prints have elevation and an angle-dependent appearance (due to, for example, shadows), RTI was used. The 2.5D prints were digitized by an RTI set-up based on a robotic arm [47]. The camera and prints were fixed (the distance between them was approximately 51 cm) while the light source, mounted on the robotic arm, was moving. We acquired 60 captures based on 60 illumination angles per print. Similar to SS, the camera JPEG images were used and the color checker was used for white balancing (it was captured separately in this set-up). To our knowledge, this is the first attempt to assess the objective quality of 2.5D prints using RTI captures. The multiple-shot set-up is further referred to as MS.

Preprocessing
We performed vignetting correction (for the SS captures) and image registration (for both set-up captures).
We used the method from Zhao et al. [38] for vignetting correction mask generation for data captured in the SS. We applied the correction mask on the Y channel of YCbCr for all captures. In the MS, the prints were placed relatively in the center and vignetting was less of an issue.
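As an illustration of the step above, the following is a minimal sketch of applying a vignetting correction mask to the Y channel only. It does not reproduce the mask generation of Zhao et al. [38]; the mask is assumed to be a precomputed multiplicative gain image, the function name `correct_vignetting` is hypothetical, and the full-range BT.601 YCbCr variant is an assumption because the text does not state which variant was used.

```python
import numpy as np

# Full-range BT.601 RGB -> YCbCr matrix (JPEG convention). This variant is an
# assumption; the chroma offsets are omitted because we convert straight back.
_RGB2YCBCR = np.array([[0.299, 0.587, 0.114],
                       [-0.168736, -0.331264, 0.5],
                       [0.5, -0.418688, -0.081312]])
_YCBCR2RGB = np.linalg.inv(_RGB2YCBCR)


def correct_vignetting(rgb, gain_mask):
    """Multiply only the luma channel by a gain mask (hypothetical helper).

    rgb is a float array of shape (H, W, 3) in [0, 1]; gain_mask has shape (H, W).
    """
    ycbcr = rgb @ _RGB2YCBCR.T
    ycbcr[..., 0] = ycbcr[..., 0] * gain_mask  # chroma channels stay untouched
    return np.clip(ycbcr @ _YCBCR2RGB.T, 0.0, 1.0)
```

With a mask of ones the capture is returned unchanged, and a gain above one brightens a pixel without shifting its chroma.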
Accurate registration of the captured images to the reference is required for using FR IQMs. Different methods have been used for the registration of 2D print captures [35,38,49] but registration of 2.5D print captures with respect to the 2D reference images was found to be challenging. Nevertheless, manual image registration with homography transformation gave desirable results. The points (coordinates) for the homography were selected manually for each print capture from both set-ups. The algorithm uses bilinear interpolation. The registration was manually checked by the authors to ensure a correct registration.
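The registration described above can be sketched as follows, assuming at least four manually selected point pairs. This is not the authors' implementation: the direct linear transform, the edge clamping for out-of-range coordinates, and the names `fit_homography` and `warp_bilinear` are illustrative choices.

```python
import numpy as np


def fit_homography(src_pts, dst_pts):
    """Direct linear transform: H maps src (x, y) to dst (x, y); needs >= 4 pairs."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)  # null-space vector of the constraint matrix
    return h / h[2, 2]        # assumes a non-degenerate point configuration


def warp_bilinear(image, h_ref_to_cap, out_shape):
    """Resample a single-channel capture onto the reference grid (bilinear)."""
    rows, cols = out_shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    mapped = h_ref_to_cap @ pts
    # Coordinates outside the capture are clamped to its border.
    x = np.clip(mapped[0] / mapped[2], 0, image.shape[1] - 1)
    y = np.clip(mapped[1] / mapped[2], 0, image.shape[0] - 1)
    x0 = np.minimum(np.floor(x).astype(int), image.shape[1] - 2)
    y0 = np.minimum(np.floor(y).astype(int), image.shape[0] - 2)
    fx, fy = x - x0, y - y0
    top = image[y0, x0] * (1 - fx) + image[y0, x0 + 1] * fx
    bottom = image[y0 + 1, x0] * (1 - fx) + image[y0 + 1, x0 + 1] * fx
    return (top * (1 - fy) + bottom * fy).reshape(rows, cols)
```

Here the homography is fitted from reference pixel coordinates to capture coordinates, so that each reference pixel looks up its value in the capture; a color capture would be warped one channel at a time.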

Full-Reference Image Quality Metrics
We chose to work with FR IQMs because we have reference images and FR IQMs' universality is higher than that of NR IQMs [50].
The SSIM metric is selected because it is a HVS-based FR metric that considers structural information from the image scene [51]. This metric can be used to assess lightness, contrast, sharpness, and artifact attributes [2]. Additionally, Pedersen and Amirshahi [1] mentioned its potential to detect artifacts. It is used to calculate the perceived difference between the reference and distorted input images and it is independent of both average luminance and contrast [51]. It is appropriate for use with 2.5D prints because it is based on structure and most of the aspects that have been changed in the dataset influence structure.
The Multi-Scale SSIM (MS-SSIM) is selected because it incorporates variations of image resolution and viewing conditions [52]. We used MS-SSIM at scale 2 (scale 1 is the same as SSIM) for both structure and contrast terms because at higher scales it assesses a portion of an image due to downsampling.
The improved Color Image Difference (iCID) is selected because it considers the reference image's color, structure, and contrast [53]. The metric was used with 51 cm viewing distance and downsampling was deactivated. We used its lightness difference, lightness contrast, and lightness structure maps.
We included the CIEDE2000 color difference metric [54] to see how a pixel-based metric behaves in the QA of 2.5D prints. It works with the CIELAB color space.
The SCIELAB is selected because of its spatial filtering to mimic the HVS's spatial blurring [55]. We modified this metric so that it works with the CIEDE2000 formula. The viewing distance in SCIELAB was set to 51 cm. The white point was D65 in both CIEDE2000 and SCIELAB.
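Metrics with spatial filtering typically turn the viewing distance into a number of pixels per degree of visual angle. The sketch below shows that conversion under simple geometric assumptions; the function name and the pixel density used in the usage note are hypothetical, because the sampling density of the registered captures is not stated in the text.

```python
import math


def pixels_per_degree(viewing_distance_cm, pixels_per_cm):
    """Image pixels subtended by one degree of visual angle (hypothetical helper)."""
    # Width on the print covered by one degree at the given distance.
    one_degree_cm = 2.0 * viewing_distance_cm * math.tan(math.radians(0.5))
    return one_degree_cm * pixels_per_cm
```

At 51 cm, one degree covers roughly 0.89 cm of the print, so an assumed density of 40 pixels per cm would give about 36 pixels per degree. For a fixed pixel density, a larger viewing distance yields a proportionally higher pixels-per-degree value.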
For all metrics, the input images were color images of 904 × 550 pixels. The color images were converted into grayscale following ITU recommendation BT.601-7 for SSIM and MS-SSIM. We used the default parameters for the selected IQMs.
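The grayscale conversion can be sketched with the BT.601 luma weights. Which tool performed the conversion is not stated, so the helper below is only an illustration that assumes gamma-encoded RGB values stored as floats.

```python
import numpy as np

# Luma weights from ITU-R BT.601 for (R', G', B').
BT601_LUMA = np.array([0.299, 0.587, 0.114])


def to_gray_bt601(rgb):
    """Weighted sum over the last axis of an (..., 3) RGB array."""
    return np.asarray(rgb, dtype=float) @ BT601_LUMA
```

A 550 × 904 × 3 color array becomes a 550 × 904 grayscale array that can be passed to SSIM or MS-SSIM.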

Results and Discussion
We present readers with examples of what 2.5D print captures and their height maps look like in Figure 3. White corresponds to flat areas while black corresponds to maximum elevation in the height maps. The height and printer mode sets use one height map where the maximum height and the printer mode were varied during printing, respectively.
Because the dataset is small and there are only three quality variations per image (two for the three images in the printer mode set), it is less appropriate to look at the correlations between the IQMs and the subjective scores. Nevertheless, the IQMs can still be informative on quality aspects. Investigating a single IQM value can be limiting; therefore, an in-depth analysis of the quality maps can be valuable [56]. A quality map from a metric can be a good tool to find the location of errors for 2.5D printing applications. More specifically, the mean values of IQMs for different prints can be similar, but differences can be found in the quality maps.
We found that it is challenging to define a single illumination angle or a group of illumination angles from the MS captures that could be informative on 2.5D prints' quality aspects from the selected IQMs. In this light, we worked with mean quality maps of the 60 different illumination angle captures. In other words, the IQMs were applied to each of the 60 illumination angle captures and the mean of the 60 quality maps was taken per captured print. This helps to incorporate individual pixel values from all 60 angles. Furthermore, this is useful because the light source positions were symmetrically distributed, so all effects, for example from shadows, will also impact the mean quality map symmetrically. Although the effects of the directional light may cancel out because of the symmetric distribution of the light positions, what remains can be interesting to analyze.
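The averaging over illumination angles can be sketched as below. The absolute-difference map is only a stand-in so that the example runs; it is not one of the IQMs used in this work, and both function names are hypothetical.

```python
import numpy as np


def mean_quality_map(reference, captures, metric_map):
    """Apply a per-pixel metric to every capture and average the resulting maps."""
    maps = [metric_map(reference, capture) for capture in captures]
    return np.mean(np.stack(maps), axis=0)


def abs_difference_map(reference, capture):
    """Stand-in metric for illustration only; not one of the paper's IQMs."""
    return np.abs(reference - capture)
```

If one capture carries a shadow on the left of an edge and the mirrored capture carries it on the right, the mean map shows half of that response on both sides, which is the symmetric effect described above.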

Are 2D IQMs Responsive to the Quality Variations in 2.5D Prints?
It is relevant first to check whether the selected 2D IQMs are responsive to the quality variations introduced in the dataset of 2.5D prints. We do this by using the FR IQMs to calculate the difference between two prints, instead of between a print and the reference. If the IQMs are not responsive, they will yield no difference between the prints. For this test, we calculate the difference between two prints for each of the three images in each set: the unnatural elevation print (test image) and the surface roughness print (reference image) in the naturalness set, the 1 mm maximum height print (test image) and the 0.25 mm maximum height print (reference image) in the height set, and the Brila mode print (test image) and the Alto mode print (reference image) in the printer mode set.

Naturalness Set
The differences are expected to be in elevation and in fine details from surface roughness. Shadows might be introduced due to elevation and thus more differences on the edges are expected, especially in the MS captures because the MS captures the prints under multiple-angle directional illumination.
All selected IQMs are responsive to this set's images on the edges, on the background, and on the elevated parts' surfaces in both set-up captures. More specifically, the highest response is found on the edges. This shows that they are responsive to the differences between unnatural elevation and surface roughness prints (i.e., the significant differences between the two were in elevation and shadows induced from it and the secondary difference was in fine details). The differences in the prints are better captured by the IQMs in the MS captures than in the SS ones for the three images. Figure 4 illustrates the above-mentioned observations by iCID's lightness difference map on the tiles image where the difference comes from shadows that can appear on both sides of the grout line, which is better captured in the MS captures than in the SS ones. In addition, iCID's lightness structure map is responsive to both edges and fine details in the three images in this set in both set-up captures.

Height Set
The differences are expected to be in elevation and shadows because only height was changed between the prints in this set. The differences in the prints are better captured by the IQMs in the MS captures than in the SS ones for the three images, similar to the previous set. All selected IQMs are responsive to the differences (i.e., the highest response was on the edges compared to the background and the surface of the elevated parts) in both set-up captures. SSIM and MS-SSIM (both structure and contrast terms, scale 2) responded to differences in the background in the scissor image in the MS captures, although this area had not been elevated or changed between the prints. These metrics were also responsive to the edges and elevated parts' surfaces. However, this does not indicate that they are performing incorrectly in the scissor image. We assume that they are detecting halftone noise, which is perceptually difficult to see at a normal viewing distance. This can explain the performance of SSIM and MS-SSIM (both terms, scale 2) on the scissor image. Figure 5 shows histograms of the mean quality maps from the MS captures for the scissor image by SSIM and iCID's lightness difference map as a comparison. The histogram of SSIM is more spread out than that of iCID's lightness difference map because the latter includes a Contrast Sensitivity Function (CSF), which filters the halftone noise. Thus, the halftone noise becomes less visible to some degree. SSIM and MS-SSIM do not include the CSF and thus they calculate differences that might not be perceptible at a certain distance. As a result, we observe that iCID's lightness difference map is fairly robust to halftone noise and that a model of the HVS is useful to prevent the metric from calculating differences that are outside the CSF threshold (i.e., invisible to the human eye).
We assume that SSIM and MS-SSIM (both terms, scale 2) in combination with the MS can be used in a quality assurance application to detect differences between two prints/images that are not necessarily related to perceptual aspects. Another reason for the behavior of SSIM and MS-SSIM (both terms, scale 2) can be the direction-dependent (i.e., complex) appearance of the 2.5D prints. For instance, the histograms of the (mean) quality maps of the scissor image by SSIM in both MS and SS captures show that the prints are noisy, but the prints are noisier in the MS captures (Figure 6).
Similar to SSIM and MS-SSIM (both terms, scale 2), iCID's lightness structure map, which was also responsive on the edges and elevated parts' surfaces, detected halftone noise between the prints in the scissor image from the MS captures. Its performance can be somewhat expected, although it uses a Gaussian weight distribution in its formula and the CSF was applied to the input images before calculating the different maps in iCID. However, we used a lower pixels-per-degree value of a visual field, therefore simulating the HVS at a closer viewing distance at which halftone noise becomes perceptible. Assigning a higher pixels-per-degree value of a visual field for the CSF filtering might improve iCID's performance [53]. Based on this, we found that iCID is fairly robust to halftone noise at a higher pixels-per-degree value (Figure 7).
Figure 7. iCID's lightness structure mean quality maps for the scissor image with 1 mm maximum height print used as test image and 0.25 mm maximum height print used as reference image from the MS captures with lower (left) and higher (right) pixels-per-degree values. We see that halftone noise was reduced with higher pixels-per-degree value, where the quality values are closer to zero (meaning higher quality).

Printer Mode Set
The differences are expected to be in elevation and in shadows as well as in color. There is a height difference between Alto and Brila modes, where the former has a maximum height of 0.5 mm and the latter 0.25 mm. In addition, the Brila mode introduces a color shift. All selected IQMs have the highest response on the edges in both set-up captures. It is expected that the appearance is different between Alto and Brila modes because the elevation in the Alto mode is opaque while it is relatively transparent in the Brila mode. This difference is captured by the IQMs by being responsive on the edges. The IQMs also showed responsiveness to the differences on the surface of the elevated parts and on the background in both set-up captures. For instance, CIEDE2000 and SCIELAB have a higher response on the edges than on the surface of the elevated parts and on the background in the flower and snowflake images. In the packaging image, their response is similar on the background and on the surface of the elevated part. This is because the Brila mode added a yellowish color on the elevated parts of the prints and the content of the packaging image has a somewhat similar color in the background and elevated part. Hence, CIEDE2000 and SCIELAB found similar color differences between Alto and Brila mode prints on both the background and the surface of the elevated part. SSIM, MS-SSIM (both terms, scale 2), and iCID's lightness structure map detected halftone noise in the three images in the MS captures and in the flower image in the SS captures, along with being responsive on the edges, on the background, and on the elevated parts' surfaces. The detection of halftone noise by iCID's lightness structure map, SSIM, and MS-SSIM (both terms, scale 2) can be explained with the same reasoning given in the previous set for these metrics. The IQMs better captured differences in the MS captures than in the SS ones in this set.
To conclude, CIEDE2000, SCIELAB, and iCID's lightness difference, lightness contrast, and lightness structure maps are found to be responsive to the differences between the prints in both set-up captures. The differences detected by these IQMs are more visible in the MS captures than in the SS ones. Based on this, the above-mentioned IQMs are considered in the next subsection. We keep iCID because it shows a capability to be used for perceptual QA of 2.5D prints as it is fairly robust to halftone noise arising from the printing process. We use iCID with the initial lower pixels-per-degree value for further analysis because the location of detected differences is the same regardless of the pixels-per-degree value (Figure 7). There can be a slight change in the scale, which is acceptable because at a closer viewing distance we expect to see more differences than at longer distances.

Can We Obtain Insights on 2.5D Prints' Quality from 2D IQMs?
We present cases where the selected 2D IQMs and their quality maps are informative regarding relevant attributes between the three sets' images and their 2D reference images.

Naturalness Set
In both set-up captures, CIEDE2000, SCIELAB, and iCID's lightness difference map detected more differences on the elevated parts' surfaces in the prints with surface roughness, and more differences on the background in the prints with unnatural elevation, in the three images. The same pattern was observed with iCID's lightness contrast and lightness structure maps in the brick and wood images.
From the line profile taken from the (mean) quality map (the quality map was rotated by −65° to make the grout lines approximately vertical), CIEDE2000, SCIELAB, and iCID's lightness difference map show a similar pattern: there is less difference on the grout line than on the tiles and edges in the three prints from both set-up captures in the tiles image. To demonstrate this, we give an example of a line profile from the MS captures from SCIELAB in Figure 8. Based on Figure 8, we made the following observations. The shadow produces larger color differences on the left side of the grout line, which is due to the elevation of the grout line in the print with unnatural elevation. There is also a larger color difference in the print with unnatural elevation on the right side of the grout line (around 115 to 130 pixels). This can be explained by the elevation of the grout line in the print with unnatural elevation, where light might hit and reflect back, making the right side of the print slightly brighter (i.e., inter-reflection). In addition, there is some gloss on the elevated grout line and this might cause inter-reflections. As a result, we assume that CIEDE2000, SCIELAB, and iCID's lightness difference map are not able to detect whether the grout line is elevated or not, but they are able to detect shadows from the grout line.
iCID's lightness structure map found more differences in the surface roughness print than in the other two prints in the three images in the MS captures (Figure 9), while it found more differences in the unnatural elevation print than in the other two prints in the wood and tiles images in the SS captures. Moreover, SCIELAB and CIEDE2000 seem to be able to detect fine details from the surface roughness print of the wood image in both set-up captures (Figure 10).
In addition, more differences were detected by CIEDE2000 (tiles image), SCIELAB (tiles, wood images), iCID's lightness difference (tiles, wood images), and lightness structure (tiles image) maps in the three prints in the MS captures than in the SS ones.

Height Set
In both set-up captures, CIEDE2000, SCIELAB, and iCID's lightness structure map found more differences on the edges of the three prints in the three images. iCID's lightness contrast map also found more differences on the edges of the three prints in the scissor and speed sign images. More differences were found in the 1 mm maximum height print than in the other two prints in the speed sign and running track images by CIEDE2000, SCIELAB, and iCID's lightness difference and lightness structure (Figure 11) maps in both set-up captures. This can be explained by the level of elevation. Additionally, more differences were found by CIEDE2000, SCIELAB, and iCID's lightness contrast and lightness structure (running track, scissor images) maps in the three prints of the three images in the MS captures than in the SS ones. The differences found by iCID's lightness difference and lightness structure maps in the speed sign image were relatively similar in both set-up captures.

Printer Mode Set
In both set-up captures, iCID's lightness difference map detected more differences on the surface of the elevated parts and on the background in both Alto and Brila mode prints in the three images. iCID's lightness contrast and lightness structure maps found more differences on the surface of the elevated parts in both prints in the flower and snowflake images. CIEDE2000 and SCIELAB detected more color differences on the edges in the Alto mode print and on the surface of the elevated part in the Brila mode print in the flower image (Figure 12). This is expected because the Brila mode introduces a color shift. In addition, the Alto mode has a higher elevation than the Brila mode. Therefore, more color differences are located on the edges in the Alto mode print. CIEDE2000, SCIELAB, and iCID's lightness structure map found more differences in both prints of the three images in the MS captures than in the SS ones.
To conclude, the quality maps of the tested IQMs are informative regarding 2.5D prints' QA. More specifically, they are informative in a similar way regarding difference detection on areas such as edges, elevated parts' surfaces, and background. For future work, it will be interesting to investigate quality maps along with height maps.
All differences detected by the tested IQMs (mainly on the edges, on the surface of the elevated parts, and on the background) for the examined set of images are consistent with the observers' feedback (mainly on elevation and aspects introduced by elevation (e.g., shadow)). This is better visible in the MS captures. As a result, this shows that the mentioned IQMs can be used for QA of 2.5D prints for specific attributes/features with a specific capture set-up.

Which Set-Up Is More Appropriate for 2.5D Prints Capture Based on IQMs' Performance?
We compare the correlation between (mean) quality maps of the tested IQMs from the MS and the SS captures to find which tested IQMs show more differences between the two set-ups and which are more insensitive to the set-up. For example, Figure 13 shows a boxplot of the correlation between the MS and the SS of height set images (i.e., for nine prints). Although the tested IQMs' median correlations between the two set-ups are relatively high (roughly around 0.7 for the given IQMs), the correlation values are spread over a wide range. In other words, the correlations of some IQMs are more variable than others. In particular, iCID's lightness contrast map has more prints with correlations lower than the median value between the two set-ups. In this set of images, the tested IQMs tend to be insensitive to the set-up, meaning that they detect (more) differences on the same areas irrespective of the set-up, although the scale might vary between the set-ups. This is in line with our observations in the previous subsections. The impact of elevation on the appearance can be better revealed in the MS captures. For example, Figures 4 and 11 show that the differences are better detected by the IQMs in the MS captures than in the SS ones: the latter depend on the illumination direction, while the former can provide better error capture because they average over several illumination angles. In other words, the SS's light source is on top inside the light booth cabinet, and shadowing will be stronger on one side than on the other regardless of how diffuse the light is inside the light booth. This is a clear limitation of the SS with imperfect diffuse illumination inside the light booth. In contrast, the MS can provide evenly distributed diffuse illumination by capturing prints under a set of directional illumination angles (even generating mean quality maps from a set of illumination angles can provide a similar effect).
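The comparison between the two set-ups can be sketched as follows. This is an illustrative sketch rather than the exact procedure behind Figure 13: it assumes the MS mean quality map is the pixel-wise mean of the per-angle quality maps and that the correlation is Pearson's coefficient over all pixels (the correlation type is an assumption here). The toy maps and the helper names `mean_quality_map` and `pearson` are hypothetical.

```python
import math


def mean_quality_map(maps):
    """Pixel-wise mean of several equally sized 2D maps (lists of rows)."""
    n = len(maps)
    return [
        [sum(pixels) / n for pixels in zip(*rows)]
        for rows in zip(*maps)
    ]


def pearson(map_a, map_b):
    """Pearson correlation between two equally sized 2D maps (assumed metric)."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy data: two MS quality maps (one per illumination angle) and one SS map.
ms_maps = [
    [[1.0, 4.0, 1.0], [1.0, 5.0, 2.0]],
    [[3.0, 6.0, 1.0], [1.0, 5.0, 0.0]],
]
ss_map = [[1.0, 2.5, 0.5], [0.5, 2.5, 0.5]]

ms_mean = mean_quality_map(ms_maps)  # mean quality map over the angles
r = pearson(ms_mean, ss_map)         # correlation between the two set-ups
```

In this toy case the SS map is a scaled copy of the MS mean map, so the correlation is high even though the magnitudes differ. This mirrors the case in which differences are found on the same areas in both set-ups but on a different scale.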
More information on the selected IQMs' ability to better detect differences in the MS captures than in the SS ones can be found in Section 4.1. Moreover, the tested IQMs find more differences mostly in the MS captures in comparison to the SS captures (refer to Section 4.2). SSIM and MS-SSIM (both terms, scale 2) detected differences that are perceptually difficult to see mostly in the MS captures as opposed to the SS ones. Thus, in combination with the MS, they can be useful in applications that need to detect such differences.
To summarize, both set-ups can be useful for 2.5D prints' QA. Nevertheless, based on our observations, the MS has the advantage: the tested IQMs capture more differences in the MS captures, and the captured differences are clearly visible in the examined IQMs' mean quality maps. Moreover, the MS (i.e., RTI) is relatively simple: it needs only a camera, a sample of interest, and a light source that can be moved to illuminate the sample from different illumination angles [43].

Limitations
Because our primary goal is to focus on the IQMs and the QA side, there could be limitations on the data capture side in terms of camera calibration and accurate color acquisition. We have color checker captures with the 2.5D prints. Thus, there is an option to estimate the transformation from them to the 2D reference images and work with the transformed images, which can be considered for future work. Another limitation might be that we did not use camera raw images. Raw images store unprocessed sensor data and are mostly used in science [57]. Thus, this can be considered for future work as well. Moreover, we worked with a small set of images, which limits quantitative analysis. It would be better to include attributes other than those varied in the used dataset, as well as other types of content, which could help to generalize the findings.
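As a rough illustration of the color checker option mentioned above, the sketch below fits a simple correction from captured patch values to reference patch values. The model is an assumption made for illustration only: one gain and one offset per channel, fitted by least squares. The actual transformation is left for future work and may well need a richer model (e.g., a full matrix). All patch values and the helper names `fit_gain_offset`, `fit_channels`, and `apply_channels` are hypothetical.

```python
def fit_gain_offset(captured, reference):
    """Least-squares fit of reference ~ gain * captured + offset (one channel)."""
    n = len(captured)
    mean_c = sum(captured) / n
    mean_r = sum(reference) / n
    sxx = sum((c - mean_c) ** 2 for c in captured)
    sxy = sum((c - mean_c) * (r - mean_r) for c, r in zip(captured, reference))
    gain = sxy / sxx
    offset = mean_r - gain * mean_c
    return gain, offset


def fit_channels(captured_patches, reference_patches):
    """Fit one (gain, offset) pair per channel from lists of RGB triplets."""
    return [
        fit_gain_offset(cap, ref)
        for cap, ref in zip(zip(*captured_patches), zip(*reference_patches))
    ]


def apply_channels(pixel, params):
    """Apply the per-channel correction to a single RGB triplet."""
    return tuple(g * v + o for v, (g, o) in zip(pixel, params))


# Hypothetical captured patch colors (RGB in [0, 1]).
captured_patches = [
    (0.10, 0.20, 0.30),
    (0.50, 0.50, 0.50),
    (0.90, 0.70, 0.40),
    (0.30, 0.80, 0.60),
]
# Synthetic reference values generated from a known per-channel transform,
# used here only to check that the fit recovers it.
true_params = [(1.2, 0.05), (0.9, -0.02), (1.1, 0.0)]
reference_patches = [apply_channels(p, true_params) for p in captured_patches]

fitted_params = fit_channels(captured_patches, reference_patches)
corrected = [apply_channels(p, fitted_params) for p in captured_patches]
```

The fitted parameters could then be applied to every pixel of a capture before computing the IQMs, so that part of the color difference caused by the acquisition is removed from the comparison with the 2D reference image.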

Conclusions and Future Works
The QA of 2.5D prints is currently attracting the attention of researchers, but objective QA of 2.5D prints is less studied. Therefore, an attempt was made to test existing 2D IQMs to predict the quality of 2.5D prints captured by the multiple-shot and single-shot set-ups. We made the following observations:
• iCID's lightness difference, lightness contrast, and lightness structure maps, CIEDE2000, and SCIELAB can find differences between 2.5D prints as well as between 2.5D prints and their 2D reference images in both set-up captures;
• More differences are detected mostly in the multiple-shot set-up captures than in the single-shot set-up ones, and the captured differences are more clearly visible in the multiple-shot set-up captures by iCID's lightness difference, lightness contrast, and lightness structure maps, CIEDE2000, and SCIELAB based on their (mean) quality maps;
• To create a metric for 2.5D prints' perceptual QA, it is important to have a model of the HVS.
In conclusion, iCID's lightness difference, lightness contrast, and lightness structure maps, CIEDE2000, and SCIELAB were found to be relevant for detecting differences on the edges, on the surface of the elevated parts, and on the background between 2.5D prints and their 2D reference images as well as between 2.5D prints, especially in the multiple-shot set-up captures. Our results on the responsiveness of the selected 2D IQMs to the quality variations of 2.5D prints can be useful in, for example, a quality assurance application, where the industry is interested in detecting defects or any other differences between prints (or with respect to the reference image/print) without necessarily defining which print's quality is the best. As future work, we consider testing more IQMs and applying the CSF in the preprocessing stage before applying the IQMs, to account for the distance from the prints to the observer.