Enhancing Image-Based Multiscale Heritage Recording with Near-Infrared Data

Passive sensors operating in the visible (VIS) spectrum have been widely used for the trans-disciplinary documentation, understanding, and protection of tangible cultural heritage (CH). However, many heritage science fields benefit significantly from additional information that can be acquired in the near-infrared (NIR) spectrum. NIR imagery captured for heritage applications has mostly been investigated with two-dimensional (2D) approaches or by 2D-to-three-dimensional (3D) integrations following complicated techniques, including expensive imaging sensors and setups. The availability of high-resolution modified digital cameras and software implementations of Structure-from-Motion (SfM) and Multiple-View-Stereo (MVS) algorithms has made the production of models with spectral textures more feasible than ever. In this research, a short review of image-based 3D modeling with NIR data is attempted. The authors aim to investigate the use of near-infrared imagery from relatively low-cost modified sensors for heritage digitization, alongside the usefulness of the spectral textures produced, oriented towards heritage science. Therefore, thorough experimentation and assessment with different software are conducted and presented, utilizing NIR imagery and SfM/MVS methods. Dense 3D point clouds and textured meshes have been produced and evaluated for their metric validity and radiometric quality, in comparison with results produced from VIS imagery. The datasets employed come from heritage assets of different dimensions, from an archaeological site to a medium-sized artwork, to evaluate implementation at different levels of accuracy and specifications of texture resolution.


Introduction
Close-Range Photogrammetry (CRP) and Technical Photography (TP) constitute two digital-recording techniques that have been widely used in the framework of the integrated documentation and study of tangible CH. The capacity of CRP to digitize three-dimensional (3D) geometrical features, providing accurate representations of the visible surfaces, along with its versatility, makes interdisciplinary analyses of CH feasible. This technique can provide valuable textural information for the examination of the historical surfaces' characteristics [1]. It can also produce data necessary to document archaeological sites of different proportions and geometries [2][3][4][5], support excavation activities [6,7], plan conservation interventions [8,9], and in general, create reference models and systems to assist the three-dimensional integration of various multi-sensor diagnostic data [10,11]. Furthermore, the latest algorithmic developments in the domain of metric exploitation of digital images have enabled increased automation, processing speed, accuracy, and precision [12][13][14], thus facilitating the implementation of software for the straightforward production of dense point clouds, models, and other metric derivatives of digital reconstruction. TP includes a wide range of techniques applicable to historic art examination [15]. Specifically, near-infrared (NIR) imaging has been implemented to enhance archaeological observation [16], to determine the state of conservation of buildings [17], to inspect mural paintings [18], to assist the identification of pigments [19], to investigate underdrawings of panel paintings [20], underprintings [21], and palimpsests [22], to examine rock art [23], and to study feature characteristics of painted artifacts [24]. Applications of integrated heritage CRP and TP can be found in the recent literature, showcasing a promising combination that should be further evaluated.
Past approaches regarding the integration of metric heritage modeling and information in the near-infrared spectrum have primarily concentrated on separate data acquisitions. Some methods explored for the production of models enhanced with NIR texture through two-dimensional (2D)-to-3D registration are (1) mathematical transformation of the spectral images using corresponding points [25], (2) mutual information methods, as with utilization of depth maps [26] or silhouette maps, reflection maps, and other illumination-based renderings [27], and (3) registration based on known sensor position, as can be performed with optical tracking of the cameras implemented for spectral acquisition [28]. However, the implementation of these approaches is often expensive due to the multiple sensors used and time-consuming due to the frequent need to develop application-specific algorithms.
The recent introduction of consumer-grade digital cameras modified for full-spectrum or single wavelength acquisition to heritage science has provided a less expensive, higher-resolution alternative [29,30]. This solution has spectral imaging capabilities while retaining user-friendly features and interfaces to a wide range of photographic accessories and image processing software. In combination with the automated or semi-automated photogrammetric software implementing Structure-from-Motion (SfM) and Multiple-View-Stereo (MVS) algorithms, which are becoming extremely popular for heritage applications, it can make feasible metric heritage modeling in near-infrared [31][32][33].
The current study briefly reviews the modeling of multiscale tangible heritage from NIR imagery with state-of-the-art implementations of SfM/MVS algorithms. The presented research has a dual aim, metric and radiometric. The metric aspect of the research refers to the assessment of the geometric results that can be achieved with the hybrid CRP and TP approach. The metric evaluations are performed by comparison with the classic CRP approach (using visible spectrum digital images) and with scanning results whenever available. The radiometric aspect concerns the evaluation of NIR-textured 3D results in terms of their capacity to be further exploited for archaeological or diagnostic observations. The second section of the paper presents the different cultural heritage case studies, equipment, and methods of the various experiments conducted. We give special attention to the characteristics of the sensors involved and to the capturing and processing parameters. It should be highlighted that, in order to increase the comparability of the metric results, we attempted to keep most parameters of the spectral imaging and photogrammetric reconstruction constant. The third section focuses on the results of image-based 3D modeling and accuracy. The fourth section explores the use of the acquired results for the possible enhancement of archaeological and diagnostic observations. The final section presents some concluding remarks and future perspectives.

Case Studies
The first case study (dataset 1) is the ruins of Vassilika settlement, part of the archaeological site of ancient Kymissala, located about 70 km south-west of Rhodes city in Greece. Kymissala is one of the most important archaeological sites in rural Rhodes, as indicated by the extensive visible ruins scattered in various places, dating from the Mycenaean period to Late Antiquity (see Figure 1a).
The second case study (dataset 2, see Figure 1b) refers to a part of the inner courtyard brick walls of the Center for Conservation and Restoration "La Venaria Reale" (owned by the Consorzio delle

Datasets
Dataset 1 from the archaeological site of ancient Kymissala was acquired during the Erasmus Intensive Program HERICT 2013, an international Summer School for the documentation support of the archaeological excavation in Vassilika settlement in Rhodes [34]. This settlement lies within the wider archaeological site and is the ruins of an organized urban network covering an area of approximately 200 × 250 m² with some 10 m of height differences. For this study, we used only the data captured with a Swinglet fixed wing Unmanned Aircraft System (UAS) by Sensefly. Two 12 Mega-Pixel (MP) camera sensors were used (Table 1): a Canon Ixus 220HS compact camera (sensor size 6.14 × 4.55 mm², pixel size 1.55 µm) for VIS images and a Canon PowerShot ELPH300HS compact camera (sensor size 6.14 × 4.55 mm², pixel size 1.55 µm) for NIR acquisition. The latter was modified by removing the infrared cut filter and placing an internal NIR-only filter [35]. The mission planning with both sensors had been automated for the application using e-motion software, for four flights of approximately 90 m height. Ground control points had been measured using the Global Navigation Satellite System (GNSS) and the Real-Time Kinematic method (RTK) with an accuracy of 2-3 cm. They had been signalized with a 20 × 20 cm² black and white checkerboard pattern and distributed in order to cover the entire area in the most effective way possible (Figure 2). We decided to process the data from each of the four flight scenarios separately.
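As a rough plausibility check on the ground resolution of the UAS imagery, the ground sample distance (GSD) of a nadir image can be estimated from the flight height, the pixel size, and the focal length. The sketch below uses the approximately 90 m flight height and the 1.55 µm pixel size reported above; the 4.3 mm focal length is an assumption, since the focal length used during the flights is not stated here, and the function name is purely illustrative.

```python
def ground_sample_distance(flight_height_m, pixel_size_um, focal_length_mm):
    """Nadir GSD in metres per pixel for a pinhole camera over flat terrain."""
    pixel_size_m = pixel_size_um * 1e-6
    focal_length_m = focal_length_mm * 1e-3
    return flight_height_m * pixel_size_m / focal_length_m


# Values reported for dataset 1: ~90 m flight height, 1.55 um pixel size.
# ASSUMPTION: 4.3 mm focal length (not given in the text).
gsd_m = ground_sample_distance(90.0, 1.55, 4.3)
print(f"Estimated GSD: {gsd_m * 100:.1f} cm/pixel")
```

The relation is linear in flight height, so terrain relief of some 10 m changes the local GSD by roughly a tenth of its nominal value under these assumptions.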
The datasets for the rest of the case studies (Figure 1b-d) were acquired with a 17.9 MP Canon Rebel SL1 digital single-lens reflex (DSLR) camera (sensor size 22.30 × 14.90 mm², pixel size 4.38 µm) converted by 'Life Pixel Infrared' for full-spectrum acquisition by removing the IR cut filter. For the VIS and NIR acquisitions, two different external filters were utilized. For the interior case studies, we used flash and a tripod.

A standard capturing workflow was followed to acquire rigid imagery datasets (Figure 3) with large overlaps for image-based 3D reconstruction. It was additionally attempted to maintain capturing conditions (focal length, aperture, exposure, camera positions) and ground sample distances (GSDs) constant between VIS and NIR spectra, for every case study. Referencing for the brick wall case study was realized with a set of 18 pre-signalized control and check points, measured with a total station theodolite (TST) GeoMax Zoom30 3", producing results with an accuracy of 4-5 mm at the x-axis, 2-3 mm at the y-axis, and 5-6 mm at the z-axis. For the panels and furniture case studies, scaling was performed with an invar scale bar of 1.000165 m (±13 nm). The characteristics of all the datasets are summarized in Table 1.


Processing Software and Hardware
Photogrammetric processing was conducted with two commercial SfM/MVS-based software packages. Agisoft Metashape Pro 1.5.1 uses a scale-invariant feature transform (SIFT)-like algorithm to detect and describe features, a greedy algorithm to find approximate camera locations, and a global bundle-adjustment matching algorithm to refine them. It employs a form of MVS disparity calculation for dense reconstruction and Screened Poisson surface reconstruction for meshing. 3DFlow Zephyr Aerial 4.519 implements a modified Difference-of-Gaussian (DoG) detector and a combination of Approximate Nearest Neighbor Searching, M-estimator SAmple Consensus, and Geometric Robust Information Criterion for matching, and then performs hierarchical SfM and incremental bundle adjustment. The dense MVS reconstruction is achieved with fast visibility integration and tight disparity bounding. For meshing, triangulated irregular network construction with an edge-preserving algorithmic approach was selected to differentiate the workflow from that of Agisoft Metashape Professional. All processing was performed on a SANTECH laptop with a 6-core Intel i7-8750H CPU at 2.2 GHz (Max 4.1 GHz), 32 GB RAM, and an NVIDIA GeForce RTX 2070 GPU.
To effectively evaluate the performance of the implemented software and the effects of 3D image-based modeling on different spectra, similar parameters, when applicable, were selected for the 3D reconstruction workflows, as summarized in Table 2. The parameters were selected after experimentation to optimize the final 3D-textured results. Specifically, they were chosen to maximize the preserved surface detail without producing unnecessarily dense results (that is, duplicate points with respect to each GSD), which would slow down the processing steps.
The geometric comparisons between vertices of final models were made by measuring the Hausdorff distances in CloudCompare software. No local model was used for calculating these distances. We should underline that for the processing of datasets 1 and 4, only specific areas of the dense clouds were selected before the 3D mesh reconstruction step to better showcase the image-based modeling results on areas of higher interest for archaeological/archaeometric observation. Specifically, for the archaeological site of Kymissala, an area of approximately 230 × 180 m² and for the wooden furniture part painted with flowers, an area of 60 × 60 cm² was selected. Consequently, computational steps and results of meshing and texturing refer to those areas only.
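The vertex-to-vertex comparison described above, computed without a local surface model, can be illustrated with plain nearest-neighbor distances. The following sketch is not CloudCompare's implementation; it is a brute-force stand-in that is only practical for tiny examples (real dense clouds require a spatial index such as an octree or KD-tree), and the function names and sample coordinates are hypothetical.

```python
import math


def nearest_distances(compared, reference):
    """Distance from each compared vertex to its nearest reference vertex."""
    return [min(math.dist(p, q) for q in reference) for p in compared]


def distance_stats(compared, reference):
    """Mean, RMS, and maximum (one-sided Hausdorff) nearest-vertex distance."""
    d = nearest_distances(compared, reference)
    mean = sum(d) / len(d)
    rms = math.sqrt(sum(x * x for x in d) / len(d))
    return mean, rms, max(d)


# Hypothetical vertices in millimetres: a "NIR" cloud offset from a "VIS" cloud.
vis = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
nir = [(0.0, 0.0, 0.3), (10.0, 0.0, 0.4), (0.0, 10.0, 0.0)]
mean, rms, hausdorff = distance_stats(nir, vis)
print(f"mean={mean:.3f} mm, rms={rms:.3f} mm, max={hausdorff:.3f} mm")
```

The distances are one-sided (from the compared cloud to the reference), so swapping the two clouds can give different statistics when their coverage differs.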

Results
For dataset 1, full reconstructions were produced for scenarios 1 and 2 (see supplementary file PDF-S1). For the other flight scenarios, due to some irregularities in flight conditions and to smaller overlaps, some areas were not depicted in at least two images and were therefore not reconstructed (Figure 4). Scenarios of similar flight altitudes produced similar photogrammetric results with Metashape Professional, in terms of cloud densities, preservation of surface detail on meshes, quality of textures, and required processing times (Table 3). Reconstruction with NIR imagery produced half the root mean square (RMS) Errors on control and check points for the lower flight scenarios but the same levels of RMS Errors for the higher altitude scenarios (Table 4). Figure 5 shows the texturing results achieved.
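RMS Errors on control and check points are commonly computed from the residuals between surveyed coordinates and the coordinates estimated by the photogrammetric solution. A minimal sketch follows, assuming 3D residuals per point; the coordinates are made-up illustrative values, not data from this study, and the function name is hypothetical.

```python
import math


def rms_error(measured, estimated):
    """RMS of the 3D distances between paired measured and estimated points."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("need two non-empty point lists of equal length")
    squared = [
        sum((m - e) ** 2 for m, e in zip(p, q))
        for p, q in zip(measured, estimated)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical check points (metres): GNSS-surveyed vs. photogrammetric values.
surveyed = [(10.000, 20.000, 5.000), (30.000, 40.000, 6.000)]
reconstructed = [(10.030, 20.000, 5.040), (30.000, 39.970, 6.040)]
print(f"Check-point RMSE: {rms_error(surveyed, reconstructed):.3f} m")
```

Control points enter the bundle adjustment while check points do not, so the same function applied to each group gives the two separate figures usually reported.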
With Zephyr Aerial, the scene of the archaeological site was not fully reconstructed in any of the acquisition scenarios, producing sparser point clouds (Table 5) with many discontinuities. As a result, in this specific case, it was decided not to continue with the mesh reconstruction phase because fragmentary results would be produced in the area of interest.
Image-based 3D reconstruction for dataset 2 produced very dense modeling results of high-fidelity surface detail (see supplementary file PDF-S2). Overall, Metashape Pro produced denser results with less processing time required (Table 6), but with close examination, it was found that all four processing combinations provided similar 3D detail preservation, notwithstanding that the Zephyr Aerial NIR model had a small number of holes on its upper part.
Metashape Pro also resulted in smaller reprojection errors and smaller RMS Errors on measured points for both VIS and NIR imagery. Additionally, both software produced the same level of errors for VIS and NIR imagery processing. Metashape Pro produced control and check RMS Errors of about 1-1.5 mm and Zephyr Aerial of approximately 2.5-3 mm. For dataset 2, geometric comparisons between the VIS and NIR models from Metashape Pro showed differences of 0.9 mm mean and 0.4 mm RMS, and for Zephyr Aerial, of 1.2 mm mean and 0.2 mm RMS. Differences between the two NIR models were below 2 GSDs. Additionally, differences measured between VIS and NIR imagery had 2.6 mm mean and 1.1 mm RMS distances for Metashape Pro, the same magnitude of variation that was measured between the VIS model and a mesh produced from a Leica BLK 360 scanner point cloud. Similar distances of 2.5 mm mean and 1.0 mm RMS were present for Zephyr Aerial, compared to the same laser scanning 3D point cloud after down-sampling. Figure 6 showcases the NIR texturing results achieved with both software.
For dataset 3, Zephyr Aerial was not able to reconstruct the scene for either the VIS or the NIR scenario. Textured meshes produced with Metashape Professional from the two different spectra were of similar density and quality (Table 7). However, the VIS mesh contained a small amount of noise compared to the NIR (Figure 7), which can be mainly identified through the calculation of geometric differences between them, since the same level of detail was preserved on both (Figure 8). The Hausdorff distances calculated between the two models were 0.4 mm mean and 0.3 mm RMS, for an inspected area of approximately 45 × 75 cm².
Image-based reconstruction for dataset 4 produced very high-density results with Metashape Pro compared to Zephyr Aerial; overall, Metashape Pro also performed better with the non-VIS imagery, since Zephyr Aerial produced sparser, noisier, and less complete results in longer processing times (Table 8). For better visualization purposes, part of the mesh and texture results are shown in Figure 9. The geometric differences calculated between VIS and NIR 3D models were 0.5 mm mean and 0.7 mm RMS for Metashape Professional, and 1.0 mm mean and 1.0 mm RMS for Zephyr Aerial, while Hausdorff distances between the two VIS reconstructions with different algorithmic approaches were 0.9 mm mean and 0.8 mm RMS, and between NIR reconstructions, 1.0 mm mean and 1.1 mm RMS for a 0.1 mm sampling distance of original images. Again, the VIS mesh contained a small amount of noise compared to the NIR, identifiable through the calculation of geometric differences between them, since the same level of detail was preserved on both. The distances between all 3D models and the mesh produced from a Stonex F6 Short Range structured light scanner were in the range of 1.0 ± 1.0 mm.

Discussion
Near-infrared modeling of the archaeological site of Kymissala resulted in a slight enhancement of the archaeological observation, without giving any significant insight compared to the visible 3D documentation (Figure 10). However, using the dense NIR reconstruction results, we were able to construct a fine approximation of the digital terrain model. It should be mentioned that the digital terrain models were constructed by removing the canopy by color filtering only. For the VIS and NIR dense point clouds, we used color values at the same coordinates, corresponding to higher, lower, and shadowed vegetation, to classify and then erase vegetation (maintaining constant tolerance values). As showcased in Figure 11, the terrain model produced from NIR imagery is almost noiseless, facilitating the identification of the archaeological remains. Therefore, we could claim that for this case study, NIR modeling made the separation of the canopy easier, allowing the creation of a more accurate terrain model (see supplementary files TIF-S3 and TIF-S4).
For the case study of the brick walls, near-infrared modeling made it possible to perform a rough identification of the areas of bio-deterioration on the surfaces, since these decayed areas have a different response in the NIR spectrum than healthy materials. On the lower areas of the NIR model (Figure 12), decay is easily identifiable and can be discriminated from areas of high-moisture content, which also appear dark on the VIS model. The results were verified by in-situ inspections.
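The color-based canopy removal used for the Kymissala terrain models can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the software actually used: points carry RGB values, each vegetation class is represented by a sampled reference color, and a point is discarded when every channel lies within a fixed tolerance of any reference. All names, colors, and the tolerance value are hypothetical.

```python
def is_vegetation(color, reference_colors, tolerance):
    """True if every channel is within `tolerance` of some reference color."""
    return any(
        all(abs(c - r) <= tolerance for c, r in zip(color, ref))
        for ref in reference_colors
    )


def remove_vegetation(points, reference_colors, tolerance):
    """Keep only points (x, y, z, r, g, b) not classified as vegetation."""
    return [
        p for p in points
        if not is_vegetation(p[3:6], reference_colors, tolerance)
    ]


# Hypothetical reference colors sampled on higher, lower, and shadowed canopy.
vegetation_refs = [(200, 60, 60), (170, 50, 50), (90, 30, 30)]
cloud = [
    (1.0, 2.0, 100.5, 205, 65, 58),   # close to the first reference: removed
    (1.5, 2.5, 98.0, 120, 115, 110),  # stone-like color: kept
    (2.0, 3.0, 99.2, 88, 35, 28),     # shadowed canopy: removed
]
ground = remove_vegetation(cloud, vegetation_refs, tolerance=10)
print(len(ground), "of", len(cloud), "points kept")
```

Keeping the tolerance constant across the VIS and NIR clouds, as described above, means any difference in the filtered result comes from how well each spectrum separates vegetation from the remaining surfaces.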
For the case study of the detail from the Chinese four-panel Coromandel Screen, NIR modeling assisted the identification of retouched and defective areas (Figure 13), which appear darker than the uncolored lacquerware background surface. Additionally, NIR modeling helped the production of a noiseless surface 3D model, as described above.

Near-infrared modeling of the wooden furniture part painted with flowers helped to better identify defects and restored areas (Figure 14). On the NIR model, we were able to observe undersurface characteristics such as previous restorations, which were performed by filling with new materials and by repainting, cracks, and small deteriorated areas. Those characteristics could not otherwise be detected by visual inspection and VIS modeling only.


Conclusions
This paper suggested how near-infrared imagery from modified consumer DSLR cameras can be used to enhance the geometry and texture of 3D heritage models at different scales, using image-based modeling software. Considering GSDs and the precision of each methodology, modeling with NIR datasets produced very accurate results, compared to ones produced with VIS datasets. Furthermore, for the very-large-scale case studies, direct modeling from dense NIR imagery resulted in high-resolution noiseless models, compensating for the glaring problems in visible imagery caused by lighting conditions and highly reflective materials. For all CRP reconstructions, both in the visible and in the near-infrared spectrum, Agisoft Metashape performed better than 3DFlow Zephyr. Reconstructions produced by 3DFlow Zephyr were overall sparser, noisier, and had discontinuities. However, the algorithmic implementations used were intentionally very different, and some of them were not suitable for every case study. Additionally, we should mention that Agisoft Metashape showed excellent noise-filtering capabilities. In cases of very-large-scale applications with millimetric or sub-millimetric requirements, 3DFlow Zephyr can perform better, producing a meshing result closer to the raw reconstruction results with the edge-preserving algorithm, which does not interpolate any data. Conversely, 3DFlow Zephyr is not recommended for areas of large dimensions, as it usually malfunctions. Furthermore, the use of NIR imagery did not seem to have a significant impact on the processing durations or reconstruction errors with either of the employed commercial software.
The reconstructed high-resolution near-infrared textures helped enhance archaeological observation and evaluation of the state of preservation, depending on the heritage case study. For the case study of the archaeological site, NIR modeling facilitated the classification of the canopy to create an approximation of the digital terrain model. For the rest of the case studies, it provided valuable conservation-oriented information regarding the surface and subsurface characteristics of the historical materials. It should be mentioned that unlike the other spectral imaging techniques, this approach cannot replace the laboratory characterizations when detailed information is required about the characteristics of materials and decay products on historical architecture or artifacts. However, it can be used as a first diagnostic step to identify areas of higher interest and plan sampling.
To conclude, the 3D modeling approach showcased here proved to be a simple and flexible alternative to previously implemented methodologies for the NIR enhancement of heritage models. This approach can potentially benefit the rapid diagnostics and conservation of multiscale tangible heritage, ranging from small artifacts to historical architecture. Thus, our future research will focus on the implementation of near-infrared and thermal 3D modeling for identification, mapping, and quantification of materials and their deterioration on heritage assets.
Supplementary Materials: The following are available at http://www.mdpi.com/2220-9964/9/4/269/s1. PDF-S1: 3D pdf file of the near-infrared textured model from the archaeological site of Ancient Kymissala, produced with Agisoft Metashape Professional, decimated to 1,000,000 triangles and textured with a single-file 4096 × 4096 pixel² texture. PDF-S2: 3D pdf file of the near-infrared textured model from an inner courtyard brick wall of the Center for Conservation and Restoration "La Venaria Reale", produced with Agisoft Metashape Professional, decimated to 1,000,000 triangles and textured with a single-file 4096 × 4096 pixel² texture. TIF-S3: TIF file of the digital terrain model from the archaeological site of Ancient Kymissala, produced with Agisoft Metashape Professional from visible UAV imagery, with a GSD of 5.5 cm. TIF-S4: TIF file of the digital terrain model from the archaeological site of Ancient Kymissala, produced with Agisoft Metashape Professional from near-infrared UAV imagery, with a GSD of 5.5 cm.