Two New Ways of Documenting Miniature Incisions Using a Combination of Image-Based Modelling and Reflectance Transformation Imaging

Abstract: Digital 3D documentation methods such as Image-Based Modelling (IBM) and laser scanning have become increasingly popular for the recording of entire archaeological sites and landscapes, excavations and single finds during the last decade. However, they have not been applied in any significant degree to miniature incisions such as graffiti. In the same period, Reflectance Transformation Imaging (RTI) has become one of the most popular methods used to record and visualize this kind of heritage, though it lacks the benefits of 3D documentation. The aim of this paper is to introduce two new ways of combining IBM and RTI, and to assess these different techniques in relation to factors such as usability, time-efficiency, cost-efficiency and accuracy. A secondary aim is to examine the influence of two different 3D processing software packages on these factors: the widely used MetaShape (MS) and a more expensive option, RealityCapture (RC). The article shows that there is currently no recording technique that is optimal regarding all four aforementioned factors, and the way to record and produce results must be chosen based on a prioritization of these. However, we argue that the techniques combining RTI and IBM might be the overall best ways to record miniature incisions. One of these combinations is time-efficient and relatively cost-efficient, and the results have high usability even though the 3D models generated have low accuracy. The other combination has low time- and cost-efficiency but generates the most detailed 3D models of the techniques tested. In addition to cost-efficiency, the main difference between the 3D software packages tested is that RC is much faster than MS. The accuracy assessment remains inconclusive; while RC generally produces more detailed 3D models than MS, there are also areas of these models where RC creates more noise than MS.


Introduction
Miniature incisions such as old graffiti are often located in places exposed to natural and/or human-made wear and tear. Surveys and studies have shown that these vulnerable traces of history can be a rich source of information [1-3], but the traditional ways of recording this heritage (such as frottaging, tracing, casting and raking light photography) are often time-consuming, inaccurate and in some cases destructive [4]. During the last decade, Reflectance Transformation Imaging (RTI) has become a widely utilized method to record surfaces with miniature incisions [5-8]. This low-cost and time-efficient method generates highly detailed 2.5D models (see Section 2.2.2), which may be manipulated to showcase the surface elevations even further. However, this method lacks many of the benefits of 3D modelling, such as complex ways of analyzing the results and the options to

The Documented Surfaces
The two studied surfaces are on two 13th century building stones, both of which derive from the Nidaros Cathedral in Trondheim, Norway.
Surface A (Figure 1) is located on a building stone which was originally part of a wall pillar in the cathedral. The documented side measures c. 52 by 20 cm, and has incisions with thicknesses ranging from c. 0.5 to 5 mm. The graffiti consists of crosses and other symbols. The crosses are orientated in differing directions, indicating either that the surface originally faced upwards and was accessible from different angles, or that it was later reused in a different position. The stone is presently in a storage facility, placed close to the floor with the recorded side in a vertical position. The closeness to the floor made it impossible to take pictures or illuminate the surface from low angles, but this does not seem to have influenced the results noticeably.
Surface B (Figure 2) is a c. 30 by 24 cm part of a building stone still in its original position in one of Nidaros Cathedral's outer walls. The recorded surface bears, among other graffiti, a runic inscription interpreted as "Lavrans owns". The inscription, combined with a large horizontal groove in the same wall, suggests that an altar mentioned in a 15th century documentary source was located here (Lavrans is an indigenous and medieval form of Laurentius). The runic inscription, situated c. 40 cm above and to the left of the groove in the wall, could have been an unofficial dedication inscription for commoners who were not fluent in Latin.

Methods
The methods used to record the chosen surfaces are Image-Based Modelling, two variations of Reflectance Transformation Imaging, and two ways of combining IBM and RTI (Combo 1 and 2). These will be further described below. All 3D models were made, or attempted, using two 3D processing software packages, MetaShape and RealityCapture. The specifications and settings of the hardware used are listed in Appendix A, while the specifications and settings of the software used are listed in Appendix B. The equipment used and its costs are listed in Appendix C. All source images, as well as all figures, are available in full resolution as Supplementary Materials online.

Image-Based Modelling
IBM is a technique based on the concept of creating 3D models from overlapping pictures with known or assumed orientations. This method is often incorrectly referred to as Structure from Motion (SfM) photogrammetry, but as shown below this is technically only a part of the process [20]. Most 3D processing software packages follow this basic procedure [20,21]:

1. Alignment: The positions of the camera are calculated for each picture based on homologous points where the pictures overlap.

2. 3D point cloud generation: The matching points are used to create a sparse 3D point cloud consisting of X, Y and Z data by using SfM. Then, dense image matching algorithms are used to densify the point cloud.

3. 3D mesh reconstruction: The relevant points in the dense point cloud are connected by triangles, creating a 3D mesh.

4. Texture projection: The 3D mesh is textured by projecting colors from the source images.

Image acquisition for the IBM models was done with a camera mounted on a tripod. To record Surface A, pictures were taken from a wide variety of angles, at a distance of c. 0.7-1 m. Great care was taken to get every picture as sharp as possible, so the whole session lasted over an hour. Surface B was photographed at c. 30 cm and angled straight onto the surface, which allowed for using the same camera settings throughout the photo session.

Reflectance Transformation Imaging
RTI is a method of capturing and enhancing elevation information on a surface. During image acquisition the camera remains static, but the illumination is different for each picture. RTI processing software, most commonly RTIBuilder, is used to make 2.5D models from the images, i.e., 2D graphical projections that use height information to create the illusion of viewing surfaces or artefacts in 3D. Unlike 3D models, the user is restricted to seeing the models from the single angle from which they were photographed [19]. To separate these from other types of 2.5D models, such as Digital Terrain Models (DTM) and Digital Elevation Models (DEM), they will be referred to here as RTI models. Other RTI software, most commonly RTIViewer, is used to view and manipulate the models by changing light direction and/or rendering mode. The rendering modes display and enhance the surface elevations of the RTI models in various ways. RTI is an improvement of the traditional raking light method of photographing graffiti, i.e., image acquisition with illumination from one sharp angle. While this technique illuminates one side of the surface, it often obstructs others [6]. More importantly, the models also contain estimated information regarding the shapes and textures of the recorded surfaces.
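To make concrete how a stack of differently lit pictures can become a relightable model, the following is a minimal sketch of the biquadratic Polynomial Texture Map (PTM) fit, one common RTI model, applied to a single pixel. It is an illustration under stated assumptions and not the implementation used in RTIBuilder or RTI Processor: the function names, the 50 randomly placed light directions and the synthetic intensities are all hypothetical.

```python
import numpy as np

def ptm_basis(lu, lv):
    """Biquadratic PTM basis for the x/y components (lu, lv) of a light direction."""
    lu = np.asarray(lu, dtype=float)
    lv = np.asarray(lv, dtype=float)
    return np.stack([lu**2, lv**2, lu * lv, lu, lv, np.ones_like(lu)], axis=-1)

def fit_ptm_pixel(light_dirs, intensities):
    """Least-squares fit of the six PTM coefficients for one pixel."""
    design = ptm_basis(light_dirs[:, 0], light_dirs[:, 1])
    coeffs, *_ = np.linalg.lstsq(design, intensities, rcond=None)
    return coeffs

def relight(coeffs, lu, lv):
    """Predict the pixel intensity under a new light direction."""
    return float(ptm_basis(lu, lv) @ coeffs)

# Hypothetical data: 50 light directions (one per picture) above the surface.
rng = np.random.default_rng(0)
azimuth = rng.uniform(0.0, 2.0 * np.pi, 50)
elevation = rng.uniform(np.radians(15), np.radians(80), 50)
light_dirs = np.column_stack([
    np.cos(elevation) * np.cos(azimuth),
    np.cos(elevation) * np.sin(azimuth),
])

# Synthetic pixel whose response follows the PTM model exactly.
true_coeffs = np.array([-0.2, -0.1, 0.05, 0.3, -0.15, 0.6])
intensities = ptm_basis(light_dirs[:, 0], light_dirs[:, 1]) @ true_coeffs

fitted = fit_ptm_pixel(light_dirs, intensities)
print(np.round(fitted, 3))
print(relight(fitted, 0.3, -0.2))
```

Repeating such a fit for every pixel is what allows a viewer to change the light direction interactively after the photo session; real pictures add noise, shadows and highlights that this sketch ignores.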
There are two main techniques of RTI image acquisition, Highlight RTI (H-RTI) and Rigged Light RTI (RL-RTI). When using H-RTI, a light source is moved for each picture taken and the angle of the light source is captured in the reflection of one or more spheres situated somewhere within the images [22]. The pictures should be taken with lighting from an even spread of angles and with the same distance to the surface. This distance should be about two to four times the length of the documented surface [23]. RTIBuilder uses the reflections in the spheres to calculate the angles from which the pictures were lit. An optional RTI kit, including reflective spheres in different sizes and equipment to rig these for photo sessions, is available from www.culturalheritageimaging.org. Since the method is based on manipulating lights and shadows, the pictures are normally taken in dark or dim conditions. Alternatively, neutral density filters in combination with a flash may be used to achieve better lighting conditions [6].
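The geometric idea behind the reflective spheres can be sketched as follows. This is a textbook mirror-reflection calculation and not RTIBuilder's actual code: it assumes an orthographic camera looking straight down the z-axis, a distant light source, and a highlight position and sphere outline already measured in pixels. The function name and all input values are hypothetical.

```python
import math

def light_direction_from_highlight(cx, cy, radius, hx, hy):
    """Estimate a unit light direction from a highlight on a mirror sphere.

    (cx, cy) and radius describe the sphere outline in image pixels;
    (hx, hy) is the highlight position. The view direction is assumed
    to be (0, 0, 1), i.e. pointing from the surface towards the camera.
    """
    nx = (hx - cx) / radius
    ny = -(hy - cy) / radius  # image rows grow downwards, so flip the sign
    d2 = nx * nx + ny * ny
    if d2 > 1.0:
        raise ValueError("highlight lies outside the sphere outline")
    nz = math.sqrt(1.0 - d2)
    # Mirror reflection of the view vector v about the normal n: L = 2(n.v)n - v
    return (2.0 * nz * nx, 2.0 * nz * ny, 2.0 * nz * nz - 1.0)

# Highlight halfway between the sphere centre and its right-hand edge.
lx, ly, lz = light_direction_from_highlight(cx=100, cy=100, radius=50, hx=125, hy=100)
print(lx, ly, lz)
print(math.degrees(math.acos(lz)))  # angle between light and viewing axis
```

A highlight at the centre of the sphere corresponds to light coming from the camera position, while highlights closer to the rim correspond to increasingly raking light.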
With the RL-RTI technique, numerous light sources are rigged in optimal angles and distances, typically in an RTI dome. The domes have the additional advantage of blocking out unwanted light, and some domes may be mounted on tripods to take pictures vertically. The rigged light systems have information about the light sources' distances and angles stored in a light positioning file used by the RTI software, making the reflecting spheres unnecessary. This also speeds up the post-processing, since the software does not have to calculate this information for each new session. In this study, a 35 cm diameter RTI dome with 50 fixed LED lights was used. The RTI dome was produced by Tomasz Lojewski at the laboratory at AGH University of Science and Technology, in Krakow, Poland [24]. The camera was connected to the dome, and the photo sessions were set up and initiated by the light regulator and shutter control unit on the dome. During each image acquisition, 50 pictures were taken automatically with a different LED light illuminating the surface for each picture. The post-processing of the pictures taken with the RL-RTI technique was done using RTI Processor, software specialized for images acquired this way. Although there are alternative RL-RTI setups, the term will, from here on, refer to images acquired by using an RTI dome.

IBM and RTI Combined
Two ways of combining IBM and RTI were tested, both based on overlapping images acquired using the RL-RTI technique. The images for both combination methods were acquired by photographing the surfaces with the RTI dome, taking 50 pictures from each of the 28 (Surface A) and 25 (Surface B) different positions. The 50 pictures from each position had 50 different illuminations. This allowed for a large number of pictures with different illumination, and the question was whether this would benefit the resulting models. A 28 mm lens was chosen to cover a large area for each photo session, while sacrificing some of the accuracy a narrower angle of lens would have provided. The first way to combine these methods (Combo 1) was to process these images unaltered using IBM software. The second way (Combo 2) was to process the images from each camera position separately using RTI software before further processing with IBM software. To do this, these steps were taken:

1. The 28 and 25 sets of 50 pictures of Surfaces A and B were processed into 28 and 25 RTI models, respectively, using RTI Processor.

2. Two JPGs of each RTI model were exported in RTIViewer, using two of the rendering modes available in this software: Normals Visualization and Dynamic Multi Light. This step is elaborated below.

3. The 56 and 50 JPG exports were processed by IBM software.

It was discovered that most of the Polynomial Texture Maps (PTMs) available in RTIViewer distort the end results so much that IBM software is not able to recognize them as overlapping images. The Normals Visualization mode is ideal, because it substitutes all the original color information with a color representation of each pixel's surface orientation [23]. To allow the IBM software to recognize the overlapping pictures, the false colorings were removed in Photoshop. A disadvantage of this method is that the color coding used by the software is so finely tuned that even slight slopes on the recorded surfaces will often result in these parts being colored so light or dark that the details disappear. To complement these areas, another rendering mode was also used for export: Dynamic Multi Light. These pictures retained their original colors, but the software had combined illumination from many different angles to optimize their sharpness and brightness [23]. Both 3D software packages had problems aligning the Normals Visualization exports, so the Dynamic Multi Light exports were aligned first, then their positions were exported and reused for the rest of the pictures.
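As a rough illustration of why sloping areas end up lighter or darker once the false colors are removed, the sketch below encodes a surface normal as a color and then converts it to grey. Both steps are assumptions: the mapping of each normal component from [-1, 1] to [0, 255] is the common normal-map convention and may differ from what RTIViewer does, and the Rec. 601 luma weights stand in for the Photoshop conversion, whose exact settings are not given here. The function names are hypothetical.

```python
import math

def normal_to_rgb(nx, ny, nz):
    """Encode a unit normal as an 8-bit RGB color (common normal-map convention)."""
    return tuple(round((c + 1.0) * 0.5 * 255) for c in (nx, ny, nz))

def rgb_to_grey(r, g, b):
    """Convert RGB to a single grey value using Rec. 601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def tilted_normal(tilt_deg):
    """Unit normal tilted by tilt_deg around the y-axis (towards +x)."""
    t = math.radians(tilt_deg)
    return (math.sin(t), 0.0, math.cos(t))

flat_grey = rgb_to_grey(*normal_to_rgb(0.0, 0.0, 1.0))
for tilt in (-30, -10, 0, 10, 30):
    grey = rgb_to_grey(*normal_to_rgb(*tilted_normal(tilt)))
    print(tilt, grey, grey - flat_grey)
```

Under these assumptions a general slope shifts the grey level of a whole region, which leaves less tonal range for the small incisions within it.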

Usability
All the tested methods have different constraints and advantages that influence their usability. For instance, IBM has the advantage of being well suited for recording surfaces of most shapes, but to do this ample space is often needed to take pictures from all necessary angles. This is not necessary for recording flat surfaces, however, as demonstrated by the recording of Surface B, which was photographed from many positions but only one angle (straight on). Another constraint of IBM is that, to record as many details as possible, the surfaces should ideally be well-lit and evenly illuminated [25]. It is often difficult to achieve this, especially when recording large surfaces. In contrast, H-RTI demands near darkness to avoid ambient light degrading the result if no neutral density filters are used. This complicates the photo session, as one has to carefully maneuver around the camera and the tripod with a remote shutter, a light source and a rule or a wire to maintain a constant distance between the light source and the surface. Even a slight deviation in the setup requires the process to be started over again. Since the distance from which the surface is lit should be about two to four times the width of the surface, the size of the recorded surface will often be limited by the surroundings. RL-RTI image acquisition excludes ambient light and brings its own illumination. The RTI dome also has less need for workspace than H-RTI, because the tripod only has to move slightly between each round of photos and nothing else has to be moved. On the other hand, the dome forces the camera to be perpendicular to the surface, which increases the need for even surfaces. The distance from the camera to the surface is so small that even small deviations will cause areas to become out of focus. With our dome, the distance between the lens and the recorded surface was 18 cm, so height deviations of c. ±1 cm on the recorded surface would not be properly recorded.
For Combo 1 this problem may be mitigated to some extent by using a shorter focal length or expanding the focus and decreasing shutter speed. However, Combo 2, H-RTI and RL-RTI use RTI software which works only for recording flat or nearly flat surfaces. If the lights and shadows used by the software to estimate surface elevations are obstructed by large deviations the result will be severely negatively affected [5,24].
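The H-RTI rule of thumb mentioned above, a light distance of about two to four times the size of the recorded surface, can be turned into a quick check of how much free space a session needs. The sketch below assumes that the longest side of the surface is the relevant measure; the function name is hypothetical.

```python
def hrti_light_distance_range_cm(surface_extent_cm, low_factor=2.0, high_factor=4.0):
    """Return the (min, max) light-to-surface distance in cm for H-RTI."""
    return (low_factor * surface_extent_cm, high_factor * surface_extent_cm)

# Longest sides of the two recorded surfaces, as given in the text.
surfaces_cm = {"Surface A": 52, "Surface B": 30}

for name, extent in surfaces_cm.items():
    nearest, farthest = hrti_light_distance_range_cm(extent)
    print(f"{name}: light at {nearest:.0f}-{farthest:.0f} cm from the surface")
```

For Surface A this gives roughly 1 to 2 m of clearance in every lighting direction, which illustrates how the surroundings can limit the size of surface that H-RTI can record in one session.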
RL-RTI restricts the size of the recorded surface even more than H-RTI, because only the area inside the scope of the dome (or even just a part of it, depending on the lens used) will be recorded. The RTI software is not able to combine and manipulate several RTI models simultaneously, which means that RTI models of overlapping parts of a surface have to be manipulated and exported separately to be presented together. Often the most interesting rendering modes in RTIViewer are the ones where the original colors are or may be removed and the elevations on the surfaces are most pronounced: Specular Enhancement, Normal Unsharp Masking and Normals Visualization. With the exception of Normals Visualization, which has its own previously mentioned limitations, these modes fade out the outer areas of the RTI models, because these areas were not photographed with light from enough angles to have a well estimated topography. That means that there has to be a high degree of overlap between each dome position to be able to export these results for further IBM processing. The need for overlap is not as severe for Combo 1, which uses the original pictures for IBM.
The resulting models also have different usability. The RTI models may be illuminated from all angles and manipulated to showcase elevation differences, but there is not much else they can be used for. On the other hand, 3D models have a wide range of uses. They may be rotated and shown from different angles, and easily shared for interactivity online. Furthermore, they can be scaled and georeferenced, which makes them suitable for measurements and other analyses. They can also be easily combined with or compared to other 3D models.

Time-Efficiency
Time spent on the tested methods varied considerably, both on-site (Table 1) and in post-processing (Table 2). Generally speaking, the RTI methods were more time-efficient than the IBM method. IBM image acquisition of miniature incisions on flat walls may be sped up somewhat by taking all the pictures from the same angle, as was done when recording Surface B, and not from many different angles, as was done with Surface A. During H-RTI image acquisition only the light source is moved for each picture taken, and Surface A was recorded in less than 15 minutes. The RL-RTI image acquisition of the same surface lasted 7.5 times longer, because the process needs many overlapping photo positions when recording surfaces larger than the RTI dome, especially when the result is also intended for IBM post-processing, as these images were. The post-processing of H-RTI pictures was also a quite time-efficient procedure. In theory, the RL-RTI post-processing is much faster because RTI Processor omits many of the calculations needed for the H-RTI technique, since the position of the camera and the illumination is known in advance. However, the large number of pictures caused post-processing of the RL-RTI pictures to last much longer than the H-RTI process. The time-efficiency of post-processing with the Combo 1 and Combo 2 methods varied greatly. When using Combo 1 all the original pictures were post-processed using IBM software, while only 1/50 the number of pictures had to go through this process when using Combo 2. When post-processing 1400 and 1250 pictures for IBM (Combo 1), RealityCapture worked slowly and MetaShape was stopped after the initial alignment steps lasted over 15 hours.
However, when the pictures were pre-processed by RTI Processor before the results were exported and subsequently processed by the IBM software (Combo 2), the lower number of pictures made the post-processing much more efficient. For instance, the total post-processing of Surface A lasted 3 hours and 34 minutes using Combo 2, while it lasted 18 hours and 46 minutes when using Combo 1. The lower number of pictures also allowed MetaShape to be used for Combo 2; in this case the total post-processing of Surface A lasted 11 hours and 12 minutes.
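The image counts and durations quoted above can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (camera positions, pictures per position, exports per position and the three total post-processing times for Surface A); the variable names are hypothetical, and the ratios are derived here rather than taken from Tables 1 and 2.

```python
def to_minutes(hours, minutes):
    """Convert a duration given as hours and minutes to minutes."""
    return hours * 60 + minutes

positions = {"Surface A": 28, "Surface B": 25}
PICTURES_PER_POSITION = 50   # one picture per LED in the dome
EXPORTS_PER_POSITION = 2     # Normals Visualization + Dynamic Multi Light

combo1_images = {s: n * PICTURES_PER_POSITION for s, n in positions.items()}
combo2_images = {s: n * EXPORTS_PER_POSITION for s, n in positions.items()}
print(combo1_images)
print(combo2_images)

# Total post-processing times for Surface A, as quoted in the text.
combo1_total = to_minutes(18, 46)
combo2_total = to_minutes(3, 34)
combo2_metashape_total = to_minutes(11, 12)

print(f"Combo 1 vs Combo 2: {combo1_total / combo2_total:.2f} times longer")
print(f"Combo 2, MetaShape vs faster run: {combo2_metashape_total / combo2_total:.2f} times longer")
```

These totals include the RTI pre-processing, so the ratios describe whole workflows and are not directly comparable to the per-software speed factors reported for the 3D processing alone.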
The time-efficiency difference between the 3D software packages was obvious, even though, as mentioned previously, MetaShape downscaled the dense cloud before mesh generation while RealityCapture did not. In the four instances where both programs were used, RealityCapture was 11, 11, 14 and 28 times faster than MetaShape.

Cost-Efficiency
Both IBM and RTI are commonly seen as low-cost documentation methods in comparison to methods such as laser scanning and structured light scanning [5,10]. There is still quite a price difference between the tested methods, and price often matters, especially for smaller institutions. The RTI methods cost less than the other three (Table 3). Since the necessary software is free, the relative cost of using RTI depends on what kind of camera equipment is used, whether to invest in an RL-RTI setup, and if so, which one. Larger domes would cover larger surfaces and may be more advanced but could also be substantially more expensive. The cost of using IBM also depends on the camera equipment used, but as seen in Table 3 the chosen software package was the most decisive factor in our case. Of the two programs tested, RealityCapture is four times more expensive than MetaShape Professional. However, both software packages are available with cheaper licensing options, and there are multiple other alternatives to these programs at different price ranges which may be considered.

Accuracy
Evaluating the accuracy of these models, i.e., the level of detail, cannot, in contrast to time-efficiency and cost-efficiency, be based adequately on quantifiable variables alone, because IBM generates 3D models and RTI generates 2.5D RTI models. The amount of detail in a 3D model can be inferred from, for instance, the number of vertices; from the numbers in Table 4 we can assume that RealityCapture has generated more detailed models than MetaShape. This is because MetaShape, unlike RealityCapture, downscales the point clouds before generating meshes [26]. This constraint may be bypassed by setting a Custom Face Count when creating a mesh in MetaShape. This option was unintentionally overlooked in this instance and will be the subject of further analysis. All models were made with the highest default settings available; see Appendix B for details. Since the RTI models do not contain vertices, assessment of accuracy also needs to be carried out visually. For this article, visual comparisons were done in two ways. Selected details of all the 3D models were exported as OBJ files, without the photographic texture maps, in order to avoid the visual interference these create. The models were scaled and aligned in CloudCompare, and an image of each model was then rendered with identical illumination in Blender. The corresponding details of the RTI models were exported in the Normals Visualization mode (described later) and converted to greyscale images with Photoshop. Since the 3D models in their entirety were too large for Blender, they were visualized with the Radiance Scaling shader in MeshLab. As can be seen in Figures 3 and 4, the RTI models (Figures 3c and 4b) are clearly more detailed than the pure IBM models (Figure 3a). Of the 3D models, only the Combo 1 models (Figure 3b) come close to the accuracy of the RTI models.
When compared to the "blueprints" of the RTI models, the 3D models are less detailed and too smooth; many of the minor elevations and crevices in the weathered surfaces do not appear there. The accuracy differences are even more obvious on inspection of the close-ups (Figures 5 and 6). For instance, the ring encircling the arms in the ringcross on Surface A is very clear in the RTI models (Figure 5f,g) and the Combo 1 model (Figure 5c), only partly visible in the IBM models (Figure 5a,b), and even less visible in the Combo 2 models (Figure 5d,e). In general, the Combo 1 models are the most detailed and have the least amount of noise of the 3D models, which arguably makes them the most visually appealing (see for instance Figure 6c in comparison to the other 3D models in the same figure). The models made using Combo 2 are very pixelated and are the least detailed ones (Figure 5d,e and Figure 6d,e). All the 3D models made with RealityCapture were slightly more detailed and less pixelated than those made by MetaShape. On the other hand, all the models made with RealityCapture had some areas that were less detailed or noisier than the corresponding areas of the models made by MetaShape (see for instance the IBM-RC model in Figure 6b versus the IBM-MS model in Figure 6a). Another interesting result was that MetaShape was able to align an area of Surface A which was photographed with little overlap for Combo 2, while RealityCapture could not.
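The vertex count used above as a rough proxy for the level of detail can be read directly from an OBJ export. The following is a minimal sketch assuming a plain-text Wavefront OBJ file in which geometric vertices start with `v` and faces with `f`; the function name and the tiny inline example mesh are hypothetical and unrelated to the models in Table 4.

```python
def count_obj_elements(lines):
    """Count geometric vertices and faces in the lines of a Wavefront OBJ file."""
    vertices = 0
    faces = 0
    for line in lines:
        tokens = line.split(None, 1)
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "v":      # geometric vertex (not "vn" or "vt")
            vertices += 1
        elif tokens[0] == "f":    # polygonal face
            faces += 1
    return vertices, faces

# Hypothetical example: a square made of two triangles.
example_obj = """\
# tiny test mesh
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 1.0 1.0 0.0
v 0.0 1.0 0.0
vn 0.0 0.0 1.0
f 1 2 3
f 1 3 4
"""

n_vertices, n_faces = count_obj_elements(example_obj.splitlines())
print(n_vertices, n_faces)
```

For a real export the same function can be applied to an open file object, for example `count_obj_elements(open(path))`. As the visual comparison shows, a higher count does not by itself guarantee that more genuine surface detail, rather than noise, has been captured.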

Discussion
IBM and RTI are handy tools for recording and analyzing surfaces with miniature elevation differences; for example, recording incised surfaces, monitoring the deterioration of old buildings over time, and examining different layers of paint on a painting. However, their potential is restricted as the methods are mostly used separately [10]. This study has highlighted different approaches to using IBM and/or RTI. Although the study tested a limited amount of equipment and software on only two surfaces, the results indicate that no single method is perfect for all purposes (Table 5). If usability is the most important factor, IBM has an advantage as it is the only one able to record surfaces of most shapes and sizes. For example, if one wanted to record incisions on a very curved sculpture, methods utilizing RTI image acquisition and/or RTI post-processing would not work; the curvatures would obstruct the lights and shadows too much for either proper recording or model generation. On the other hand, the sculpture could easily be recorded by using IBM, provided adequate illumination and image acquisition workspace were available. Furthermore, the methods generating 3D models (IBM, Combo 1 and Combo 2) have high result usability as 3D models can be used for many different purposes. However, methods using only 3D software for post-processing (IBM and Combo 1) have low time-efficiency, especially when using MetaShape. Using RealityCapture will speed up the post-processing substantially, even more so when processing many pictures. MetaShape is the least expensive option, but if time-efficiency is more important than cost-efficiency, RealityCapture is recommended. The differences regarding the accuracy of the 3D models made by the two software packages are minor. While the level of detail is generally higher in the RealityCapture models, some examples have less detailed and noisier parts than the corresponding parts of the MetaShape models.
If time-efficiency, cost-efficiency and accuracy are the most important factors, the pure RTI methods are better options. The software is free, and the RTI post-processing results in very accurate RTI models which may be manipulated in various ways to enhance the details even further. Both image acquisition and post-processing are highly time-efficient, especially when using a less cost-efficient RTI dome (RL-RTI).
There is a strong case for combining IBM and RTI, however. While not time-efficient or cost-efficient, combining RL-RTI image acquisition with IBM post-processing (Combo 1) generates the most accurate 3D models of the techniques tested here. Why is this? Could the reason be just that the pictures taken for Combo 1 were taken closer and with a better lens? For instance, as seen in Table 4, the GSD value of the IBM model of Surface A is 0.1 mm, while the Combo 2 model has a GSD value of 0.07 mm. To examine this, a well-lit picture from each of the 28 camera positions taken of Surface A for Combo 1 was post-processed as IBM-RC; if the changing illumination provided by the RL-RTI setup did not contribute to the added detail quality, this test model should be just as detailed as the Combo 1 model. As seen in Figure 7, the test model (Figure 7b) is clearly less accurate than the Combo 1 model (Figure 7a), even though the pictures they are based on have the same GSD values.
Another suggestion to explain the added accuracy of the Combo 1 technique was that the camera might have moved slightly when taking the images and that this movement in effect added up to a lot more than 28 camera positions. This explanation was also dismissed, as close inspection revealed that there had been virtually no camera movement. To conclude, the added accuracy of the Combo 1 models must be attributed to the IBM software being able to detect more details and avoid data acquisition error because of the RL-RTI setup.
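For readers unfamiliar with the GSD (ground sampling distance) values compared above, the sketch below shows the usual pinhole-camera approximation: the size of one pixel on the surface equals the pixel pitch of the sensor scaled by the ratio of object distance to focal length. The function name and all numbers are illustrative assumptions; they are not the specifications of the camera or lenses used in this study (see Appendix A) and are not meant to reproduce the values in Table 4.

```python
def ground_sampling_distance_mm(pixel_pitch_mm, distance_mm, focal_length_mm):
    """Approximate size of one image pixel on the recorded surface, in mm.

    Pinhole approximation; it becomes less exact at very short distances,
    where the effective focal length of a real lens changes.
    """
    return pixel_pitch_mm * distance_mm / focal_length_mm

# Hypothetical sensor with 5 micrometre pixels and a 28 mm lens.
PIXEL_PITCH_MM = 0.005
FOCAL_LENGTH_MM = 28.0

for distance_mm in (280.0, 560.0, 1000.0):
    gsd = ground_sampling_distance_mm(PIXEL_PITCH_MM, distance_mm, FOCAL_LENGTH_MM)
    print(f"distance {distance_mm:.0f} mm -> GSD {gsd:.3f} mm")
```

The relation makes clear why moving the camera closer or using a longer focal length lowers the GSD, and why equal GSD values in the test above rule out image resolution as the explanation for the added detail in the Combo 1 models.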
RL-RTI image acquisition with RTI post-processing before IBM post-processing (Combo 2) also has some merits. Firstly, the process efficiently generates both RTI and 3D models, allowing high result usability. Secondly, the low number of pictures after the original pictures are "distilled" by the RTI software can easily be processed by cheaper software packages such as MetaShape. Thirdly, by using Combo 2, surface texture maps may be generated instead of photographic texture maps; i.e., texture maps based on the Normals Visualization exports from the RTI models instead of the original photographic color maps (Figure 8). Neither the photographic texture maps (Figure 8a,c) nor the 3D models themselves (Figures 3-6) exhibit the level of detail shown by the surface textures (Figure 8b,d). All surfaces recorded by both IBM and RTI may have the RTI result projected onto the 3D models afterwards, but only Combo 2 has the option of producing surface texture during the 3D model generation. The surface texture could be said to be misleading, since the Combo 2 3D models are less accurate than the other IBM models. So, the question to be asked is which type of accuracy matters most: the 3D model accuracy or the texture accuracy? If the top priority is for the 3D models to be analyzed as 3D surfaces, then 3D model quality is clearly important. If the main goal of the documentation is to analyze or disseminate the incisions as cultural heritage, the amount of detail one can see might be the most important. A bonus is that the original pictures post-processed using Combo 2 may also be post-processed using Combo 1 in cases where the added quality of the 3D models matters.

Conclusions
Two surfaces bearing miniature incisions were recorded by IBM and RTI, as well as two novel ways of combining these methods. The aim of this article was to assess these methods in terms of their usability, time-efficiency, cost-efficiency and accuracy. Furthermore, all 3D models were generated by using two 3D processing software packages, MetaShape and RealityCapture, to examine their influence on the reviewed factors. The following conclusions can be drawn from this study:

1. None of the methods tested was optimal in all regards. Consequently, the choice of method must be based upon a prioritization of the reviewed factors.

2. The two techniques that combine IBM and RTI gave positive results. Combo 2 (RTI image acquisition by an RTI dome, combined with post-processing using RTI software first and IBM software subsequently) appears to be the overall best method based on the reviewed factors. The process generates both RTI and 3D models and does this efficiently in terms of both time and cost. Although the resulting 3D models are less accurate than those produced by the other methods, the models may be fitted with highly detailed surface texture maps derived from the RTI processing. The numerous original pictures may also be post-processed directly by using more expensive IBM software (Combo 1). Although this method is not cost- or time-efficient, it generates the most accurate 3D models of the methods tested.

3. The more expensive package, RealityCapture, was much more time-efficient than MetaShape. However, in cases where the same pictures could be processed by both software packages, the accuracy of the 3D models generated showed only minor differences. The evaluation of the IBM software packages' ability to generate high detail quality remains inconclusive.

Some shortcomings in our research should be pointed out. Because of time and availability constraints, not all methods were tested on all surfaces. In addition, no attempt was made to generate the MetaShape models with Custom Face Count settings, which presumably would have added more details but possibly also more noise. In general, more variations of on-site and off-site setup should be tested on more surfaces to examine how this will affect the reviewed factors.