1. Introduction
Extended Reality (XR) is a term encompassing all current and future combined real and virtual environments, such as Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR). Over the last five years, vast advances have taken place in the field of immersive media [1], in terms of both production and consumption. On the production side, computing and especially graphics processing technology has gone through three generations of evolution, now offering features such as real-time ray tracing in both consumer GPUs (NVIDIA RTX) and game consoles (ray tracing-capable APUs by AMD in the Microsoft Xbox Series X and Sony PlayStation 5). In terms of video capture, RED, Kandao, Insta360, Vuze and others have released cameras that can capture the world in 3D-360. In terms of media consumption, companies such as Oculus, HTC, Valve and HP have introduced affordable VR headsets capable of displaying immersive content. AR and MR have also seen massive efforts from the leading companies, including Apple’s ARKit [2], Google’s ARCore [3] and Microsoft’s holographic/MR work on their HoloLens systems [4]. Apple’s more recent release of the Apple Vision Pro has also promised to push boundaries, focusing on spatial computing and offering industry-leading resolution and fidelity.
XR can be used for immersive 3D visualisation in the geoinformation and geological sciences, where virtual locations can be based on geospatial datasets [5]. Such virtual geosites can be used to popularise geoheritage for a general audience and to engage younger demographics, who are usually drawn to more cutting-edge forms of communication [6]. Another advantage of XR for geoheritage sites is the ability to visit locations around the clock, regardless of weather conditions, and to observe features that are difficult to access; for example, a fossilised tree trunk might be too tall to examine up close without scaffolding, which poses a potential health and safety risk for the general public. Supporting multiple observers at the same location or artefact is a further advantage, making it possible for a variety of audiences, including conservation professionals, to examine a specific artefact up close at the same time.
Unmanned aerial systems (UASs) have been advancing rapidly, with companies such as DJI continually updating their model offerings with drones aimed at both professionals and hobbyists, ranging from portable foldable models such as the Spark, Mavic and Mini series to larger, more versatile drones with interchangeable payloads such as the Matrice series. Drones offer versatile optical viewpoints, allowing cameras or other scanning equipment to be positioned in areas that are not easily reachable by terrestrial means. They are also efficient at photographing, mapping or otherwise gathering sensor data over wide areas. UAS surveys combining LiDAR scans and photogrammetry techniques have also been used in archaeology to help observe historical areas, interpret locations and make discoveries that may not be visible to the naked eye, such as additional possible structures at the same location [7]. Geospatially aware datasets ensure accurate placement, aiding both the reconstruction of an XR environment and navigation within it.
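As a brief illustration of that placement step, the sketch below projects the WGS84 geotag of an aerial image into a metric UTM coordinate system, the usual prerequisite for positioning a reconstruction at true scale; the EPSG code and coordinates are illustrative assumptions rather than values from this study.

```python
# Minimal sketch, assuming the pyproj library: project a WGS84 geotag
# (longitude/latitude from image EXIF) into a metric UTM zone so the
# reconstruction can be positioned at true scale. The EPSG code and the
# coordinates are illustrative assumptions, not values from this study.
from pyproj import Transformer

to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32635", always_xy=True)

lon, lat = 26.10, 39.20  # hypothetical geotag
east, north = to_utm.transform(lon, lat)
print(f"UTM easting/northing: {east:.1f} m, {north:.1f} m")
```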
When it comes to cameras, there have also been developments in both the software and hardware domains, enabling higher resolutions and greater fidelity as well as different formats and virtually any field of view. One such development is the 360° camera, in which fisheye optics are used in conjunction with multiple sensors to produce panoramas in real time [8]; previously, panoramas required additional stitching work and were less suitable for scenes containing moving elements. Multi-sensor fisheye arrangements also make it possible to obtain stereoscopic 3D-360 output for use in immersive media.
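To make the geometry concrete, the following sketch remaps a single equidistant fisheye image onto its half of an equirectangular panorama, the core per-sensor operation such cameras perform in real time; the lens model, field of view and file name are assumptions, and a real stitcher would additionally calibrate the lenses and blend the overlap between sensors.

```python
# Minimal sketch, assuming an equidistant fisheye lens model (r = f * theta)
# with a ~190 degree field of view and a front-facing sensor. It remaps the
# fisheye image onto its half of an equirectangular panorama; a real dual-
# fisheye stitcher would also apply lens calibration and blend the overlap.
import numpy as np
import cv2

def fisheye_to_equirect(fisheye, out_w=4096, out_h=2048, fov_deg=190.0):
    h, w = fisheye.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    f = (w / 2.0) / np.radians(fov_deg / 2.0)  # focal in px for r = f * theta

    # Longitude/latitude of every output pixel.
    u, v = np.meshgrid(np.arange(out_w), np.arange(out_h))
    lon = (u / out_w - 0.5) * 2.0 * np.pi      # -pi .. pi
    lat = (0.5 - v / out_h) * np.pi            # -pi/2 .. pi/2

    # Unit ray per pixel; the lens looks down +z.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    theta = np.arccos(np.clip(z, -1.0, 1.0))   # angle off the optical axis
    phi = np.arctan2(y, x)                     # rotation around that axis
    r = f * theta

    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy - r * np.sin(phi)).astype(np.float32)
    # Rays outside the lens FOV fall outside the source image and stay black.
    return cv2.remap(fisheye, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)

# pano_half = fisheye_to_equirect(cv2.imread("front_fisheye.jpg"))  # hypothetical file
```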
Another major development in digital cameras has been onboard processing, with newer technologies over the years yielding images with less sensor noise and allowing more pixels to fit on smaller sensors [9]. The advent of the smartphone also accelerated development in this area, since every smartphone user became, in effect, a digital camera user, and the target audience was no longer mainly photography professionals and enthusiasts. Combined with fierce competition in the sector, this has brought vast camera improvements every year, with fidelity improving so rapidly that smartphones now rival professional digital cameras in certain cases. Machine learning and artificial intelligence have also become integral to mobile phone chipsets, further aiding processes such as detail extraction while a picture is being taken with the integrated camera, in effect producing machine learning computational photography. This exciting development essentially democratises high-fidelity photography, a field once exclusive to high-end equipment. Material produced with such methods will also be collected and investigated, and the results compared in the software.
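The core multi-frame idea behind such pipelines can be sketched simply: register a handheld burst of frames and merge them so that noise averages out while aligned detail is reinforced. The sketch below uses OpenCV ECC registration and a plain average, with hypothetical file names; production systems such as Deep Fusion use far more sophisticated, ML-driven selection and merging.

```python
# Minimal sketch, assuming a handheld burst of JPEG frames with small motion
# between shots: align each frame to the first with ECC registration, then
# average to suppress sensor noise. File names are hypothetical; pipelines
# like Deep Fusion use far more sophisticated ML-driven merging.
import numpy as np
import cv2

def merge_burst(paths):
    frames = [cv2.imread(p).astype(np.float32) / 255.0 for p in paths]
    ref = frames[0]
    ref_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    acc, n = ref.copy(), 1

    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        warp = np.eye(2, 3, dtype=np.float32)
        try:
            # Estimate affine motion between the reference and this frame.
            _, warp = cv2.findTransformECC(ref_gray, gray, warp,
                                           cv2.MOTION_AFFINE, criteria, None, 5)
        except cv2.error:
            continue  # skip frames that fail to register
        aligned = cv2.warpAffine(frame, warp, (ref.shape[1], ref.shape[0]),
                                 flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        acc += aligned
        n += 1
    return np.clip(acc / n * 255.0, 0, 255).astype(np.uint8)

# merged = merge_burst([f"burst_{i:02d}.jpg" for i in range(8)])
```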
When it comes to digitising subjects in 3D, photogrammetry is a versatile method, and it relies on material captured with cameras. To create believable immersive content for deployment in XR, physical objects need to be captured with as much detail as current technology allows. To further aid immersion, both the visual and audio domains are included, the audio taking the form of 360° spatial (ambisonic) audio.
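A minimal sketch of what makes ambisonic audio useful in XR follows: a first-order B-format recording can be steered after the fact, here by synthesising a virtual cardioid microphone pointed in any direction, which is the basis of head-tracked playback. The FuMa channel convention and the helper name are assumptions; real XR players typically decode to binaural using HRTFs.

```python
# Minimal sketch, assuming a first-order B-format recording in the FuMa
# convention (W attenuated by 1/sqrt(2)): synthesise a virtual cardioid
# microphone aimed at any azimuth/elevation. This steerability is what
# head-tracked XR playback builds on. The function name is illustrative.
import numpy as np

def virtual_cardioid(w, x, y, z, azimuth_rad, elevation_rad):
    """w, x, y, z: 1-D sample arrays of the four B-format channels."""
    # Unit look-direction vector.
    dx = np.cos(azimuth_rad) * np.cos(elevation_rad)
    dy = np.sin(azimuth_rad) * np.cos(elevation_rad)
    dz = np.sin(elevation_rad)
    # Cardioid = 0.5 * (omni + figure-of-eight toward the look direction).
    return 0.5 * (np.sqrt(2.0) * w + dx * x + dy * y + dz * z)

# Crude stereo rendering: two cardioids aimed +/-30 degrees off centre.
# left  = virtual_cardioid(w, x, y, z, np.radians(30.0), 0.0)
# right = virtual_cardioid(w, x, y, z, np.radians(-30.0), 0.0)
```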
XR applications have been implemented at geoheritage sites via a variety of methods. Both AR and VR techniques have been used to digitally represent geoheritage sites and artefacts; however, the process tends to focus on the transmissibility of information and the preservation of sites and artefacts in digital form, with less emphasis on factors such as immersion and realistic approximation of the actual artefacts and sites [1]. While areas and objects can be accurately digitised and represented in terms of size, dimensions and geolocation, they can lack important details such as high-resolution meshes and textures and appropriate photorealistic shading, and may ignore the audio aspect entirely, all of which are important for completely representing the environment of a location experienced in XR.
This paper investigates and uses a variety of the aforementioned multisensory media techniques to create convincing XR representations of fossilised tree trunks. A persistent problem with XR photogrammetry has been low-fidelity visuals combined with dense geometry, whose resource-hungry outputs lead to poor performance in VR and XR applications. This research addresses those points by investigating ML-/computational photography-assisted imaging, achieving visually superior results from image sets of lower pixel resolution. Visual material from a variety of aerial and terrestrial sources is compared, and appropriate processes are applied to achieve more realistic detail and therefore more immersive results when deployed in XR. The innovation of this study is the use of computational photography-aided imagery, derived from mobile phones, in the 3D modelling and visualisation of geoheritage sites. Such imagery yields superior fidelity, suitable for extra-detailed 3D digitisation of petrified tree trunks. Additionally, the resulting 3D model is fused with the model derived from scale-accurate RTK UAS imagery, as well as with 360° panoramic imagery, producing a comprehensive model that includes both the surrounding environment and the extra-high-fidelity tree trunk. The extra fidelity afforded by our methodology allows us to produce a more realistic visual result, suitable for extra-immersive XR experiences.
4. Discussion and Conclusions
The aim of this research was to investigate and use a variety of immersive multisensory media techniques to create convincing digital models of fossilised tree trunks for use in XR. Immersion and realism were key focus points from the early stages, so that the digitally reconstructed output would approximate the real artefact as closely as possible. To that end, additional factors were included, such as capturing the spatial audio of the area using ambisonic microphones and capturing the surrounding environment using multi-sensor 3D-360° camera equipment.
Throughout this research, both common and experimental methods were used, challenging familiar techniques with potentially improved new alternatives. The familiar method involved image sets taken with commonplace means: conventional (flat) photography with normal lenses and sensors [17,18]. A slightly different approach was to capture one additional image set using the camera of a Xiaomi Mi 11 Lite 5G mobile phone, which features an impressive 64-megapixel main camera. The new alternatives were 360° cameras combining multiple sensors with ultra-wide-field-of-view fisheye optics. At times, the alternative method produced disappointing output: the content from the 360° cameras gave inferior results, as the resulting geometry lacked precision and exhibited both distortions and rather large gaps. Panorama-based photogrammetry has not been available for as long as conventional flat-imagery photogrammetry, so it can be expected to improve as the technology matures.
Being an avid photographer and cinematographer in my spare time, I have become familiar with the advantages of machine learning-driven computational photography over the last several years, having repeatedly noticed a smaller and cheaper device, such as an iPhone, challenge my professional equipment in terms of fidelity straight from the device. Based on that, I hypothesised that such technology would be beneficial for capturing the source image sets for photogrammetry; I therefore used an iPhone 11 Pro for one extra set, and the results were exceedingly impressive.
While digital cameras always involve some basic computational processing [9] to convert sensor data into an image file, the advent of smart camera phones has made such processes even more commonplace. The rapid evolution of camera phones essentially put a good-quality camera into most people’s pockets, and the included ‘app stores’ made altering the way the camera module works much easier and more accessible than modifying the software of a dedicated digital camera. Since picture qualities such as shallow depth of field and low-light/low-noise performance were normally characteristics of cameras with high-quality large sensors and optics, mobile phones had to find software solutions to calculate and realistically recreate characteristics formerly reserved for professional cameras.
The recent development I focused on in relation to this research was Apple’s Deep Fusion, whose claimed advantage is added detail obtained from multiple shots, with machine learning determining the areas of interest. Impressively, when put to the test, the additional dataset from the 12-megapixel iPhone 11 Pro rivalled all my previous photogrammetry results in terms of fidelity. It was rather surprising to see the professional Zenmuse P1 UAS camera, with a full-frame (35 mm+) sensor boasting 45 megapixels of resolution, end up with less detail fidelity than a small (10 mm−) sensor with 12 megapixels of resolution, not to mention the stark contrast with the results produced by the otherwise colossal 64-megapixel content shot with the Xiaomi mobile phone. The advantages of the machine learning computational photography approach were obvious enough in the results to convince me to carry out all future photogrammetry work with it.
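A back-of-envelope calculation makes this comparison less mysterious: megapixel counts hide the photosite size, which governs per-pixel light gathering and noise. The sketch below estimates approximate pixel pitch for the two sensors, using nominal sensor dimensions as assumptions (full frame ~36 × 24 mm; the iPhone 11 Pro main sensor roughly 5.6 × 4.2 mm).

```python
# Back-of-envelope sketch: approximate pixel pitch of the two sensors
# compared above. Sensor dimensions are nominal assumptions (full frame
# ~36 x 24 mm; the iPhone 11 Pro main sensor roughly 5.6 x 4.2 mm).
import math

def pixel_pitch_um(width_mm, height_mm, megapixels):
    pixels = megapixels * 1e6
    px_along_width = math.sqrt(pixels * width_mm / height_mm)
    return width_mm / px_along_width * 1000.0  # micrometres per pixel

print(f"Zenmuse P1 (45 MP full frame): {pixel_pitch_um(36.0, 24.0, 45):.1f} um/px")  # ~4.4
print(f"iPhone 11 Pro (12 MP, small):  {pixel_pitch_um(5.6, 4.2, 12):.1f} um/px")    # ~1.4
```

Each phone photosite therefore gathers roughly ten times less light than a P1 photosite, which is precisely the per-pixel deficit that multi-frame ML merging works to compensate for.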
Naturally, not all photogrammetry tasks are possible with a mobile phone, depending on the area and size requirements. This was partly true here, as the fossilised tree trunk is rather tall and normally out of reach for a handheld device. In this case, the problem was solved using rather long monopods; however, when collecting visual material from very large structures, it would be impractical or even impossible to use or construct monopods to match such heights. Seeing what is being photographed can also be an issue, since these devices use their displays as a preview screen; this too was resolved during this project by using proprietary solutions to live-preview and control the mobile phone from a smartwatch.
Camera technology is constantly evolving, especially in digital cameras, which rely on sensors and internal processing for their results. While computational photography-assisted devices are not widely available outside the smartphone domain, the technology is likely to become embedded in all digital camera equipment in the near future, including cameras such as the Zenmuse P1 used with the UAS in this research. Until such a development reaches readily available products, I am already building a custom mount for attaching computational photography-capable mobile phones to a UAS, along with signal-repeating equipment for remote control and previewing.
More modern photogrammetry technologies will also be used in the future for further comparison and experimentation. Some initial tests have already been made with the 3D Capture tool in the recently released beta of Adobe Substance 3D Sampler [19], with surprisingly good accuracy (Figure 17) at a fraction of the processing time required by Agisoft Metashape.
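For context, the sketch below outlines the kind of Metashape processing pipeline such tools are being compared against, written against the published Agisoft Metashape Python API; the folder, parameter values and output names are illustrative assumptions rather than the exact settings used in this study.

```python
# Minimal sketch of a typical Metashape pipeline for an image set like the
# ones in this study, via the Agisoft Metashape Python API; the folder,
# parameter values and file names are illustrative assumptions.
import glob
import Metashape

doc = Metashape.Document()
chunk = doc.addChunk()
chunk.addPhotos(glob.glob("trunk_photos/*.jpg"))  # hypothetical folder

chunk.matchPhotos(downscale=1)     # feature matching at full resolution
chunk.alignCameras()               # sparse reconstruction / camera poses
chunk.buildDepthMaps(downscale=2)  # depth maps at half resolution
chunk.buildModel(source_data=Metashape.DepthMapsData)
chunk.buildUV()
chunk.buildTexture(texture_size=8192)

doc.save("fossil_trunk.psx")
```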
Advanced physically based rendering (PBR) materials are also being considered for future use, aiming for even more realism and flexibility. Moreover, to ensure compatibility with a wider range of systems, retopology techniques are also being considered, essentially reconstructing the mesh with a lower-polygon equivalent while simulating the fine depth information through clever PBR materials. Advanced real-time geometry processes such as Nanite in Unreal Engine 5 are also under consideration, enabling the use of very high polygon counts while keeping them workable on lower-spec hardware.
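As a small illustration of the retopology direction, the sketch below performs quadric-error decimation of a dense photogrammetry mesh with Open3D; the file names and target triangle count are assumptions, and the discarded surface detail is what baked normal maps on a PBR material would later simulate.

```python
# Minimal sketch of the retopology direction, assuming the Open3D library
# and hypothetical file names: quadric-error decimation of a dense
# photogrammetry mesh. The geometric detail discarded here is what baked
# normal maps on a PBR material would later simulate.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("fossil_trunk_hires.obj")
mesh.compute_vertex_normals()
print(f"source mesh: {len(mesh.triangles)} triangles")

# Collapse edges until roughly 100k triangles remain.
low = mesh.simplify_quadric_decimation(target_number_of_triangles=100_000)
low.compute_vertex_normals()
o3d.io.write_triangle_mesh("fossil_trunk_lowpoly.obj", low)
print(f"decimated mesh: {len(low.triangles)} triangles")
```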
The findings of this research show how recent developments in mobile imaging technology can essentially democratise these processes, enabling high-end fidelity for smaller organisations and enthusiasts with limited budgets, while also helping more established institutions, and anyone involved in XR industry applications, to make better use of their resources. Obtaining superior results from a sensor a fraction of the size, and a quarter of the resolution, of typical full-frame photography sensors means that more people can engage with processes that once required larger budgets and were usually limited to academic, scientific or industrial organisations. The fact that mobile imaging improves every single year, driven by fierce competition in that field, adds further potential for even higher-quality results in the near future, compared with the less competitive field of conventional cameras and imaging hardware.
Before experimenting with such techniques, a more traditional workflow was followed: obtaining a budget and ordering the ‘industry standard’ equipment for aerial imaging, the best imager the UAS manufacturer could offer, which was both more costly and more resource-intensive, since the images were 45 megapixels each compared with the phone’s 12 megapixels. Common sense would suggest that bigger, ‘established’ techniques should deliver better results; however, as our results show, that proved not to be the case: the more sophisticated approach using images from the smartphone delivered visibly superior, significantly more detailed photogrammetry outputs. Considering how rapidly smartphones are evolving, the future of digital imaging certainly seems exciting.