A Review of Mobile Mapping Systems: From Sensors to Applications

The evolution of mobile mapping systems (MMSs) has gained increasing attention in the past few decades. MMSs have been widely used to provide valuable assets in different applications. This has been facilitated by the wide availability of low-cost sensors, advances in computational resources, the maturity of mapping algorithms, and the need for accurate and on-demand geographic information system (GIS) data and digital maps. Many MMSs combine hybrid sensors that complement each other to provide a more informative, robust, and stable solution. In this paper, we present a comprehensive review of modern MMSs by focusing on: (1) the types of sensors and platforms, discussing their capabilities and limitations and providing a comprehensive overview of recent MMS technologies available in the market; (2) highlighting the general workflow for processing MMS data; (3) identifying different use cases of mobile mapping technology by reviewing some of the common applications; and (4) presenting a discussion of the benefits and challenges and sharing our views on potential research directions.


Introduction
The need for regularly updated and accurate geospatial data has grown exponentially in the last decades. Geospatial data serve as an important source for various applications, including but not limited to indoor and outdoor 3D modeling, generation of geographic information system (GIS) data, disaster response, high-definition (HD) maps, and autonomous vehicles. Such data collection has become possible through the continuous advances in mobile mapping systems (MMSs). An MMS refers to an integrated system of mapping sensors mounted on a moving platform that provides the positioning of the platform while collecting geospatial data [1]. A typical MMS platform takes light detection and ranging (LiDAR) and/or high-resolution cameras as the primary sensors for acquiring data on objects/areas of interest, integrated with sensor suites for positioning and georeferencing, such as the global navigation satellite system (GNSS) and inertial measurement unit (IMU). To perform accurate georeferencing, traditional mobile mapping approaches require extensive post-processing, such as strip adjustment of point cloud scans or bundle adjustment (BA) of images using ground control points (GCPs), during which manual operations may be required to clean noisy data and unsynchronized observations. Recent trends in MMSs aim to perform direct georeferencing that leverages the capabilities of a multi-sensor platform [2,3] to minimize human intervention during data collection and processing. The automation has been further strengthened by machine learning/artificial intelligence to perform online/offline object extraction and mapping, such as traffic light and road sign extraction [4][5][6].
Mobile mapping technology has experienced significant developments in the past few decades with algorithmic advances in photogrammetry, computer vision, and robotics [7]. In addition, increased processing power and storage capacity further facilitate collection speed and data volumes [8]. The applications and systems are further strengthened by the availability of a diverse set of low-cost survey sensors with various specifications, making mobile mapping more flexible for acquiring data in complex environments (e.g., tunnels, caves, and enclosed spaces) with lower cost and less labor [9]. Typically, commercial MMSs can be classified based on their hosting platforms into handheld, backpack, trolley, and vehicle-based. Some platforms are designed to work indoors only without relying on GNSS, while others can work both indoors and outdoors. Mobile mapping technology gained more attention when it was adopted by companies such as Google and Apple [10,11] for various applications, including navigation and virtual and augmented reality [12].

Figure 1. An example of an MMS: a vehicle-mounted mobile mapping platform consisting of different positioning and data collection sensors to generate an accurate georeferenced 3D map of the environment. We show the main sensors of the Leica Pegasus: Two Ultimate as an example. Photo courtesy of Leica Geosystems [13].
An example of a vehicle-based MMS (Leica Pegasus: Two Ultimate [13]) and its typical sensor suites are shown in Figure 1. It consists of both data acquisition sensors and positioning sensors. The data acquisition sensors primarily consist of a calibrated LiDAR and digital camera suite, where the LiDAR sensor is capable of producing 1,000,000 points/second and the camera suite captures a 360° horizontal field of view (FoV) to provide both texture/color information and stereo measurements should these be needed. The positioning sensors include a GNSS receiver that provides the global positional information, with an additional IMU and distance measuring instrument (DMI) that obtain odometry information for integrated position correction. These positioning sensors are required to be calibrated in their relative positions and play a vital role in generating globally consistent point clouds.
Despite the existence of a few mobile mapping technologies in the market, the technology landscape of MMS is highly disparate, as there is no single, standard MMS that is widely used in the mapping community. Most existing MMSs are customized using different sensor suites at different grades of integration, thus each has its pros and cons. Previous studies mostly focused on comparative studies among certain devices [1,[14][15][16][17][18][19] or targeted systems for specific application scenarios (e.g., indoors or outdoors) [20][21][22]. Due to the rapid development of imaging, LiDAR, positioning sensors, and onboard computers, the updated capability of these essential components of an MMS may not be fully reflected through a few integrated systems, making such studies less informative. There is generally a lack of studies covering the comprehensive landscape of sensor suites and the respective MMSs. In this paper, we focus on providing a meta-review of sensors and platforms tasked for ground-level 3D mapping, as well as the techniques needed to integrate these sensors as suites for MMSs in different application scenarios. This review intends to provide an update on sensors, MMSs with different hosting platforms, and extended applications of MMSs, to serve not only researchers in the field of mobile mapping with updated background information but also practitioners with critical factors of concern when planning to customize an MMS for specific applications. We will highlight the main steps from data acquisition to refinement, and discuss some of the most common challenges and considerations of MMSs.

Paper Scope and Organization
This paper provides a comprehensive review of MMS technology, offering a thorough discussion covering mobile mapping from sensors and software to applications. We thoroughly discuss the different types of sensors, their practical capabilities and limitations, as well as methods of fusing these sensory data. We then describe the main platforms used in the mapping tasks of different application scenarios (e.g., indoor and outdoor applications). In addition, we discuss the main stages of processing MMS data, including preprocessing, calibration, and refinement. To assess the benefits of an MMS in practice, we examine a few of the most important applications that widely use mobile mapping technology. Finally, for the benefit of future works and directions, we highlight the main considerations and challenges in an open discussion.
The rest of this paper is organized as follows: Section 3 provides a detailed review of essential positioning and data collection sensors in MMSs; Section 4 presents the different MMS hosting platforms based on their application scenarios (i.e., vehicle-mounted, handheld, wearable, and trolley-based systems); Section 5 presents the workflow to process MMS data, from acquisition to algorithms for fusing their observations and refinement; Section 6 introduces the applications enabled by MMSs in mapping and beyond. Finally, Section 7 concludes this review and discusses future trends.

An Overview of Sensors in Mobile Mapping Systems
The positioning and data collection sensors are two classes of essential components of a typical MMS, as depicted in Figure 1. The positioning sensors are used to obtain the geographical positions and motion of the sensors, which are used to georeference the collected 3D data. Examples of these sensors include GNSS, IMU, and DMI (i.e., odometers). To achieve statistically more accurate positioning, the measurements from these sensors are usually used jointly through fusion. Additional fusion can also be performed between the positioning sensors and the navigation cameras. The sensor fusion solution for positioning is nowadays standard, as neither the GNSS receiver nor the IMU/DMI alone can provide sufficiently accurate and reliable measurements for navigating mobile platforms. GNSS measurements are usually subject to signal strength variation in different environments; for example, we can obtain a strong signal in open spaces and weak or no signals in tunnels or indoors, which can lead to a loss of information. On the other hand, the IMU and DMI are subject to a significant accumulation of errors and thus are often used as supplemental observations for navigation when GPS data are available.
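The complementary error behavior described above can be illustrated with a minimal one-dimensional Kalman filter: the dead-reckoned prediction drifts but is smooth, while the GNSS fix is noisy but absolute. This is only a sketch with assumed noise values (`q_imu`, `r_gnss`); production MMS fusion runs in three dimensions with full error-state models.

```python
import numpy as np

def fuse_position(x_est, p_est, velocity, dt, gnss_meas, q_imu=0.05, r_gnss=2.0):
    """One Kalman step: predict with dead-reckoned velocity, correct with GNSS.

    x_est, p_est : previous position estimate and its variance
    q_imu        : process noise added per step (IMU/DMI drift), assumed value
    r_gnss       : GNSS measurement variance in m^2, assumed value
    """
    # Predict: integrate velocity (dead reckoning); uncertainty grows.
    x_pred = x_est + velocity * dt
    p_pred = p_est + q_imu
    if gnss_meas is None:           # GNSS outage (e.g., tunnel): keep prediction
        return x_pred, p_pred
    # Correct: blend in the absolute GNSS fix; uncertainty shrinks.
    k = p_pred / (p_pred + r_gnss)  # Kalman gain
    x_new = x_pred + k * (gnss_meas - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

# Simulate a platform moving at 1 m/s with a noisy GNSS fix at every step.
rng = np.random.default_rng(0)
x, p = 0.0, 1.0
truth = 0.0
for _ in range(100):
    truth += 1.0
    gnss = truth + rng.normal(0.0, 1.4)
    x, p = fuse_position(x, p, velocity=1.0, dt=1.0, gnss_meas=gnss)
print(round(p, 3))  # the fused variance converges well below r_gnss
```

Note that during a simulated outage (`gnss_meas=None`) the variance grows without bound, mirroring the IMU/DMI drift discussed above.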
The data collection sensors mostly consist of LiDAR and digital cameras, providing raw 3D/2D measurements of the surrounding environments. The 3D measurements of an MMS mostly rely on LiDAR sensors, while the images are mostly used to provide the colorimetric/spectral information [20]. With the development of advanced dense image matching methods [23][24][25][26][27], these images are also collected stereoscopically to provide additional dense measurements for 3D data fusion. In the following subsections, we provide an overview of the positioning and data collection sensors, as well as the respective sensor fusion approaches.

Positioning Sensors
As mentioned above, typical positioning sensors include the GNSS receiver, IMU, and DMI. Their patterns of errors are complementary to each other; thus, in modern MMSs, they are mostly used through a sensor fusion solution to provide positioning information accurate up to the centimeter level [28]. Nevertheless, their individual measurement accuracies are still decisive to the accuracy of the resulting 3D maps. An overview of the positioning sensors is shown in Table 1. In the following subsections, we will discuss the three main positioning sensors: GNSS, IMU, and DMI.


Global Navigation Satellite System Receiver
The GNSS receiver is a primary source for estimating the absolute position, velocity, and elevation in open areas, referenced to a global coordinate system (e.g., WGS84). It passively receives signals from a minimum of four navigational satellites and performs trilateration to calculate its real-time positions. Since it depends on an external source of signal, the GNSS receiver exhibits little or no error accumulation. These satellite systems mainly refer to GPS, developed by the United States; GLONASS (Globalnaya Navigatsionnaya Sputnikovaya Sistema), developed by Russia; Galileo, built by the European Union; and BeiDou, developed by China [29]. The raw observations (pseudorange, carrier phase, Doppler shifts, etc.) from the chipset of the receiver with its solver often give a positional error at the meter level, depending on the chipset and antenna (e.g., single/dual frequencies) [30]. High-tier MMSs often use augmented GPS solutions, such as Differential GPS (DGPS) or Real-Time Kinematic GPS (RTK-GPS), to improve the positioning accuracy to the decimeter and centimeter levels (RTK can achieve an accuracy of 1 cm [31]). DGPS uses code-based measurements and can operate with single-frequency receivers without initialization time, while RTK-GPS uses carrier-phase measurements; thus, it requires dual-frequency receivers and takes about one minute to initialize (for fixing the integer ambiguities) [19]. Both DGPS and RTK-GPS rely on a network of reference stations established at surveyed points in the vicinity to apply corrections that eliminate various errors such as ionospheric delays and other unmodeled errors. The traditional DGPS method achieves submeter accuracy in the horizontal position, while with much more advanced techniques and solvers, RTK-GPS, as a type of DGPS, can achieve centimeter-level accuracy in three dimensions. However, these achievable accuracies are conditioned on open areas; when collecting 3D data in dense urban areas with tall buildings or indoor environments, the GNSS signal can be largely impacted by occlusions and the resulting measurements can be inaccurate [32]. Thus, other complementary sensors are required when operating under such conditions. In general, the positioning platform of an MMS is expected to achieve an accuracy of 5-50 mm at speeds that can reach the maximum speed of highways (120-130 km/h) when considering the integration of complementary sensors.
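The trilateration principle mentioned above can be sketched as a small least-squares problem: given satellite positions and measured ranges, iteratively refine the receiver position. This simplified example ignores the receiver clock bias and atmospheric delays that real GNSS solvers must estimate; the satellite coordinates are synthetic.

```python
import numpy as np

def trilaterate(sat_positions, ranges, x0=None, iters=10):
    """Estimate a receiver position from satellite positions and ranges.

    Simplified sketch: ranges are treated as true geometric distances
    (no clock bias or atmospheric terms), solved by Gauss-Newton.
    """
    x = np.zeros(3) if x0 is None else np.asarray(x0, float)
    for _ in range(iters):
        diffs = x - sat_positions             # (n, 3) receiver-to-satellite
        dists = np.linalg.norm(diffs, axis=1)
        residuals = dists - ranges            # range prediction errors
        jac = diffs / dists[:, None]          # unit line-of-sight vectors
        # Solve the linearized least-squares system for the position update.
        dx, *_ = np.linalg.lstsq(jac, -residuals, rcond=None)
        x = x + dx
    return x

# Four synthetic "satellites" and a known receiver position.
sats = np.array([[20e6, 0, 0], [0, 20e6, 0],
                 [0, 0, 20e6], [12e6, 12e6, 12e6]], float)
truth = np.array([1000.0, 2000.0, 500.0])
meas = np.linalg.norm(sats - truth, axis=1)
est = trilaterate(sats, meas)
print(np.allclose(est, truth, atol=1e-2))
```

With the unknown clock bias included, a fourth parameter is added to the state, which is why at least four satellites are needed in practice.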

Inertial Measurement Unit
The IMU is an egocentric sensor that records the orientation and directional acceleration of the host platform. Its positional information can be calculated through dead reckoning approaches [33,34]. Unlike GNSS, it does not require links to external signal sources, and it records relative positions with respect to a reference starting point (usually provided dynamically by GNSS in open fields). Like many other egocentric navigation methods, it suffers from error accumulation, leading to oftentimes significant drifts from its true positions. To be more specific, an IMU consists of an accelerometer and gyroscope to sense the acceleration and the angular velocity. These raw measurements are fed into an onboard computing unit that applies the dead reckoning algorithm to provide real-time positioning; thus, the IMU and the computing unit with the algorithm as a whole are also called an inertial navigation system (INS). The grade/quality of IMU sensors can be differentiated by the type of gyroscope: the majority of lightweight and consumer-grade IMUs use Microelectromechanical Systems (MEMS), which have a low cost but poor precision with large drift errors (often 10-100°/h [35]) [3,36], while higher-grade systems for precise navigation use larger but more accurate gyroscopes, such as ring laser and fiber optic gyroscopes, which can reach a drift error of less than 1°/h [35]. An IMU can work in GPS-denied environments indoors, outdoors, and in tunnels, but given its nature of dead reckoning navigation, the measurements will only be accurate for a relatively short period in reference to the starting point. Since GNSS provides reasonable accuracy in open areas and its measurements do not accumulate errors as the platform moves, it is often integrated with the IMU as additional observations, which, as a standard approach, provides more accurate positional information in complex environments mixed with both open and occluded surroundings [37].
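The dead reckoning idea can be sketched in 2D: integrate the gyroscope to track heading, rotate body-frame accelerations into the world frame, and integrate twice for position. This is a minimal illustration; real INS mechanization works in 3D and must compensate for gravity and sensor biases, whose omission here is exactly why raw integration drifts.

```python
import numpy as np

def dead_reckon(accels_body, gyro_rates, dt, heading0=0.0):
    """2D dead reckoning from body-frame acceleration and yaw rate.

    Minimal INS sketch: bias and gravity compensation are omitted, so any
    constant sensor error grows quadratically in the position output.
    """
    heading = heading0
    vel = np.zeros(2)
    pos = np.zeros(2)
    track = [pos.copy()]
    for a_body, w in zip(accels_body, gyro_rates):
        heading += w * dt                           # integrate angular velocity
        c, s = np.cos(heading), np.sin(heading)
        a_world = np.array([c * a_body[0] - s * a_body[1],
                            s * a_body[0] + c * a_body[1]])
        vel += a_world * dt                         # first integration
        pos = pos + vel * dt                        # second integration
        track.append(pos.copy())
    return np.array(track)

# Accelerate straight ahead for 10 s at 0.5 m/s^2 with no turning:
# the travelled distance should be close to 0.5 * a * t^2 = 25 m.
n, dt = 1000, 0.01
track = dead_reckon(np.tile([0.5, 0.0], (n, 1)), np.zeros(n), dt)
print(track[-1])
```

Adding a tiny constant bias to `gyro_rates` and re-running makes the trajectory curve away from the truth, reproducing the drift behavior quantified by the °/h figures above.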

Distance Measuring Instrument
The DMI generally refers to instruments that measure the traveled distance of the platform. In many cases, the DMI is alternatively referred to as the odometer or wheel sensor for MMSs based on vehicles or bikes. It computes the distance based on the number of cycles the wheel rotates. Since the DMI only measures distance, it is often used as supplementary information to GNSS/IMU as an effective means to reduce accumulated errors and constrain the drift of the IMU in GPS-denied environments such as tunnels [38]. It requires calibration before use, and from its distance measurements, velocity and acceleration can be derived.
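The wheel-cycle computation above reduces to a one-line formula, sketched below with hypothetical encoder parameters. The `scale` factor stands in for the calibration step: it is determined by driving a known baseline and absorbs effects such as tire wear and pressure.

```python
import math

def dmi_distance(pulse_count, pulses_per_rev, wheel_diameter_m, scale=1.0):
    """Distance travelled from wheel-encoder pulses.

    scale is a calibration factor obtained by driving a known baseline,
    compensating for tire wear and pressure (hence the need for calibration).
    """
    circumference = math.pi * wheel_diameter_m
    revolutions = pulse_count / pulses_per_rev
    return revolutions * circumference * scale

# A hypothetical encoder: 2048 pulses per revolution on a 0.65 m wheel.
d = dmi_distance(pulse_count=100_000, pulses_per_rev=2048, wheel_diameter_m=0.65)
print(round(d, 2))
```

Velocity follows by differencing successive distance readings over the pulse timestamps, which is how the DMI supplies odometry to the fusion filter.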

Sensors for Data Collection
Data collection sensors are another major component of the MMS for collecting 3D data. They typically refer to sensors such as LiDAR and high-resolution cameras that provide both geometry and texture information. They require constant georeferencing using the position and orientation information provided by the positioning sensors to link the 3D data to the world coordinate system. In this section, we introduce the LiDAR and imaging systems (i.e., cameras), where we describe their functions, types, benefits, challenges, and limitations, and provide an example of a system that represents the status quo.

Light Detection and Ranging (LiDAR)
LiDAR, short for Light Detection and Ranging, is an optical instrument that uses directional laser beams to measure the distances and locations of objects. It provides individual and accurate point measurements on a 3D object; many of these measurements together constitute information about the shape and surface characteristics of objects in the scene. It has many desirable features for 3D modeling: it is highly accurate, can acquire dense 3D information in a short time, is invariant to illumination, and can partially penetrate sparse objects such as a canopy. LiDAR itself is still an instrument measuring relative locations; thus, it requires a highly accurate and well-calibrated navigation system to retrieve global 3D points, the installation and cost of which, in addition to the already expensive LiDAR sensor, make it still a high-cost collection means.
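The ranging principle behind time-of-flight LiDAR can be sketched in a few lines: the distance is half the round-trip time of the pulse times the speed of light, and each range plus the beam's pointing angles yields a 3D point in the sensor frame. The timing value below is illustrative only.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_range(round_trip_time_s):
    """Range from a time-of-flight pulse: the beam travels out and back,
    so the distance is half the round-trip time times the speed of light."""
    return C * round_trip_time_s / 2.0

def beam_to_point(r, azimuth_rad, elevation_rad):
    """Convert a range plus the beam's pointing angles into a 3D point in
    the sensor frame (spherical to Cartesian)."""
    x = r * math.cos(elevation_rad) * math.cos(azimuth_rad)
    y = r * math.cos(elevation_rad) * math.sin(azimuth_rad)
    z = r * math.sin(elevation_rad)
    return (x, y, z)

# A 400 ns round trip corresponds to roughly 60 m of range.
r = tof_range(400e-9)
print(round(r, 2))
```

The sub-nanosecond timing precision this implies (1 ns of timing error is about 15 cm of range) is one reason survey-grade sensors are expensive.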
The concept of using light beams for distance measurements has been in use since 1930 [39]. Since the invention of the laser in 1960, LiDAR technology has experienced rapid development [40] and has become very popular in tasks like accurate mapping and autonomous driving applications [41]. Nowadays, there are many commercially available LiDAR sensors for either surveying or automotive applications. Typically, survey-grade LiDAR achieves a range accuracy at the millimeter level (usually 10-80 mm); examples include the RIEGL VQ-250 and VQ-450 [42], and the Trimble MX9 and MX50 [43]. Relatively lower-grade LiDAR sensors (with lower cost) achieve a range accuracy at the centimeter level (usually 1-8 cm), generally satisfying applications for obstacle avoidance and object detection, and are often used in autonomous driving platforms given their good tradeoff between cost and performance. Examples of such LiDAR sensors include those from Velodyne [44], Ouster [45], Luminar Technologies [46], and Innoviz Technologies [47]. In fact, the level of accuracy of different grades of LiDAR sensors and the cost are the main decisive factors when choosing a LiDAR sensor, since there is a large gap in the cost of the two grades, with survey-grade LiDARs often costing hundreds of thousands of US dollars (at the time of this publication), while the relatively lower-grade ones are about ten times cheaper.
LiDAR sensors can be categorized based on their collecting principle into three main categories: rotating, solid-state, and flash. Rotating LiDAR uses a rotating mirror spinning 360 degrees to re-direct laser beams, and usually has multiple beams where each beam illuminates one point at a time. The rotating LiDAR is the most commonly used in MMSs since, based on its nature of "rotating", it provides a large FoV, a high signal-to-noise ratio, and dense point clouds [48]. Solid-state LiDAR usually uses MEMS mirrors embedded in a chip [49], where the mirror can be controlled to follow a specific trajectory, or optical phased arrays to steer beams [50]; thus, it is "solid" and does not possess any moving parts. Flash LiDAR [51] usually illuminates the entire FoV with a wide beam in a single pulse. Analogous to a camera with a flash, a flash LiDAR uses a 2D array of photodiodes to capture the laser returns, which are finally processed to form 3D point clouds [48,52]. Typically, a flash LiDAR has a limited range (less than 100 m) as well as a limited FoV, constrained by the sensor size. Although LiDAR is primarily used to generate point clouds, it can also be used for localization purposes through different techniques such as scan matching [53][54][55]. The extractable information can be further enhanced by deep neural networks for semantic segmentation and localization [56,57]. However, although LiDAR sensors can provide relatively accurate range measurements, their performance deteriorates significantly in hazardous weather conditions such as heavy rain, snow, and fog. Table 2 shows an example of several existing LiDAR sensors along with their technical specifications in terms of range, accuracy, number of beams, FoV, resolution, points per second, and refresh rate. Generally, the choice of sensors depends on the application and the characteristics of the moving platform (e.g., the speed, payload, etc.). As mentioned above, most MMSs rely on rotating LiDAR sensors, but they often come at a high cost compared to the other categories. Therefore, using solid-state LiDAR in an MMS is now a promising direction, since its cost is lower than that of rotating LiDAR. When the vehicle speed is high, there is less time to acquire data and hence more beams are needed to ensure that the object of interest is measured by sufficient points [44,58]. For instance, a 32-beam LiDAR could be sufficient for a vehicle moving at a speed of 50-60 km/h, while a LiDAR with 128 beams is recommended for higher speeds up to 100-110 km/h so that the acquired data will have adequate resolution. The operating range of a LiDAR can also be important and should be considered on an application basis (e.g., long-range LiDAR may be unnecessary for indoor applications). In general, the cost usually increases roughly by a factor of 1.5-2 when the number of beams is doubled, and is also positively correlated with the operating range.

Cameras

Imaging systems like cameras are one of the most popular types of sensors for data collection due to their low cost and ability to provide high-resolution texture information. Cameras are usually mounted on the top or front of the moving platform to capture information about the surrounding environment. They are intended to acquire many images at a high frame rate, i.e., 30-60 frames per second. Cameras serve a few main purposes. First, recovering the geometry of the scene, which is usually obtained through stereoscopic/binocular cameras that process a pair of overlapping images to recover depth information using stereo-dense image matching approaches [24][25][26]. Second, obtaining the textures of objects in the scene; a camera records photons from the object at different spectral frequencies that provide rich and critical information about the object's natural appearance, and thus can be used to build panoramic and geotagged images and photorealistic models. Third, the texture information encodes critical semantics of the object, and thus can be used to detect static objects such as traffic lights, stop signs, marks, and road lanes, and to detect moving objects such as pedestrians and cars, which is becoming gradually applicable as modern deep learning methods are developed to tackle such problems [4,59,60].
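The geometric purpose above, recovering depth from a rectified stereo pair, reduces to the relation Z = f·B/d, where f is the focal length in pixels, B the baseline between the two camera centers, and d the disparity. A minimal sketch with hypothetical rig parameters:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth of a scene point from a rectified stereo pair: Z = f * B / d.

    focal_px     : focal length in pixels
    baseline_m   : distance between the two camera centres
    disparity_px : horizontal shift of the same point between the images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 1400 px focal length, 30 cm baseline.
print(stereo_depth(1400.0, 0.30, 21.0))  # ≈ 20 m
print(stereo_depth(1400.0, 0.30, 4.0))   # ≈ 105 m: small disparities mean
                                         # coarse depth resolution at range
```

The second call illustrates why the accuracy of stereo-derived 3D points degrades quadratically with distance: at long range, a sub-pixel matching error translates into meters of depth error.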
There are many types of camera sensors and configurations used in MMSs, depending on their intended use as described before. Examples include monocular cameras, binocular cameras, RGB-D cameras, multi-camera systems (e.g., Ladybug), fisheye cameras, etc. A summary of different camera types is shown in Table 3. Monocular cameras (low-cost cameras) provide a series of single RGB images without any additional depth information, and are often used to collect high-resolution geotagged images or panoramas [11]. However, they cannot be used to recover 3D scale or generate highly accurate 3D points. Binocular cameras, on the other hand, consist of two cameras capturing synchronized stereo images to recover depth and scale, with an additional computational cost, through stereo-dense image matching techniques [24][25][26]. The performance and accuracy of the 3D information depend on the selection of the stereo-dense image matching method. In many cases, mapping solutions may rely on RGB-D cameras (e.g., Kinect [61], Intel RealSense D435 [62]), which can provide both RGB images and depth images (through structured light) simultaneously; they are mostly used in indoor settings due to their limited range. Integrating LiDAR with RGB-D images can yield highly accurate 3D information; however, this may require pre-compensation for the uncertainties in the RGB-D images from random noise or occlusions. Due to compact/cluttered environments, an MMS often includes a wide-FoV or even 360° panoramic camera, which is usually achieved via a multi-camera system that uses a group of synchronized cameras sharing the same optical center (e.g., FLIR Ladybug5+ [63]). Panoramic images additionally facilitate the integration with LiDAR scanners, providing 100% overlap between the image and LiDAR point clouds (e.g., those from rotating LiDAR). As a result, panoramic images are suitable for street mapping applications. As a lower-cost alternative, a fisheye camera aims to provide an image with extended FoV from a single camera; it has a spherical lens that can provide more than a 180° FoV. Although cheaper, this may come at the cost of image distortions in scale, geometry, shape, and illumination, requiring additional and slightly more complex calibrations.
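The lens distortions mentioned above are commonly modeled with polynomial radial terms. As a sketch, the two-term Brown radial model maps an ideal normalized point to its distorted position as x_d = x·(1 + k1·r² + k2·r⁴); calibration estimates k1 and k2 (fisheye lenses need more terms or a dedicated model) so the mapping can be inverted during rectification. The coefficient values below are illustrative only.

```python
import numpy as np

def apply_radial_distortion(xy_norm, k1, k2):
    """Map ideal (undistorted) normalized image points to their distorted
    positions under a two-term Brown radial model."""
    xy = np.asarray(xy_norm, float)
    r2 = np.sum(xy**2, axis=-1, keepdims=True)   # squared radius from centre
    factor = 1.0 + k1 * r2 + k2 * r2**2
    return xy * factor

# Points far from the image centre are displaced more; a negative k1
# produces the familiar barrel distortion of wide-angle lenses.
pts = np.array([[0.0, 0.0], [0.2, 0.0], [0.8, 0.0]])
print(apply_radial_distortion(pts, k1=-0.3, k2=0.05))
```

Undistortion inverts this mapping numerically per pixel, which is the extra processing cost wide-FoV and fisheye imagery incurs.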
As the modern MMS benefits from various cameras in providing additional information, it comes with a few added complexities. First, a camera captures images using the light reflected off objects, which makes it sensitive to the illumination of the environment, such as the high dynamic range of the scene (between sky and ground), and hazy weather conditions [64,65]. Second, cameras and multi-camera systems require calibration to reduce different types of distortions [66]. Third, moving platforms require high-framerate cameras to balance speed, image quality, and resolution [67].


Mobile Mapping Systems and Platforms
There are a few factors that determine the type of sensor and platform to be used for MMS tasks. These factors include the available sensors, project budget, technical solutions, processing strategies, and scene contents (i.e., indoor or outdoor). These help to determine the type of available sensors (e.g., with/without GPS) and the accessible platforms (e.g., vehicle-mounted or backpack). For example, in indoor environments there is no access to GPS signals or vehicles; thus, alternative solutions must be adopted.
In general, we broadly categorize the MMS platforms into traditional vehicles and non-traditional lightweight/portable mapping devices. Traditional vehicle-based MMSs primarily operate on main roads, collecting city- or block-level 3D data. The non-traditional portable devices, such as backpack/wearable, handheld, or trolley-based systems, depending on their application and task, can be used both outdoors and indoors in GPS-denied environments. For outdoor applications, these means of mapping are primarily used to complement vehicle-based systems by mapping narrow streets and areas that cannot be accessed by vehicles [68,69]. For indoor or GPS-denied environments, the sensor suites may be significantly different from those used outdoors; for example, they may primarily rely on INS or visual odometry for positioning [70,71]. To be more specific, in this section, we introduce four typical MMS platforms that offer mapping solutions, namely, the traditional vehicle platform and three portable platforms, including handheld, wearable, and trolley-based systems. Further details of these systems are provided in Table 4 and the following subsections. The accuracies in Table 4 are as reported by the manufacturers; whether an accuracy is relative or absolute is unknown if not stated. The "-" symbol indicates that the specification was not mentioned in the product datasheet.

Vehicle-Mounted Systems
This setting refers to mounting the sensor suites on top of a vehicle to capture dense point clouds. These systems enable high-rate data acquisition at the vehicle travel speed (20 to 70 miles/h). The sensor platform can be mounted on cars, trains, or boats depending on the mapping application. Generally, vehicle-mounted systems achieve the highest accuracy compared to other mobile mapping platforms, primarily because of their size and payload, which allow them to host high-grade sensors [72]. A vehicle-mounted MMS is usually equipped with a survey-grade LiDAR that provides dense and accurate measurements, as well as a deeply integrated 360° FoV camera providing textural information. Regarding the positioning sensors, a vehicle-mounted system usually fuses measurements from GNSS receivers with the IMU and DMI. An example of these systems is introduced in [73]. Vehicle-mounted systems are used for various applications such as urban 3D modeling, road asset management, and condition assessment [78,79]. Moreover, these systems can be used for automated change detection in the mapped regions [80,81], creating up-to-date HD maps as an asset for autonomous driving [73], and railway monitoring applications [82].
Although vehicle-mounted systems play a major role in mobile mapping, their relatively large size hinders their access to many sites such as narrow alleys and indoor environments. Additionally, some studies [83] have demonstrated that the speed of the vehicle may affect the quality of the 3D data, creating Doppler effects in successive scans [84]. Therefore, the speed and route have to be planned ahead of the mapping mission.

Handheld and Wearable Systems
The handheld and wearable systems follow lightweight and compact designs using small-sized sensors. An operator can hold or wear the platform and walk through the area of interest. Wearable systems are often designed as backpack systems to allow the operator to collect data while walking. Both handheld and wearable systems are distinguished by their portability, which enables mapping GPS-denied environments including enclosed spaces, complex terrains, or narrow spaces that vehicles cannot access [20,85]. Due to the nature of these environments, handheld and wearable systems may not rely on GNSS receivers for positioning but instead depend on an IMU or use a LiDAR and camera for both data collection and localization (using simultaneous localization and mapping (SLAM) approaches) [86]. A sample of handheld and wearable systems is shown in Figure 3. As mentioned above, handheld and wearable systems are effective in mapping enclosed spaces; for instance, these devices can be used to map caves where GNSS signals and lighting are not available [91]. In addition, they are used to map cultural heritage sites that may be complex and require the data to be efficiently collected from different viewing points [92][93][94]. Furthermore, these systems are efficient in mapping areas that are not machine-accessible, as in forest surveying [68,95,96], safety and security maps, and building information modeling (BIM) [97]. However, working in GPS-denied regions requires compensating for the lost signal, which demands setting GCPs in these regions or utilizing GPS information before entering such environments [98], whereas inside tunnels, the navigation completely depends on the IMU/DMI or the scanning sensors (LiDAR or cameras).

Trolley-Based Systems
This type of system is similar in nature to the backpack systems while being slightly more sizeable to carry a heavier payload. It is suitable for indoor and outdoor mapping where the ground is flat [99]. A sample of trolley-based systems is shown in Figure 4. Examples of these systems include the NavVis M6 [89], Leica ProScan [13], Trimble Indoor [43], and FARO Focus Swift [100]. Trolley-based systems are also suitable for a variety of applications such as tunnel inspection, measuring asphalt roughness, creating floorplans, and BIM [22]. In addition, they are used for creating 3D indoor geospatial views of all kinds of infrastructure, such as plant and factory facilities, residential and commercial buildings, airports, and train stations.

MMS Workflow and Processing Pipeline
Several processing steps turn the raw sensory data of an MMS into the final 3D products; these generally include data acquisition, sensor calibration and fusion, georeferencing, and data processing in preparation for scene understanding (shown in Figure 5). In the following subsections, we provide an overview of these typical processing steps.

Data Acquisition
Data acquisition starts with analyzing the planned route to determine the platforms and sensors to be configured and deployed; for example, the operators should be aware of the GNSS-accessible regions to plan which primary sensors to use. For MMS positioning, the GNSS, IMU, and DMI continuously measure the position and motion of the platform. In most outdoor applications, the main navigation and positioning data are provided by the GNSS satellites to the receiver, and the IMU and DMI supplement the measurements where GNSS signals are insufficient or absent. In some specific cases where GNSS is completely inaccessible, such as cave mapping, GCPs are used to reference the data to the geodetic coordinate system. 3D data are mainly collected by an integrated LiDAR and camera system, where the LiDAR produces accurate 3D point clouds colorized by the images from the associated camera.

Sensor Calibration and Fusion
Sensor calibration and fusion are often performed throughout the data collection cycle. Their task is to calibrate the relative positions between multiple sensors (between cameras, between camera and LiDAR, or among LiDAR, camera, and navigation sensors), as well as to fuse their outputs as a post-processing step to achieve more accurate positional measurements. These serve the purposes of more accurate localization, more accurate geometric reconstruction, and data alignment for fusion [101,102]. In the following subsections, we introduce a few typical calibration procedures in MMS.

Positioning Sensor Calibration and Fusion
The integration of GNSS, IMU, and DMI is split into several steps. The first is lab/factory pre-calibration, which estimates the relative offsets among these sensors and their relative positions to the data collection sensors (e.g., LiDAR and cameras) [103]. The second is sensor information fusion, which outputs the estimated positions through optimal statistical/stochastic estimators [29,104]. A typically used algorithm is the Kalman filter (KF) [105][106][107][108], which uses continuous measurements over time, with their uncertainties and a stochastic model for each sensor, to estimate the unknown variables in a recursive scheme. The KF is the simplest dynamic estimator; it assumes linear models and Gaussian random noise of observations, and is thus often adapted into the Extended KF (EKF) for linearized non-linear models [109,110]. However, convergence is not guaranteed for the EKF, especially when the random noise does not follow a Gaussian distribution. Thus, the particle filter [111,112] is usually adopted as a good alternative, as it can approximate the noise through sampling to deal with potential non-Gaussian noise distributions.
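As an illustration of the KF recursion described above, the following is a minimal linear predict/update step applied to a 1-D constant-velocity toy example; this is a sketch of the generic KF equations, not the actual GNSS/IMU/DMI fusion filter of any specific MMS:

```python
import numpy as np

def kf_step(x, P, z, F, Q, H, R):
    """One predict/update cycle of a linear Kalman filter.
    x: state estimate, P: state covariance, z: new measurement,
    F/Q: motion model and its noise, H/R: measurement model and its noise."""
    # Predict: propagate state and covariance through the motion model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: weight the measurement by the Kalman gain
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

With a state of [position, velocity], F = [[1, dt], [0, 1]] and H = [[1, 0]], repeated calls on noisy position fixes recursively refine both position and velocity, which is the essence of the GNSS/IMU fusion scheme described above.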

Camera Calibration
Camera calibration refers to the process of rigorously determining the camera intrinsic parameters (i.e., focal length, principal point), various lens distortions (e.g., radial distortion), and other intrinsic distortions such as affinity or decentering errors [113,114]. The camera calibration parameters often follow the standard Brown model [115], or the extended parameters [116] that additionally model in-plane errors due to film/chip displacements. The traditional and most rigorous camera calibration approach uses a 3D control field consisting of highly accurate 3D physical point arrays. Converging images of these point arrays are captured at various angles and positions. These well-distributed 3D points, together with their corresponding 2D observations on the images, go through a rigorous BA with additional parameters (i.e., the calibration parameters). However, 3D control fields are demanding and costly; thus, this approach is mostly used for calibrating survey-grade or aerial cameras at the factory level. A popular and less demanding approach is called cloud-based camera calibration [116]. Instead of using the very expensive 3D control fields, this approach uses coded targets, which can be arbitrarily placed (but well-distributed with certain depth variations) in a scene as a "target cloud". Converging images of these targets can yield very accurate 2D multi-ray measurements, which are fed into a free-network BA for calibrating the camera parameters. Because of its simplicity, this is often used as a good alternative for calibrating cameras in close-range applications, including MMS applications. The least rigorous but very often used calibration method uses a chessboard as the calibration target [117]: regularly distributed 2D measurements are extracted from images of the chessboard and fed to a self-calibrating BA. However, due to its limited scene coverage, the planar nature of the board (and hence the lack of depth variation), and the limited flexibility to capture well-converged images filled with features, the BA may not fully decorrelate the camera parameters from the exterior orientation parameters, leading to potential calibration errors. Since it is very commonly used in the computer vision community and well supported by available open-source tools, it remains one of the most popular approaches for obtaining quick calibrations, and can be used to calibrate cameras that do not demand high surveying accuracy, such as navigation cameras.
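The Brown model mentioned above maps undistorted normalized image coordinates to their distorted positions; the calibration procedures described here estimate its coefficients via BA, while the sketch below only applies them. It uses the common two-radial-term (k1, k2) plus decentering (p1, p2) form; higher-order and affinity terms are omitted for brevity:

```python
import numpy as np

def brown_distort(x, y, k1, k2, p1, p2):
    """Apply Brown radial (k1, k2) and decentering (p1, p2) distortion
    to a point (x, y) in normalized image coordinates."""
    r2 = x * x + y * y                     # squared radial distance
    radial = 1 + k1 * r2 + k2 * r2 * r2    # radial distortion factor
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d
```

For example, a positive k1 pushes points outward with increasing radius (pincushion-like), which is why depth variation and well-distributed targets are needed to estimate these coefficients reliably.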
When calibrating a multi-camera system (e.g., a stereo rig), the camera calibration is extended to additionally estimate the accurate relative orientation among the cameras. The calibration still goes through a BA, but requires knowing at least the scale of the targets (either the "target clouds" or the chessboard) in order to metrically estimate the baselines between the cameras.

LiDAR and Camera Calibration
The calibration between LiDAR and camera covers a few aspects: first, images must be time-synchronized with the LiDAR scans; second, the relative orientation between the LiDAR and camera rays must be computed; third, they must share the same viewpoint to avoid parallaxes. Time-synchronization is one of the crucial calibration steps to correct time offsets between sensors. It refers to matching the recorded and measured data from different sensors with separate clocks to create well-aligned data in terms of time and position. Time-synchronization errors (or time offsets) are due to 1) clock offset, which refers to the time difference between the internal clocks of the sensors, or 2) clock drift, which refers to the sensors' clocks operating at different frequencies or rates [118]. A time offset greater than a few milliseconds between LiDAR and camera can cause significant positioning errors when recording an object. Additionally, the impact is more significant and noticeable if the platform operates at high speed. To address this problem, sensors must have a common time reference, which is often based on GNSS time because of its high precision and its nanosecond-level timestamps [119]. The time offset between GNSS and IMU is either neglected because of its insignificance, or may be slightly corrected using the KF as a fusion method. The LiDAR and camera timestamps are corrected in real time by the on-board computer system, which updates the data based on GNSS time; examples of such computer systems/servers include the GPS service daemon (GPSD), IEEE, and Chrony.
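Under a simple linear clock model, the clock offset and drift described above can be estimated from matched timestamp pairs; the hedged NumPy sketch below illustrates the idea and is not a substitute for hardware triggering or GPSD/Chrony-based synchronization:

```python
import numpy as np

def estimate_clock_model(t_sensor, t_gnss):
    """Least-squares fit of t_gnss ~ a * t_sensor + b, where (a - 1)
    is the clock drift rate and b the clock offset, from matched pairs."""
    A = np.column_stack([t_sensor, np.ones_like(t_sensor)])
    (a, b), *_ = np.linalg.lstsq(A, t_gnss, rcond=None)
    return a, b

def to_gnss_time(t_sensor, a, b):
    """Map raw sensor timestamps onto the common GNSS time base."""
    return a * t_sensor + b
```

Once a and b are estimated, all raw LiDAR or camera timestamps can be rebased onto GNSS time before the scans and images are associated, which is the alignment described in this subsection.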
Relative orientation refers to estimating the translation and orientation parameters between sensors. This type of calibration often needs to be carried out periodically, due to deterioration of the mechanics within and among the sensors after operating in different environments. A quick calibration can be performed using a single image, in which at least four corresponding points are selected between the image and the 3D scan, using either well-identified natural corner points or highly reflective coded targets.
Ideally, the LiDAR and camera data must be well aligned; however, because of the wide baseline between the two sensors, they often view objects from different angles, leading to large parallax between them. The large parallax causes crucial distortions such as flipped or flying points, occlusions, etc. To resolve this issue, the relative orientation parameters are used to project the LiDAR data into the camera coordinate system [120].
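The projection step above can be sketched with a pinhole camera model; this is an illustrative NumPy sketch, assuming a known extrinsic calibration (R, t: LiDAR frame to camera frame) and placeholder intrinsics fx, fy, cx, cy:

```python
import numpy as np

def project_lidar_to_image(points, R, t, fx, fy, cx, cy):
    """Project LiDAR points (N, 3) into pixel coordinates using the
    extrinsic calibration (R, t) and a pinhole intrinsic model.
    Points behind the camera (z <= 0) are dropped."""
    cam = points @ R.T + t            # transform into the camera frame
    cam = cam[cam[:, 2] > 0]          # keep only points in front of camera
    u = fx * cam[:, 0] / cam[:, 2] + cx
    v = fy * cam[:, 1] / cam[:, 2] + cy
    return np.column_stack([u, v])
```

The resulting (u, v) pixels are what allow each LiDAR point to pick up color from the image, and conversely what reveal residual misalignment (flying points, occlusions) when the relative orientation is inaccurate.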

Georeferencing LiDAR Scans and Camera Images using Navigation Data
Both the LiDAR scans and the images collect data in a local coordinate system; georeferencing them refers to determining their global/geodetic coordinates, mostly based on the fused GNSS/IMU/DMI positioning data. This process follows the calibration among the data collection sensors (as described in Section 5.2). Georeferencing includes the estimation of the orientation (boresight) and position (lever-arm) parameters/offsets with respect to the GNSS and IMU [19]. The boresight and lever-arm parameters define the geometrical relationship between the positioning and data collection sensors. There are two approaches to perform georeferencing: 1) the direct approach, which uses just the GNSS/IMU data, or 2) the indirect approach, which uses the GNSS/IMU data in addition to GCPs and BA for refinement [68]. The direct approaches are less demanding since they do not require GCPs, and they can achieve decimeter- to centimeter-level accuracy. Indirect approaches can provide more accurate (centimeter-level) and more precise results, since typical surveying methods such as GCPs and BA are adopted; however, they are expensive, and their accuracy may vary based on the GCP setup (i.e., the position and number of GCPs).
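The direct-georeferencing chain described above (sensor frame to IMU/body frame via the boresight rotation and lever-arm offset, then body frame to the mapping frame via the GNSS/IMU pose) can be sketched for a single point; the rotations and offsets below are illustrative placeholders, not values from any real system:

```python
import numpy as np

def georeference_point(p_sensor, R_boresight, lever_arm, R_nav, t_nav):
    """Direct georeferencing of one sensor-frame point.
    R_boresight, lever_arm: sensor -> body calibration offsets;
    R_nav, t_nav: body -> mapping-frame pose from the fused GNSS/IMU."""
    p_body = R_boresight @ p_sensor + lever_arm   # into the IMU/body frame
    return R_nav @ p_body + t_nav                 # into the mapping frame
```

Errors in the boresight (rotation) or lever-arm (translation) terms propagate directly into the georeferenced coordinates, which is why the indirect approach refines them with GCPs and BA when centimeter-level accuracy is required.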

Data Processing in Preparation for Scene Understanding
Mobile mapping is highly relevant to autonomous vehicles, where scene understanding is crucial not only to automate the mapping process but also to provide critical scene information in real time to support platform mobilization. Scene understanding is the process of identifying the semantics and geometry of objects [121,122]. With the enhanced processing capability of mobile computing units, advanced machine learning models, and ever-increasing datasets, there is a growing trend to perform on-board data processing and scene understanding using the collected measurements from the mobile system [123,124]. These include real-time detection, tracking, and semantic segmentation of both dynamic (e.g., pedestrians) and static (e.g., road markings or signs) objects in a scene [122,125]. This has driven the need to develop representative benchmark datasets, better-generalized training and domain adaptation approaches [126], and lighter machine learning models or network structures that support real-time inference [127]; examples of these efforts include MobileNet [128], BlitzNet [127], MGNet [129], and MVLidarNet [130]. Challenges remain in addressing these needs, as mobile platforms may collect data under extremely different illuminations (e.g., daylight and night) and weather conditions (e.g., rainy, snowy, and sunny), as well as with drastically different sensor suites producing raw data of varying quality.

Applications
Mobile mapping provides valuable assets for different applications, driven not only by the broad availability of easy-to-use and portable MMS platforms but also by their readiness under different operating environments. This is particularly useful because most of these applications rely on regularly acquired data for detection and monitoring purposes, such as railway-based powerline detection/monitoring [131,132]. In this section, we review some of the main applications of mobile mapping technology, including road asset management, condition assessment, creating BIM, disaster response, and heritage conservation. Documented examples of these applications in the literature are shown in Table 5 and are detailed in the following subsections.
Table 5. An overview of the selected mobile mapping applications.

Selected Applications and Highlights

Road asset management and condition assessment
Extraction of road assets [79]; road condition assessment [133]; detecting pavement distress using deep learning [134]; evaluating pavement surface distress for maintenance planning [135].
• Vehicle-mounted systems regularly operating on the road.
• More efficient than manual inspection.
• Leverages deep learning to facilitate the inspection process.

BIM
Low-cost MMS for BIM of archeological reconstruction [136]; analysis of BIM for transportation infrastructure [137].
• Data are collected with portable systems.
• Useful for maintenance and renovation planning.
• Rich database for better information management.

Emergency and disaster response
Network-based GIS for disaster response [138]; analyzing post-disaster damage [139].
• Timely and accurate disaster response.
• Facilitates the decision-making process.
• Effective training and simulations.
• Accurate and automatic measurements.

Digital heritage conservation
Mapping a complex heritage site using handheld MMS [92]; mapping a museum in a complex building [94]; numerical simulations for structural analysis of historical constructions [144]; digital heritage documentation [145]; mapping archaeological sites [146]; development of a digital heritage inventory system [147].
• Utilizes the flexibility of portable platforms.
• Digital recording of cultural sites.
Road Asset Management and Condition Assessment: MMSs operating on roads can regularly collect accurate 3D data of the road and its surroundings, which facilitates road asset management, mapping, and monitoring (e.g., road signs, traffic signals, pavement dimensions) [79]. Creating road asset inventories is of great importance given the large volume of road assets. Furthermore, since the condition of roads deteriorates over time, automatic means of regular transportation maintenance, such as pavement crack and distress detection, are critically needed [133,135]. Therefore, a key benefit of generating an updated and accurately georeferenced road asset inventory is to allow automatic and efficient change detection, in place of traditionally laborious manual inspections [134].
Typically, the road condition monitoring process consists of four steps [148]: 1) data collection using an MMS; 2) defect detection, which can be performed automatically using deep learning-based approaches; 3) defect assessment; and 4) road condition index calculation to classify road segments based on the type and severity of the defects. MMS data can thus further help increase road safety, for example, by detecting road potholes [78,149], evaluating the placement of speed signs before horizontal curves on roadways [150], or assessing the passing sight distance on highways [151].
Building Information Modeling: BIM is one of the most well-established technologies in the architecture, engineering, and construction industry. It provides an integrated digital database about an asset (e.g., a building, tunnel, bridge, or 3D model of a city) during the project's life cycle. Typical BIM stages include: 1) rigorous data collection, where the collected data include information such as the architectural design (i.e., materials and dimensions), structural design (e.g., beams, columns, etc.), electrical and mechanical designs, sewage system, etc.; 2) preparation of the 2D plans; and 3) manual upload and update of the plans using specialized software to convert them into a digital format. This can facilitate the design, maintenance, and renovation processes of engineered buildings and infrastructure. However, this can be a challenging task because of the large amount of data that needs to be collected and the lack of automated processes, which can increase time and cost.
Nowadays, MMSs have been widely adopted for BIM projects due to their high accuracy, time efficiency, and lower cost in collecting 3D data. The collected point clouds and images are used to produce the 3D reconstructed model of an asset, which is then processed with semantic segmentation or classification to extract detailed information about all elements of the asset; the final product is then transferred to the BIM software to extract and simulate important information related to the project's life cycle. In general, MMSs can provide sufficiently accurate results for the derived BIM products [20]. These derived products can be 2D floor plans, or 3D mesh or polyhedral models representing the structure of the architecture or the life cycle of the construction process [145]. A popular example of MMS in BIM is 3D city modeling, where it can be used to collect information on roadside buildings [143][144][145] and their structural elements, such as window layouts and doors [98,146]. Additionally, MMSs can be used to maintain plans and record indoor 3D assets and building layouts, which can be generated using handheld, backpack, or trolley systems.

Emergency and Disaster Response:
The geospatial data provided by MMSs are critical for improving emergency and disaster responses and post-disaster recovery projects. MMSs provide cost- and time-efficient solutions to collect and produce 3D reconstructed models with detailed semantic and geometric information for navigating through an emergency or a disaster. MMSs have been reported to provide building-level information (e.g., floors, walls, and doors) at centimeter-level resolution/accuracy. In many cases, a building's plans are not kept up to date after construction, which may hinder a rescue mission in case of a fire emergency [139,152]. The MMS, on the other hand, can provide an efficient alternative for producing an accurate and up-to-date 3D model of a facility or building at minimal cost, which facilitates emergency responses. Another example is collecting 3D data of roadside assets and feeding them into GIS systems, which can serve as pre-event analysis tools to identify the potential impacts of natural disasters through simulation (e.g., flood or earthquake simulation), in aid of making preventative plans ahead of time. Future directions of MMSs involve more efficient data collection, which can be as simple as a person holding a cellphone and imaging the surroundings; although this comes with lower accuracy [3], it can supply critical geo-referenced information in a disaster or emergency to support situation awareness and remediation.
Vegetation Mapping and Detection: Mobile mapping has shown great success in collecting high-resolution and detailed vegetation plots to create up-to-date digital tree inventories for urban green planning and management, which has greatly accelerated the traditionally laborious visual inspections [142]. In addition, vegetation monitoring is important to limit the decline in biodiversity and to identify hazardous trees [153,154]. These requirements have driven the advancement of keeping an up-to-date digital database for vegetation data. The collected 3D data can be used to model 3D trees for visualization purposes in urban 3D models. Typically, the workflow consists of three main steps [155]: 1) tree detection by segmenting the generated point clouds, 2) simplifying the detected point cloud structure, and 3) deriving geometry parameters such as canopy height and crown width and diameter [155,156]. The collected point cloud is thus used to detect trees and low vegetation at the roadside [141,143], taking into account the occlusions on building facades, to supply information for city modeling. Moreover, the data collected by MMSs can also be used to calculate urban biomass and combat urban heat island effects, helping to analyze the influence of the ecosystem on climate change [157].
Digital Heritage Conservation: There is a growing realization of the importance of digitally documenting archaeological sites to preserve cultural heritage [146,158,159], as many of these sites are in danger of deterioration and collapse, which may be accelerated by extreme weather and natural disasters, such as the collapse of many cultural sites in Nepal or Iran due to earthquakes [160]. Therefore, there is a critical need to proactively document these sites while they are still intact [92,161,162]. Moreover, a well-documented heritage site may enable other forms of tourism, such as virtual tours, to offload site visits and reduce the human factors contributing to the deterioration of these sites. As a means of collecting highly accurate 3D data, the MMS has been used as one of the primary sources for creating 3D models of complex and large heritage and archaeological sites. Data collection for heritage sites often requires multiple scans of both the interior and exterior from different angles to generate occlusion-free and realistic 3D models. For example, a vehicle-mounted system can be used to drive around a site to collect exterior information, and wearable/handheld devices can be used to scan its interiors [162].
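The third step of the vegetation-mapping workflow above, deriving geometry parameters from a segmented tree point cloud, can be sketched as follows; this is an illustrative NumPy sketch with deliberately simplified definitions of canopy height and crown diameter:

```python
import numpy as np

def tree_parameters(points):
    """Derive simple tree geometry from a segmented tree point cloud
    (N, 3 array, z up): canopy height above the lowest point, and crown
    diameter taken as twice the maximal horizontal radius from the
    horizontal centroid."""
    z = points[:, 2]
    height = z.max() - z.min()                       # canopy height
    xy = points[:, :2]
    centroid = xy.mean(axis=0)
    crown_diameter = 2 * np.linalg.norm(xy - centroid, axis=1).max()
    return height, crown_diameter
```

Production inventory pipelines typically fit more elaborate crown and stem models, but even these two scalars per tree are enough to populate a basic digital tree inventory.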

Summary
In sum, we have discussed selected applications of mobile mapping that demonstrate the importance and necessity of utilizing MMSs in different scenarios. The adoption of mobile mapping technology in various applications has proven not only to increase productivity but also to reduce the cost of operations. For instance, using digital assets for construction has boosted productivity in the global construction sector by 14-15% [163]. In addition, digitizing historic structures by creating BIMs from mobile mapping data enables preventive conservation of heritage buildings, which saves 40-70% of the maintenance cost [164]. Aside from the productivity and cost aspects, mobile mapping data pave the way for new road monitoring studies and methods that can increase road safety and dramatically reduce the probability of accidents [165].

Conclusions
In this paper, we have provided a thorough review of state-of-the-art mobile mapping systems, their sensors, and relevant applications. We first reviewed the sensors and sensor suites typically used in modern MMSs and discussed in detail their types, benefits, and limitations (Section 3). Then, we reviewed the mobile platforms, including vehicle-mounted, handheld, and wearable systems, described their collection logistics in detail, gave examples of modern systems of different types (Section 4), and specifically highlighted their supported use-case scenarios. We further reviewed the critical processing steps that turn the raw data into the final mapping products (Section 5), including sensor calibration and fusion, and georeferencing. Finally, we summarized the most common applications (Section 6) that use the capabilities of modern MMSs.

Future Trends
Despite the many variations of MMSs and their sensor suites, the main goal of an MMS is to provide a means of collecting 3D data at close range, with maximal flexibility and minimal cost. Given complex terrain environments, a single MMS, or even a few, can hardly be sufficient for all levels of mobile mapping applications. Thus, although off-the-shelf solutions are partially available, developing or adapting MMSs for designated applications is still an ongoing effort. So far, the MMS is still regarded as an expensive collection means, as the equipment, sensors, and manpower involved in handling the logistics and processing remain considerable. Therefore, as far as we can conclude, the ongoing and future trends continue to be: 1) reduced sensor cost for high-resolution sensors, primarily LiDAR systems with accuracy/resolution equivalent to currently used ones, but at a much lower cost; 2) crowdsourced and collaborative MMSs using smartphone data (for example, recent iPhone models are already equipped with a low-cost LiDAR sensor); 3) incorporation of new sensors, such as ultra-wide-band tracking systems, as well as WiFi-based localization, for use in MMSs; 4) enhanced (more robust) use of cameras as visual sensors for navigation; 5) higher flexibility in sensor integration and customization, as well as more mature software ecosystems (e.g., self-calibration algorithms among multiple sensors), to allow users to easily "plug-and-play" different sensors to match the demands of mapping different environments; 6) advanced post-processing algorithms for pose estimation, data registration in close-range scenarios, dynamic object removal for data cleaning, and refinement for collections in cluttered environments; and 7) the integration of novel deep learning solutions at all levels of processing, from navigation and device calibration to 3D scene reconstruction and interpretation.
Given the complexity of MMSs and their application scenarios, a one-stop-shop solution arguably does not exist; however, customizing such systems can be made easier if the above-mentioned challenges are consistently addressed. Our future work will encompass component-level surveys that provide the community with comprehensive views, accelerating the convergence of solutions addressing the above-mentioned efforts.

Figure 5. The standard processing pipeline for MMS data.

Table 2. Specifications of different LiDAR sensors.

Table 3. Camera sensor overview.