Article

Analysis of Using Machine Learning Application Possibilities for the Detection and Classification of Topographic Objects

by Katarzyna Kryzia 1, Aleksandra Radziejowska 1, Justyna Adamczyk 1,* and Dominik Kryzia 2
1 Department of Geomechanics, Civil Engineering and Geotechnics, Faculty of Civil Engineering and Resource Management, AGH University of Krakow, 30-059 Krakow, Poland
2 Mineral and Energy Economy Research Institute, Polish Academy of Sciences, 31-261 Krakow, Poland
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2026, 15(2), 59; https://doi.org/10.3390/ijgi15020059
Submission received: 2 November 2025 / Revised: 17 January 2026 / Accepted: 19 January 2026 / Published: 27 January 2026

Abstract

The growing availability of spatial data from remote sensing, laser scanning (LiDAR), and photogrammetric techniques stimulates the dynamic development of methods for the automatic detection and classification of topographic objects. In recent years, both classical machine learning (ML) algorithms and deep learning (DL) methods have found wide application in the analysis of large and complex data sets. Despite significant achievements, the literature on the subject remains scattered, and a comprehensive review that systematically compares algorithm classes with respect to data modality, performance, and application context is still needed. The aim of this article is to provide a critical analysis of the current state of research on the use of ML and DL algorithms in the detection and classification of topographic objects. The theoretical foundations of selected methods, their applications to various data sources, and the accuracy and computational requirements reported in the literature are presented. Attention is paid to comparing classical ML algorithms (including SVM, RF, and KNN) with modern deep architectures (CNN, U-Net, ResNet) across different data types, such as satellite imagery, aerial orthophotos, and LiDAR point clouds, indicating their effectiveness in the context of cartographic and elevation data. The article also discusses the main challenges related to data availability, model interpretability, and computational costs, and points to promising directions for further research. The summary of the results shows that DL methods are frequently reported to achieve segmentation and classification accuracy several to more than ten percentage points higher than classical ML approaches, depending on data type and object complexity, particularly in the analysis of raster data and LiDAR point clouds.
The conclusions emphasize the practical significance of these methods for spatial planning, infrastructure monitoring, and environmental management, as well as their potential in the automation of topographic analysis.

1. Introduction

Artificial intelligence (AI) is becoming one of the key tools supporting the development of science, industry, and everyday life. Its wide range of applications includes areas such as medicine, surveying, education, production process automation, transportation, the financial sector, and even smart cities and home technologies. The dynamic development of this field is primarily due to access to ever-larger databases and growing computational power [1]. One of the most promising areas of AI is machine learning, which allows the creation of models and algorithms capable of processing huge amounts of data and making decisions based on the patterns contained therein [2].
Advances in deep learning, neural network architecture, and optimization techniques have significantly expanded the applicability of AI to complex tasks of data analysis, image processing, and pattern recognition. This is particularly important in areas requiring precise analysis, such as medicine, where AI supports diagnostics and personalized therapies, as well as in geodesy, remote sensing, and cartography, where it enables fast, automated, and precise processing of geospatial data.
Machine learning (ML), including deep learning (DL), is highly effective in remote sensing, land cover classification, satellite image analysis, and landscape change detection. Such AI applications support urban planning, environmental monitoring, land management, and environmental disaster risk reduction [3]. For example, the analysis of satellite images using AI algorithms allows for the rapid identification of changes in land cover, which is a key element in natural resource management and environmental protection [4].
However, the effective use of these methods requires access to large, high-quality training data sets. Therefore, geospatial data management, processing, integration with reference databases, and format standardization are becoming key challenges for modern geodesy and cartography.
Despite the rapid development of ML and DL techniques, the literature still lacks a structured comparative review that clearly identifies the conditions under which specific classes of algorithms are most suitable for the detection and classification of topographic objects using different types of geospatial data. Existing review studies often focus on algorithmic performance in isolation, without explicitly relating method classes to data modality and application context. Therefore, the aim of this article is to provide a structured review of current applications of ML and DL methods in the automatic detection and classification of topographic objects. The analysis focuses on the relationship between algorithmic approaches and data types, including satellite and aerial imagery as well as LiDAR point clouds. Unlike previous surveys, this review adopts a decision-oriented comparative perspective, emphasizing practical applicability rather than isolated performance metrics. The paper is intended as a review study.
This study addresses the following research questions: which classes of ML and DL algorithms are most commonly applied to topographic object detection and classification; how the reported effectiveness of these methods depends on the type of input data; in which application scenarios classical ML approaches remain competitive with DL-based solutions; and what the main limitations and future research directions of DL methods in geospatial analysis are.
The structure of the paper is as follows: Section 2 reviews the existing literature on ML and DL methods applied to geospatial data analysis, with a focus on topographic object classification and detection. Section 3 introduces a comparative taxonomy of algorithm classes and geospatial data sources reported in the literature. Section 4 presents a comparative discussion of reported performance, limitations, and applicability of the analyzed approaches. Section 5 outlines current challenges and future research directions, including model interpretability, computational efficiency, and data integration. Finally, Section 6 summarizes the key findings and conclusions of the review.
The analysis is conducted by comparing the reviewed methods with respect to the type of input data, the level of feature extraction automation, reported accuracy metrics (OA, F1-score, IoU), computational complexity, and their practical applicability in real-world geospatial tasks. These criteria constitute the analytical framework applied consistently throughout the comparative discussion in Section 3 and Section 4.
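The accuracy criteria named above follow standard definitions; as a minimal illustration, the Python sketch below computes overall accuracy (OA), F1-score, and IoU from binary confusion counts (the counts themselves are invented for illustration):

```python
def metrics(tp, fp, fn, tn):
    """Compute OA, F1-score, and IoU for a binary (one-vs-rest) confusion matrix."""
    oa = (tp + tn) / (tp + fp + fn + tn)        # fraction of correctly labeled samples
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)                   # intersection over union, positive class
    return oa, f1, iou

# Illustrative counts for one object class (e.g., "building" vs. everything else).
oa, f1, iou = metrics(tp=80, fp=10, fn=10, tn=100)
print(oa, f1, iou)
```

Note that OA rewards correct background pixels as well, while IoU ignores true negatives, which is why IoU is usually the stricter of the two for sparse object classes.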
To qualitatively synthesize and contextualize existing research, this article adopts a narrative literature review, without bibliometric or meta-analytic components. Literature was identified by searching major databases (Scopus, Web of Science, ScienceDirect, SpringerLink, MDPI, Wiley Online Library) and secondary sources (Google Scholar, institutional repositories, public websites, standard documents) using keywords related to machine learning, deep learning, remote sensing, geospatial data analysis, LiDAR, and object detection. The review covers publications from 2001 to 2025.
Publications were selected based on their thematic relevance, methodological clarity, and contribution to understanding ML/DL applications in geospatial analysis; papers unrelated to the topic or with limited interpretive value were excluded. Results were qualitatively synthesized by grouping them by algorithm type, data type, and application type to identify key trends, common use cases, and recurring challenges. Performance comparisons were made only where methodological assumptions and indicators were sufficiently comparable.

2. Literature Review

Machine learning is an area of artificial intelligence that focuses on developing techniques and algorithms that enable computer systems to perform specific tasks (e.g., making decisions, predicting outcomes, or recognizing spatial objects) based on data obtained during the learning process. An important feature of this approach is that there is no need to program the system explicitly for each situation, as the model learns the relationships in the data on its own. In the context of this review, machine learning is treated primarily as a group of data-driven methods applied to the automated interpretation of geospatial data (Figure 1):
  • Supervised learning is based on data sets containing both input values (features) and corresponding output values (labels). The model then learns to predict labels based on the features provided.
  • Unsupervised learning uses unlabeled data and aims to detect hidden structures or patterns in the data set.
  • Semi-supervised learning combines both approaches, using both labeled and unlabeled data.
  • Reinforcement learning is based on a mechanism of rewards and punishments, thanks to which the model gradually learns optimal decisions in each environment [5].
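As a minimal illustration of the supervised paradigm, the sketch below labels a new sample with a 1-nearest-neighbour rule over a hypothetical two-dimensional feature space (height above ground, NDVI); the training samples, feature values, and class names are invented for illustration only:

```python
import math

# Toy labeled samples: (height above ground [m], NDVI) -> class label.
train = [((0.1, 0.8), "vegetation"), ((8.0, 0.1), "building"),
         ((0.0, 0.05), "road"), ((12.0, 0.7), "tree")]

def predict(x):
    """Supervised 1-nearest-neighbour prediction: the label of the closest training sample."""
    return min(train, key=lambda sample: math.dist(sample[0], x))[1]

print(predict((7.5, 0.15)))  # closest to the 'building' sample
```

An unsupervised method, by contrast, would receive only the feature tuples, without the labels, and attempt to group them into clusters on its own.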
Figure 1. Classification of machine learning algorithms based on the type (designation) of data provided for training—author-developed conceptual diagram based on [5].
In this review, a clear distinction is made between the main operations involved in object-based analysis of spatial data, following the terminology commonly adopted in remote sensing and geospatial data science [5,6,7]. Detection is used to denote the localization of an object in an image or point cloud, i.e., identifying where an object is present (e.g., through bounding boxes or candidate regions). Segmentation refers to the delineation of the spatial extent of an object by assigning pixels or points to a given class, as in semantic or instance segmentation approaches [6,7]. Classification denotes the assignment of a semantic label to an identified object, such as a building, road, or vegetation [5]. The term extraction is used as an umbrella concept describing the complete workflow that integrates detection, segmentation, and classification to derive objects from spatial data [5,6]. Although many modern deep learning models perform these operations jointly within a single architecture, this conceptual distinction is maintained in this review to ensure terminological consistency across different data types and methodological approaches.
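The terminological distinction above can be made concrete on a toy label raster: a single extraction routine returns, for each object, its semantic label (classification), its bounding box (detection), and its exact cell set (segmentation). The grid and class letters below are purely illustrative:

```python
# Toy label grid: "0" = background, letters = object classes (illustrative).
grid = [
    ["0", "B", "B", "0"],
    ["0", "B", "B", "0"],
    ["R", "R", "R", "R"],
]

def extract(grid):
    """Extraction = detection + segmentation + classification on a label raster."""
    objects = {}
    for r, row in enumerate(grid):
        for c, label in enumerate(row):
            if label != "0":
                objects.setdefault(label, []).append((r, c))
    result = []
    for label, cells in objects.items():
        rows = [r for r, _ in cells]
        cols = [c for _, c in cells]
        result.append({
            "class": label,                                       # classification
            "bbox": (min(rows), min(cols), max(rows), max(cols)), # detection
            "mask": cells,                                        # segmentation
        })
    return result

for obj in extract(grid):
    print(obj["class"], obj["bbox"])
```

End-to-end deep models fuse these three outputs inside one network, but conceptually they still answer the same three questions: where, what extent, and which class.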
From the perspective of surveying, remote sensing, and cartography, supervised and semi-supervised learning paradigms are of the greatest practical importance, as they dominate applications related to topographic object detection and classification. Unsupervised and reinforcement learning approaches are therefore not discussed in detail in subsequent sections, as their direct application to large-scale topographic mapping remains limited.
In the Polish spatial reference system, the concept of a topographic object class refers to BDOT10k (Topographic Object Database) and BDOT500, where each class has a unique identifier and semantic description (e.g., BUA—building, DRL—local road, WTR—surface watercourse). This enables direct linkage between automatically detected objects and official topographic databases, ensuring semantic consistency of results. Although these databases are specific to the Polish context, the reviewed methods are generally applicable to international classification systems used in other countries.
BDOT10k is a highly detailed vector database of topographic features that serves as a reference database not only in national spatial information systems but also in the European spatial data interoperability framework. The INSPIRE directive plays a key role in this framework, aiming to harmonize and share spatial data across Member States through common conceptual models, data specifications, and network services [8,9]. BDOT10k’s thematic modules, such as transport networks or hydrography, can be mapped to the corresponding INSPIRE themes and made available in accordance with their technical specifications, enabling cross-national data comparison and analysis [10,11]. From a procedural perspective, the BDOT10k structure is based on a structured feature catalog that complies with the guidelines for defining geographic features specified in the ISO 19110 standard, facilitating semantic links with international data models [12]. This allows BDOT10k to be seen as a national implementation of topographic reference data, embedded in a European and global interoperability framework, rather than as a stand-alone, isolated classification system [13].
In practice, the distinction between detection and classification is often blurred—modern deep learning models (e.g., U-Net, Mask R-CNN, YOLO) perform both processes simultaneously. This conceptual distinction provides a background for the reviewed methods, which range from classical two-stage workflows to integrated end-to-end deep learning approaches (Figure 2).
Among ML algorithms, classical models—such as linear regression (LR), logistic regression, decision trees (DT), random forests (RF), and support vector machines (SVM)—form the basis of many geospatial classification workflows (Figure 2). In topographic object analysis, these algorithms are mainly applied at the classification stage, often following prior segmentation within OBIA (Object Based Image Analysis) frameworks.
In such approaches, detection and classification are treated as separate steps, and model performance depends strongly on manually engineered spectral, geometric, and contextual features [5]. This reliance on predefined features constitutes a key limitation of classical ML methods in complex spatial scenes.
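A schematic sketch of this two-stage logic, with hypothetical hand-crafted segment features and simple threshold rules standing in for a trained classifier (all feature names, values, and thresholds are illustrative, not taken from any reviewed study):

```python
# Hypothetical hand-crafted features for pre-segmented objects (OBIA style):
# mean NIR reflectance, mean height above ground (m), and segment area (m^2).
segments = [
    {"nir": 0.65, "height": 0.2, "area": 900.0},   # low, green  -> vegetation
    {"nir": 0.15, "height": 7.5, "area": 250.0},   # tall, dull  -> building
    {"nir": 0.10, "height": 0.0, "area": 4000.0},  # flat, dull  -> paved surface
]

def classify(seg):
    """Rule-based segment classification; thresholds mimic what a DT/RF might learn."""
    if seg["height"] > 2.5:
        return "vegetation" if seg["nir"] > 0.4 else "building"
    return "vegetation" if seg["nir"] > 0.4 else "paved"

print([classify(s) for s in segments])
```

The fragility noted in the text is visible even here: every rule depends on a feature someone had to define in advance, and a scene violating those assumptions (e.g., a green roof) is misclassified.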
Figure 2. Classical machine learning algorithms used in the classification of topographic objects—author-developed conceptual diagram based on [5].
Studies such as Pluto-Kossakowska and Kamiński (2022) [6] demonstrated that classical supervised classifiers can achieve high accuracy for selected topographic object classes when applied to high-resolution data. Overall accuracy ranged from 76% to 81% for satellite imagery and from 75% to 78% for aerial orthophotomaps, with F1 scores exceeding 0.9 for some classes. These results confirm that classical ML methods remain effective under controlled conditions and well-defined object classes.
Similar conclusions were reported for greenhouse detection using WorldView-2 imagery [14], where non-parametric classifiers (SVM, RF) outperformed parametric methods. Wu et al. (2016) [7] further demonstrated that ensemble-based approaches (RF, SGB) provide stable predictions for biomass estimation, particularly when integrating heterogeneous data sources.
Ensemble and boosting-based ML methods are valued for their robustness and stability, but they remain limited by their reliance on manually defined input features [15].
Classic algorithms are particularly effective when objects are pre-segmented (e.g., from LiDAR point clouds), achieving very high accuracy for specific classes [16]. However, their effectiveness decreases in complex scenes involving small objects, irregular geometries, or overlapping classes, due to limited ability to capture spatial context.
Despite reported accuracies of 86–90% OA for RF in combined LiDAR and imagery classification [17], classical ML models do not perform automatic feature learning, which restricts their scalability and generalization. As a result, their role in recent studies increasingly shifts toward baseline or reference methods for evaluating deep learning approaches.
The methodology and application of artificial neural networks have been widely described in publications [18,19,20], among others. Early neural architectures (MLP, RBF, Hopfield, SOM) were applied to specific topographic tasks, typically under constraints of limited data availability and shallow model depth (Figure 3) [21,22,23,24,25,26,27,28].
Figure 3. Neural networks used in the classification of topographic objects—author-developed conceptual diagram based on [6,7].
Deep neural networks (DNNs) represent a major shift in topographic object analysis by enabling automatic feature extraction and end-to-end learning from raw satellite, aerial, and LiDAR data. CNN-based architectures such as U-Net, Mask R-CNN, and ResNet (Figure 4) integrate detection, segmentation, and classification within a single framework, capturing multi-scale spatial context [29,30,31,32].
Numerous studies report performance improvements of several to over ten percentage points compared to classical ML approaches, depending on data type and object complexity [17,32,33,34]. These reported gains reflect general trends observed in the literature rather than guaranteed performance improvements for all datasets and applications.
In 3D LiDAR analysis, specialized architectures such as PointNet/PointNet++, voxel-based CNNs, and SparseCNN models enable direct learning from point clouds, achieving high segmentation accuracy for vegetation, buildings, and infrastructure elements [16,35,36].
Figure 4. Deep learning algorithms used in the classification of topographic objects—author-developed conceptual diagram based on [16,36].
Overall, the reviewed literature indicates a gradual transition from classical ML methods toward deep learning approaches, driven by their superior ability to exploit spatial context and integrate detection with classification. Nevertheless, classical ML techniques remain relevant in scenarios characterized by limited training data, strict interpretability requirements, or constrained computational resources. To ensure high-quality results, the data preparation process requires:
  • Appropriate selection of input data size and format.
  • Intelligent and effective collection and annotation of samples, especially in hard-to-reach areas.
  • Careful selection and extraction of features characteristic of the objects under analysis.
Attention to these aspects translates into higher accuracy and efficiency of machine learning models in the analysis of topographic objects.
Table 1 provides a synthetic summary of the main advantages and limitations of classical ML, neural network, and deep learning approaches, forming a reference framework for the comparative discussion presented in subsequent sections.
Table 1. Main advantages and limitations of ML methods [16].
Classical Machine Learning Algorithms
Advantages:
  • Easy implementation and deployment
  • Provides insight into the importance of analyzed features (RF)
  • Short computation time on smaller data sets
  • Effective classification even on a small training sample (SVM)
Disadvantages:
  • Limited scalability and adaptability
  • Time-consuming and inaccurate with large data sets
Neural Networks
Advantages:
  • Weakly or partially supervised learning methods
  • Well-developed model compression techniques
  • Hybrid solutions: combining neural networks with graph algorithms or rules (post-processing)
  • Data augmentation techniques, transfer learning, and pre-trained networks
Disadvantages:
  • Requirement for large, representative training data sets with correct labels
  • Hardware requirements
  • Dependence on training data
  • Prone to overfitting on small training samples
Deep Neural Networks
Advantages:
  • Possibility of retraining existing models
  • Ability to capture details and dependencies that humans might not notice when designing features manually
  • Automation: the system learns on its own from data, which shortens the path to creating a solution
Disadvantages:
  • Preparing samples in a topographic context can be costly (manual mapping of thousands of objects by experts)
  • Sample preparation is prone to errors
  • Limited interpretability: deep models act as black boxes, so it is difficult to explain why a given area was classified in a particular way
  • With automation, the system learns on its own, which can extend training time

3. Comparative Analysis of Methods for Topographic Object Detection and Classification

3.1. Applications of Supervised Learning Algorithms in Satellite and Aerial Images

Unlike previous surveys, this review explicitly relates algorithm classes to data modality and application context, providing a decision-oriented comparison framework. Before the widespread use of deep learning methods in satellite image analysis, classic supervised learning algorithms such as SVM and Random Forest dominated. Such approaches proved effective in distinguishing basic classes (water, vegetation, buildings, etc.), especially when image segmentation into homogeneous objects was applied (GEOBIA, Geographic Object-Based Image Analysis), followed by high-accuracy classification of segments using RF [16]. However, the detection of specific objects (e.g., individual buildings, roads, or vehicles) using only classical geometric and textural features proved ineffective: these objects tend to be small, diverse, and highly context-dependent, which makes it difficult to define universal features. This limitation is mainly related to the reliance on manually engineered features and the lack of explicit modeling of spatial context.
The application of machine learning methods in the analysis of remote sensing data, such as Sentinel-2 satellite images, was examined by [37]. The aim of that research was to support the interpretation of geographic space and the updating of topographic object databases, and it confirmed that these techniques can significantly speed up the process of interpreting and updating geographic data. The work described in [38] presents a new method for detecting road routes in aerial and satellite imagery. The method is based on supervised machine learning techniques (mainly the decision tree induction algorithm) and proved highly effective in detecting transportation routes, which can be useful in updating maps and geographic databases. These studies illustrate that classical ML methods are particularly suitable for semi-automated mapping workflows with well-defined object classes and moderate scene complexity, rather than for fully automated object-level detection.
The introduction of deep CNNs has transformed the analysis of satellite and aerial imagery. U-Net and related segmentation networks are commonly used to segment buildings and road networks in aerial images, achieving high precision and completeness of segmentation (often measured by the IoU metric) [39]. In turn, object detectors (such as the R-CNN or YOLO family) are well suited to smaller objects: for example, automatic detection of vehicles in drone photos or ships in radar images has become possible in near real time with high accuracy. Their advantage lies in end-to-end learning and automatic extraction of hierarchical features across multiple spatial scales. Importantly, deep models can take multispectral data into account: networks trained on multiple channels (RGB, infrared) can detect objects invisible in visible light better than classical methods. In practical applications, agencies and companies use Mask R-CNN networks to generate up-to-date building databases from satellite images; in one humanitarian project covering Khartoum, Sudan, approximately 1.2 million apartments and buildings were identified with Mask R-CNN, and the first results were obtained just 10 days after the procedure was launched [32].

3.2. Applications of Supervised Learning Algorithms in Lidar Data (Point Clouds)

In 3D elevation data, traditional approaches were based on calculating a set of features for each point or segment of the cloud and then classifying these points using algorithms such as SVM or RF. These features could include, among others, relative height above the terrain, local curvature, or reflection intensity, on the basis of which it was possible to distinguish, for example, ground points from buildings or vegetation. A method for the automatic detection and classification of roads based on LIDAR data was described by [40], using both intensity and range data. To extract roads from LIDAR points, a hierarchical classification method was used, dividing the points into road and non-road points. The PCD (Phase-Coded-Disk) method was used for road vectorization (to extract the width and axis of the road). A technique for automatic building detection using LIDAR and multispectral imaging was proposed by [41]. Studies have shown that this technique is very effective in detecting buildings in urban areas, but the accuracy of geometric delineation was only 13 pixels, caused by incomplete delineation of buildings obscured by nearby trees, differences in the length of the roof ridge, and different roof colors.
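The "relative height above the terrain" feature mentioned above can be approximated very simply; the sketch below takes the lowest point in each grid cell as a crude ground estimate and subtracts it from every point in that cell (coordinates and cell size are illustrative):

```python
# Sketch: approximate height above ground for each LiDAR point by subtracting
# the lowest z within its grid cell (a crude local ground estimate).
points = [(1.2, 0.8, 101.0), (1.4, 0.6, 108.5), (5.1, 4.9, 99.8), (5.3, 5.2, 100.1)]
cell = 2.0  # grid cell size in metres (illustrative)

ground = {}
for x, y, z in points:
    key = (int(x // cell), int(y // cell))
    ground[key] = min(ground.get(key, z), z)   # lowest z per cell

heights = [z - ground[(int(x // cell), int(y // cell))] for x, y, z in points]
print(heights)
```

Production workflows use far more robust ground filters (e.g., progressive TIN densification), but the resulting per-point height feature plays exactly this role in the SVM/RF classifiers discussed here.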
Such methods have been successfully used for the basic classification of point clouds into several object classes [16], but they required considerable effort in feature selection and often produced results that required further processing (e.g., smoothing). This indicates the limited scalability of feature-based ML approaches in complex 3D scenes. In [42], a system for classifying objects and terrain based on LIDAR data using machine learning and signal processing was presented. The authors developed a method that identifies various objects and terrain features based on LIDAR point clouds, which is applicable in many fields, such as urban planning and natural resource management. Combining LIDAR digital surface models (DSM) and digital elevation models (DEM) with orthophotomaps to improve building detection from remote sensing data was proposed by [43]. A U-Net model trained on LIDAR-derived DSMs showed better building detection accuracy than one trained on image-derived DSMs; however, where buildings were not present in the LIDAR data, the image-derived DSM was necessary for building detection.
Currently, deep learning is increasingly being used for 3D data. Networks that directly process point clouds (e.g., PointNet and its successors) learn the geometry of objects from data without the need for manual feature description. These models explicitly encode spatial relationships between points, enabling more robust generalization. In addition, convolutional 3D networks (based on data voxelization) and graph neural networks (GNN) tailored to unstructured spatial data are used; these techniques can significantly improve the effectiveness of 3D topographic scene classification. For example, deep models with point convolutions can distinguish trees from buildings even when their heights are similar, learning subtle differences in the structure of tree crowns and roofs. The application of CNNs to LIDAR data is also sometimes implemented through data fusion: for example, height projections (DSM/nDSM) are combined with RGB images, and only then is this set fed into the network. It has been shown that a deep network using both image and height information outperforms traditional classification based on imagery or the height model alone [44]. Specialized deep network architectures have emerged for 3D LIDAR data. PointNet/PointNet++ achieve very good 3D segmentation results without the need for rasterization and have proven effective in dividing point clouds into classes, e.g., separating tree crowns or extracting buildings. Alternative approaches include 3D convolutions (voxel CNN) or SparseCNN operating on a sparse cloud. In one comparative analysis, SparseCNN even gave slightly higher accuracy than PointNet++ or KPConv. These findings indicate a consistent trend toward higher performance of deep learning models in complex 3D scenes, although reported gains depend on data quality, density, and annotation effort [16,35].
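The voxelization step underlying voxel CNNs can be sketched in a few lines: the unordered point cloud is reduced to a set of occupied cells, which a 3D network would then consume as a binary occupancy grid (coordinates and voxel size are illustrative):

```python
# Sketch of voxelization: map each point to the index of the voxel containing
# it; the set of occupied voxels is the binary grid a 3D CNN would consume.
points = [(0.2, 0.3, 0.1), (0.9, 0.1, 0.4), (1.6, 1.7, 0.2)]
voxel = 1.0  # voxel edge length in metres (illustrative)

occupied = {(int(x // voxel), int(y // voxel), int(z // voxel)) for x, y, z in points}
print(sorted(occupied))
```

Note that the first two points fall into the same voxel, illustrating the information loss that motivates point-based models such as PointNet, which skip this rasterization entirely.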

4. Discussion

4.1. Accuracy of Detecting Sample Objects Using Neural Networks

4.1.1. Buildings

Buildings are objects with relatively regular shapes and heights that stand out from the terrain, which facilitates their detection. Traditional OBIA (Object Based Image Analysis) classification approaches (segmentation + classifier) achieve very good results—roof segmentation and classification, e.g., with a random forest, provide high precision while reducing pixel noise. However, the emergence of CNN (e.g., U-Net) and data fusion allows for improved recognition effectiveness. The key difference lies in the transition from hand-crafted object features to automatically learned hierarchical spatial representations. Currently, deep learning networks can detect buildings with performance levels often reported in the range of approximately 90–95%, with high completeness (few objects omitted), depending on data resolution, sensor type, and training data quality. Several studies report high object recognition efficiency by combining orthophoto data with LIDAR—colors distinguish the roof from the surroundings, and height distinguishes the building from the surface. As a result, approaches based on 3D/2D networks (e.g., LIDAR point classification with PointNet and simultaneous image segmentation) identify buildings with very high reliability in most tested scenarios reported in the literature.
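The image-plus-height fusion described above typically starts from a normalized DSM (nDSM = DSM - DTM), whose thresholding yields building candidates; a toy sketch with 2 × 2 rasters and an illustrative height threshold:

```python
# nDSM = DSM - DTM isolates above-ground structures; thresholding it gives
# building candidates. Small 2-D lists stand in for rasters here.
dsm = [[100.0, 108.0], [100.5, 100.2]]  # surface heights incl. buildings/trees
dtm = [[100.0, 100.1], [100.4, 100.2]]  # bare-earth terrain heights
min_height = 2.5                         # illustrative building threshold (m)

ndsm = [[s - t for s, t in zip(row_s, row_t)] for row_s, row_t in zip(dsm, dtm)]
candidates = [[h > min_height for h in row] for row in ndsm]
print(candidates)
```

In a real fusion pipeline this binary candidate mask would be combined with the orthophoto channels before or inside the network, since height alone cannot separate buildings from tall vegetation.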
Some difficulties arise with buildings that are shaded or covered with vegetation (e.g., trees partially obscuring roofs); models may then omit part of the building footprint. Another problem with correct recognition and classification arises for buildings with unusual roofs, e.g., covered with moss or with a color similar to the surrounding ground. Nevertheless, among topographic objects, buildings are among the classes detected most effectively by automatic methods, especially when elevation data are used [35,45].

4.1.2. Roads

Detecting roads is much more difficult for artificial intelligence algorithms. This is due to their structure—they are long, sometimes narrow, and almost always partially hidden (e.g., under tree crowns). In open areas (non-forested, e.g., cities, fields), orthophotomaps alone, especially high-resolution ones (aerial or drone), allow for good road segmentation. Extracting roads from orthophotomaps using edge filters or convolutional neural networks (CNN) allows for the recognition of asphalt or dirt roads by color and contrast, achieving typically reported accuracy values in the range of 85–90%.
However, in forested areas, LIDAR is necessary to determine object heights. LIDAR pulses partially penetrate the canopy and undergrowth, allowing even a forest road to be reconstructed as a series of ground points. Hybrid methods using height (e.g., ground filtering + segment classification) significantly increase effectiveness: as mentioned, from approximately 87% up to 96%, depending on point density and filtering strategy [46]. A two-step approach also yields good results: first, LIDAR classification (separating the ground from other elements), and then detecting continuous strips of ground of sufficient width as roads. Deep networks can support this process; for example, a U-Net that fuses RGB and elevation images improved road detection in one experiment compared to using only one data source [47]. Nevertheless, separating roads from other flat surfaces, such as squares, parking lots, or rivers, which may resemble roads in a photo, remains challenging. In such cases, contextual and topological constraints (e.g., network connectivity or curvature continuity) play a critical role.
Deep learning methods learn some of these relationships, but still sometimes require improvement with a graph algorithm (e.g., filling gaps in the detected path using the shortest path algorithm). In general, roads in open terrain are already well extracted automatically, while roads located in forested areas remain difficult objects—here, specialized approaches using dense LIDAR and possibly hybrid methods are most effective [44,47].
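The two-step idea described above (ground filtering, then keeping only continuous ground strips of sufficient width) can be sketched with simple binary morphology; the mask, cell sizes, and the implied width threshold are invented for illustration and stand in for a real ground-filtering output.

```python
import numpy as np
from scipy import ndimage

# Toy "ground" mask, as if produced by LiDAR ground filtering: True = ground.
ground = np.zeros((50, 50), dtype=bool)
ground[:, 23:27] = True        # a 4-cell-wide continuous strip (candidate road)
ground[10:12, 40] = True       # an isolated ground patch (too small/narrow)

# Keep only ground regions at least ~3 cells wide: erosion followed by
# dilation (morphological opening) removes narrower or smaller components.
opened = ndimage.binary_opening(ground, structure=np.ones((3, 3)))

# Label the remaining connected components as candidate road segments.
labels, n = ndimage.label(opened)
print(n)  # only the wide continuous strip survives
```

A production pipeline would add topological checks (connectivity, curvature) on the surviving segments, matching the contextual constraints mentioned above.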

4.1.3. Unusual Objects That Are Difficult to Identify

Attempts have also been made in the literature to automatically identify objects such as power lines, poles, cars, and fences. These are difficult classes due to their scale (very small or narrow objects) and often require dedicated solutions. For example, detecting linear wires with LIDAR first requires filtering out most points (ground, trees, buildings) to isolate the sparse points suspended high above ground level; only then can an SVM classifier effectively mark them as overhead lines [10,29]. Deep 3D networks can also detect such elements, but they require a very dense cloud and a special architecture (e.g., CapsNet or point cluster-based detection) [48]. Vehicles on orthophotos, on the other hand, are small and can be recognized by CNNs (as a separate class) at high resolutions, but in land cover classification they are sometimes merged into the road or building classes, depending on the surroundings. These objects are generally challenging: their detection often does not reach the accuracy of the main classes (often below 80%), and commercial solutions more often use object detection methods (e.g., detecting vehicles as separate objects in images, outside of land cover classification). In the context of topography, however, these small objects are less often considered, with the focus being on land cover classes and large permanent objects.
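The wire-detection pipeline sketched above (filter out most points, then classify the sparse elevated remainder) can be illustrated as follows; the synthetic point cloud, the 5 m neighbourhood radius, and the two hand-crafted features are invented assumptions and do not reproduce the cited studies.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.svm import SVC

# Toy point cloud (x, y, z), assumed already stripped of ground and buildings,
# leaving scattered vegetation returns and one suspended wire (all invented).
rng = np.random.default_rng(1)
veg = np.c_[rng.uniform(0, 100, 200), rng.uniform(0, 100, 200),
            rng.uniform(5, 15, 200)]
wire = np.c_[np.linspace(0, 100, 60), np.full(60, 50.0),
             18.0 + rng.normal(0.0, 0.05, 60)]
pts = np.vstack([veg, wire])
labels = np.r_[np.zeros(200), np.ones(60)]

# Simple per-point features: absolute height, and the spread of heights in a
# 5 m horizontal neighbourhood (wires are locally smooth and elevated).
tree = cKDTree(pts[:, :2])
feats = np.array([[p[2], pts[tree.query_ball_point(p[:2], r=5.0), 2].std()]
                  for p in pts])

clf = SVC(kernel="rbf").fit(feats, labels)
acc = clf.score(feats, labels)
```

On this toy data the two features separate the classes almost perfectly; in real clouds, additional geometric descriptors (linearity, return intensity) are typically needed.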

4.2. Speed of Operation and Computational Efficiency of Selected Algorithms

Classic algorithms are characterized by relatively low computational requirements at the inference stage. The decision of an RF classifier (a pass through each tree) or an SVM (computing kernel products between the feature vector and the support vectors) is fast for a single sample, which facilitates their use in environments with limited hardware resources. Training such models is also less time-consuming than training deep networks and can often be completed on a standard CPU in reasonable time for moderate data sets. For example, LIDAR point cloud classification using the RF method with precomputed geometric features can be performed quickly on a typical workstation. However, if the traditional approach requires analyzing every pixel of the image (e.g., scanning a moving window with an SVM classifier), the processing time increases dramatically: unlike convolutions, such window-by-window scanning is not easily parallelized, making this solution slower than modern CNNs. This limitation becomes critical for large-area or high-resolution datasets. The advantage of traditional models, on the other hand, is their memory efficiency and the possibility of code optimization (e.g., parallel execution of decision trees). Overall, in small-scale tasks or when operating on low-power devices, well-tuned classical algorithms can run faster (or fast enough) compared with complex neural networks [49].
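The contrast between fast per-sample inference and slow sample-by-sample scanning can be demonstrated directly; the data dimensions and model size below are arbitrary, and the loop stands in for a naive moving-window pass over an image.

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data: 5 hand-crafted features per pixel (invented).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
clf = RandomForestClassifier(n_estimators=30, random_state=0).fit(X_train, y_train)

pixels = rng.normal(size=(20000, 5))  # features for a small image tile

# One predict() call per pixel, mimicking a naive moving-window loop ...
t0 = time.perf_counter()
for p in pixels[:500]:
    clf.predict(p.reshape(1, -1))
per_pixel = (time.perf_counter() - t0) / 500

# ... versus a single batched call over the whole tile.
t0 = time.perf_counter()
clf.predict(pixels)
batched = (time.perf_counter() - t0) / len(pixels)
print(per_pixel > batched)  # per-call overhead dominates the naive loop
```

The same effect, magnified by GPU parallelism, is why convolutional networks process entire scenes faster than pixel-wise classical pipelines despite their heavier models.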
The computational requirements of deep neural networks are higher, especially during the training phase. Training CNNs (e.g., U-Net) on a large set of satellite images requires the use of GPUs and can take from several hours to many days, depending on the size of the data and the architecture [50]. Inference (detecting objects in new data) is already much faster thanks to the parallelism of GPU computations—modern one-stage detectors (e.g., YOLO) can operate in real time even on high-resolution images [51,52]. They achieve this at the cost of slightly lower accuracy compared to two-stage methods, which is an acceptable compromise in many applications.
On the other hand, two-stage models such as Faster R-CNN/Mask R-CNN perform more complex analysis (region proposal, classification, segmentation)—they are therefore slower and usually not suitable for applications requiring immediate response (they are more often used for offline processing). U-Net segmentation networks can be optimized for speed—there are variants that use fewer parameters or networks with multi-stage downsampling, which increases inference speed without significant loss of accuracy.
Despite higher requirements, deep models often achieve higher overall image-processing throughput than traditional approaches, as they process thousands of pixels in parallel (where an SVM would analyze them sequentially). After a single computationally expensive training phase, CNN-based models can significantly accelerate repeated analyses of multiple scenes, which explains their growing adoption in operational topographic mapping workflows.
To operationalize the proposed decision-oriented comparative framework, the results of the literature review are consolidated into two synthesis tables presented at the end of this section. Table 2 systematically compares the reviewed approaches in terms of data modality, target object, algorithm family, performance characteristics, and practical constraints. Building upon this comparison, Table 3 provides explicit decision-oriented recommendations by mapping typical application conditions to suitable method families. This structured synthesis complements the narrative discussion and enables informed methodological choices.
For clarity and to ensure verifiability, the indicative performance ranges for various topographic object classes are compiled in Table 4, along with their corresponding metrics, data modalities, method families, and representative studies. These values are intended as guidelines reflecting general trends reported in the literature, rather than as directly comparable benchmark figures.
Table 4. Reported quantitative performance ranges for topographic object extraction.
Object Class | Data Modality | Method Family | Metric | Reported Range | Representative Studies
Buildings | VHR RGB | CNN (U-Net, FCN) | IoU | 0.75–0.90 | [53,54]
Buildings | LiDAR | Point-based DL | F1-score | 0.88–0.95 | [55,56,57]
Roads | RGB | CNN | F1-score | 0.70–0.88 | [29,58,59,60,61,62]
Roads | LiDAR | RF/SVM | Precision | 0.75–0.85 | [63,64]
Forest roads | Multimodal | DL fusion | IoU | 0.65–0.82 | [42,44]

5. Future Research Directions and Emerging DL Architectures

5.1. Specialized Architectures for Decision-Making Challenges in Operational Mapping

Recent research in topographic object detection and classification increasingly focuses on advanced deep learning architectures designed to improve segmentation accuracy, robustness, and generalization across heterogeneous datasets [17,32,33,35]. Among emerging approaches, models such as MSEONet, CT-HiffNet, and SRMF have been proposed to address limitations of conventional convolutional networks, particularly in complex spatial scenes characterized by object occlusion, scale variability, and class imbalance. These architectures build upon concepts of multi-scale feature extraction, hierarchical representation learning, and enhanced boundary modeling that have already been demonstrated as effective in recent CNN-based approaches [32,33,65]. Their development reflects a shift toward architectures explicitly tailored to the geometric and semantic complexity of topographic data [66]. Operational topographic mapping involves a sequence of decisions—from detecting landscape changes to delineating feature boundaries—where specialized deep networks can provide targeted support. One key direction is designing architectures that improve segmentation precision for complex objects and edges, directly enhancing map feature extraction. For instance, ref. [67] proposed MSEONet, a multi-scale semantic segmentation network with edge-optimization modules (CEE and EPFO) that refine object boundaries during upsampling, addressing blurred or missed edges in features such as buildings and roads. Tests on benchmark datasets (ISPRS Potsdam/Vaihingen) showed improved boundary accuracy over prior methods, with notable edge gains at low additional computational cost—reducing manual correction by cartographers in operational workflows.
Another specialized architecture for multi-scale and small-object detection is MSNet. The architecture was designed to improve the detection of small or occluded objects (e.g., vehicles, small buildings) in high-resolution remote sensing images [68]. MSNet uses a Partial and Pointwise Convolution Extraction Module to capture spatial–channel information with fewer parameters, and a Local–Global Information Fusion Module with a fusion pyramid to combine fine textures with broader context across scales. On DIOR, HRRSD and NWPU VHR-10 it achieved higher mAP than prior methods (75.3% on DIOR; >95% on NWPU VHR-10), helping mapping tasks that depend on reliably finding fine details, while remaining lightweight enough for large-scale deployment [68]. MSNet and other multi-scale models are widely cited examples of architectures that can be referenced when describing multi-scale and feature fusion mechanisms [69].
In agricultural mapping, specialized networks target irregular, domain-specific boundaries. CT-HiffNet was proposed for extracting cropland field parcels from satellite imagery [65], where heterogeneous crop textures and weak contours make delineation difficult. The model fuses contour and texture cues via a hybrid attention module and a multi-scale hierarchical decoder, while a residual shrinkage block suppresses irrelevant features, improving robustness to spatio-spectral variability across sensors and acquisition times. Tested on GaoFen-2, Sentinel-2 and Google Earth imagery across diverse regions, CT-HiffNet reported consistently high performance (often > 80% precision/recall) and strong transferability to unseen areas (precision > 84%, recall > 86.5%), implying reduced retraining needs. For mapping agencies, this can automate parcel delineation and land-use updates with reliable boundaries [65].
A further challenge in topographic mapping is class imbalance and long-tail feature distributions: rare classes (e.g., uncommon land cover types or small facilities) have too few samples for standard models to learn reliably. This was addressed with SRMF (Semantic Reordering and Multi-modal Fusion) [70] for semantic segmentation in ultra-high-resolution imagery. SRMF oversamples underrepresented classes via multi-scale patch cropping/resampling and applies semantic reordering augmentation to rebalance class occurrence within training patches. It also fuses visual features with generic text embeddings (e.g., class names/descriptions encoded by a language model) without needing per-region text labels, acting as knowledge transfer for rare classes. SRMF reported improved mIoU on long-tail benchmarks (e.g., +3.33% on URUR), helping operational systems detect uncommon but important map features more consistently and reducing the need for manual correction [70].
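The rebalancing idea underlying such approaches can be illustrated with generic inverse-frequency resampling of training patches; this is a simplified stand-in for class rebalancing in general, not SRMF's actual procedure, and the class counts are invented.

```python
import numpy as np

# Toy long-tail distribution of training patches: class 2 is rare.
rng = np.random.default_rng(0)
labels = np.r_[np.zeros(900), np.ones(90), np.full(10, 2)].astype(int)

counts = np.bincount(labels)          # [900, 90, 10]
weights = 1.0 / counts[labels]        # rare patches get larger weight
weights /= weights.sum()

# Draw training patches with probability inversely proportional to
# class frequency, so the rare class appears as often as the common ones.
resampled = rng.choice(len(labels), size=3000, p=weights)
share = np.bincount(labels[resampled], minlength=3) / 3000
print(share.round(2))  # roughly 1/3 per class after resampling
```

Oversampling alone cannot add information about rare classes, which is why SRMF additionally injects semantic knowledge through text embeddings.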
Change detection, i.e., identifying what has changed between map updates, is a core mapping task, but traditional methods often rely on separate pipelines for binary and semantic change, complicating operations. UniRSCD, proposed in [71], is a unified remote-sensing change detection paradigm that processes horizontally concatenated bi-temporal images within a single encoder–decoder. It combines a state space model backbone (to capture long-range spatial–temporal dependencies efficiently) with a frequency change prompt encoder that highlights both global and local changes, avoiding task-specific branches. UniRSCD reported state-of-the-art performance across building change, semantic change, and damage assessment; on the SECOND dataset it improved the best F1 score by 5.8%. Operationally, one unified model can flag diverse change types while reducing training overhead and false alarms, improving trust in automated alerts used by human analysts [71].

5.2. Foundation Models and Self-Supervised Learning

A growing research trend also involves the adoption of foundation models and self-supervised learning paradigms in geospatial analysis. Foundation models and spatio-temporal transformers support the analysis of image sequences and the classification of spatial changes, improving model generalization across different regions and sensor configurations. Studies indicate that transformers are often combined with CNNs to enhance accuracy, although purely transformer-based models have also shown strong potential. Their key strength lies in their ability to perform global feature aggregation and to analyze long temporal sequences [72]. Recent studies on deep convolutional and point-based networks highlight the limitations imposed by the availability of labeled training data and emphasize the need for transferable representations learned from large-scale datasets [17,35,36]. Self-supervised learning strategies, which exploit spatial continuity and geometric consistency in remote sensing and LiDAR data, are therefore considered a promising direction for reducing annotation costs and improving model generalization across different geographic regions and sensor configurations [35,36,73].
One promising strategy is to combine foundation models with self-supervised learning tailored to remote sensing. Self-supervised learning (SSL) refers to pre-training models on unlabeled data through proxy tasks (like predicting masked pixels or clustering similar images) to learn useful representations. Remote sensing is an ideal field for SSL because of the abundance of imagery and the high cost of labeling. Recent advances show that SSL can yield encoders that capture domain-specific information (e.g., the typical textures of roofs or fields) that might be absent in a model like SAM. For example, the DINO framework and its successors have been applied to satellite imagery to learn features that align with semantic classes without human labels. These SSL models excel at certain tasks like boundary detection or grouping similar land cover types, but on their own they may lack the global context and versatility of a foundation model pre-trained on billions of examples [74].
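The masked-prediction pretext task mentioned above can be sketched in miniature: predict a held-out centre pixel from its neighbours using only unlabeled data. A linear model stands in for a deep reconstruction head, and the synthetic spatially correlated "scene" is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic scene with spatial correlation along columns (random walks),
# serving as "unlabeled imagery" for the pretext task.
rng = np.random.default_rng(0)
img = np.cumsum(rng.normal(size=(100, 100)), axis=0)

# Build (neighbours -> masked centre) training pairs at random locations.
ys = rng.integers(1, 99, 5000)
xs = rng.integers(1, 99, 5000)
neigh = np.stack([img[ys + dy, xs + dx]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0)], axis=1)
centre = img[ys, xs]

model = Ridge().fit(neigh, centre)
r2 = model.score(neigh, centre)
```

The high reconstruction quality achievable without any labels is the basic reason SSL pre-training yields useful representations from abundant unlabeled imagery.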
FFSNet (Fusion of Foundation and SSL Network), proposed in [75], combines a lightweight foundation model with a self-supervised encoder for remote-sensing segmentation. It fuses MobileSAM (general visual priors) with a DINOv2 encoder pre-trained on remote-sensing imagery (domain-sensitive features) using an adaptive attention-based module that dynamically weights each source per feature. This lets the network lean on DINO features for spectrally similar classes and on SAM priors for rare shapes/textures. FFSNet reported state-of-the-art results on benchmarks such as LoveDA and ISPRS Potsdam, outperforming models like AerialFormer with about half the trainable parameters. It achieved 55.4% mIoU on LoveDA and very high F1 on Potsdam (88.3%) and Vaihingen (91.6%) with ~44.7 M parameters—far smaller than SAM. Practically, the approach suggests better robustness to domain shifts and reduced need for extensive retraining, while remaining efficient enough for large-scale mapping deployments [75].

5.3. Multimodal Data Fusion

Another important direction of future research is multimodal data fusion, combining information from optical imagery, SAR, LiDAR point clouds, and derived height products such as DSM and nDSM. Multiple reviewed studies demonstrate that fusing spectral and elevation information leads to higher detection and classification accuracy than single-modality approaches, particularly in complex urban and forested environments [32,43,44,55]. Continued development of multimodal deep learning architectures is expected to further enhance the reliability and scalability of automated topographic analysis systems, bridging the gap between research-oriented models and operational mapping workflows [16,35,67].
The exploration of specialized architectures, foundation models (with self-supervision), and multimodal fusion reveals a landscape of complementary approaches—each with strengths that address certain mapping problems, and each with weaknesses or open issues.
Over the next few years, the most plausible vision of a mapping system is as follows: a foundation-model backbone, strengthened with specialized modules, capable of operating on multiple modalities and equipped with uncertainty quantification and explainability mechanisms, all embedded in an interactive tool for the analyst. Such a system can realistically shorten map-update cycles, improve product consistency, and enhance the quality of spatial decision-making, provided that research addresses deployment aspects (data, standards, scaling) as intensively as architectural innovations themselves.

6. Conclusions

A review of studies shows that the highest reported effectiveness in identifying topographic objects is achieved by combining orthophoto and LIDAR data with deep learning for joint detection, segmentation, and classification, particularly in scenarios involving complex spatial structures, provided that sufficient training data are available.
Approaches based on convolutional neural networks (e.g., U-Net segmentation) generally achieve higher classification accuracy than traditional methods (SVM, Random Forest), especially when it is necessary to distinguish between objects with similar spectral characteristics, thanks to the incorporation of spatial and contextual information. Networks that process 3D point clouds (PointNet, SparseCNN, etc.) also excel at 3D scene segmentation, allowing points to be directly classified as buildings, trees, or ground in an end-to-end manner, without manual feature engineering, and with better accuracy than classic algorithms based on hand-crafted features. Nevertheless, traditional ML methods remain relevant for simpler classification tasks, limited datasets, or applications requiring higher model interpretability.
In summary, the most effective approaches are those that combine the advantages of different data sources and modern algorithms, as demonstrated across multiple reviewed studies: for example, a CNN/U-Net model with a height channel (DSM with LIDAR) for orthophoto segmentation, or a point-based network (PointNet++) supported by intensity and color features for point cloud classification. Such multimodal and multi-source fusion strategies tend to achieve the highest reported accuracy levels for complex scenes. Traditional algorithms (SVM, RF) may therefore be regarded as complementary or baseline solutions, effective for smaller projects or as a preliminary step (e.g., for data filtering) but generally giving way to deep learning methods in demanding topographic object identification tasks.

Author Contributions

Conceptualization, Katarzyna Kryzia, Aleksandra Radziejowska, Justyna Adamczyk and Dominik Kryzia; methodology, Katarzyna Kryzia, Aleksandra Radziejowska, Justyna Adamczyk and Dominik Kryzia; software, Aleksandra Radziejowska; validation, Katarzyna Kryzia, Dominik Kryzia and Justyna Adamczyk; formal analysis, Katarzyna Kryzia; investigation, Aleksandra Radziejowska; resources, Katarzyna Kryzia and Dominik Kryzia; data curation, Aleksandra Radziejowska and Justyna Adamczyk; writing—original draft preparation, Justyna Adamczyk; writing—review and editing, Aleksandra Radziejowska, Katarzyna Kryzia and Dominik Kryzia; visualization, Justyna Adamczyk, Aleksandra Radziejowska; supervision, Dominik Kryzia; project administration, Justyna Adamczyk; funding acquisition, Katarzyna Kryzia and Dominik Kryzia. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The National Centre for Research and Development, within the research project “Automation of topographic object detection process using artificial intelligence algorithms” (INFOSTRATEG-V/0006/2023/A).

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

We gratefully acknowledge Polish high-performance computing infrastructure PLGrid (HPC Center: ACK Cyfronet AGH) for providing computer facilities and support within computational grants no. PLG/2024/017520 and PLG/2025/018618.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
ML: Machine Learning
DL: Deep Learning
GIS: Geographic Information System
BDOT10k: Topographic Object Database
LR: Linear Regression Algorithm
DT: Decision Trees Algorithm
k-NN: k-Nearest Neighbours Algorithm
SVM: Support Vector Machine Algorithm
RF: Random Forest Algorithm
MLE: Maximum Likelihood Algorithm
NN: Neural Networks
MLP: Multilayer Perceptron Networks
RBF: Radial Basis Function Networks
SOM: Self-Organizing Maps (Kohonen Networks)
DNN: Deep Neural Networks
YOLO: You Only Look Once Networks
ResNet: Residual Networks

References

  1. Iorkaa, A.A.; Barma, M.; Muazu, H.G. Machine Learning Techniques, Methods and Algorithms, Conceptual and Practical Insights. Int. J. Eng. Res. Appl. 2021, 11, 55–64. [Google Scholar]
  2. Bjola, C. AI for Development: Implications for Theory and Practice. Oxf. Dev. Stud. 2022, 50, 78–90. [Google Scholar] [CrossRef]
  3. Unuriode, A.O.; Durojaiye, O.M.; Yusuf, B.Y.; Okunadea, L.O. The Integration of Artificial Intelligence into Database Systems (AI-DB Integration Review). Int. J. Cybern. Inform. (IJCI) 2023, 12, 161–172. [Google Scholar] [CrossRef]
  4. Rashid, A.B.; Kausik, M.D. AI Revolutionizing Industries Worldwide: A Comprehensive Overview of Its Diverse Applications. Hybrid Adv. 2024, 7, 100277. [Google Scholar] [CrossRef]
  5. Podział Modeli Uczenia Maszynowego Wraz z Przykładami Zastosowania—Centralny Ośrodek Informatyki—POPC Wsparcie—Portal Gov.pl. Available online: https://www.gov.pl/web/popcwsparcie/podzial-modeli-uczenia-maszynowego-wraz-z-przykladami-zastosowania (accessed on 25 October 2025).
  6. Pluto-Kossakowska, J.; Kamiński, M. Analiza możliwości automatycznej detekcji obiektów topograficznych na zdjęciach lotniczych i satelitarnych VHR. Teledetekcja Sr. 2022, 62, 5–15. Available online: https://yadda.icm.edu.pl/baztech/element/bwmeta1.element.baztech-68ac9a70-7466-4b5a-858a-ed64a3400396/c/pluto-kossakowska_analiza_62_2022.pdf (accessed on 25 October 2025).
  7. Wu, C.; Shen, H.; Shen, A.; Deng, J.; Gan, M.; Zhu, J.; Xu, H.; Wang, K. Comparison of machine-learning methods for above-ground biomass estimation based on Landsat imagery. J. Appl. Remote Sens. 2016, 10, 035010. [Google Scholar] [CrossRef]
  8. European Parliament; Council of the European Union. Directive 2007/2/EC of the European Parliament and of the Council establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). Off. J. Eur. Union 2007, L108, 1–14. [Google Scholar]
  9. Annoni, A.; Craglia, M.; Ehlers, M.; Georgiadou, Y. Towards a European Spatial Data Infrastructure. Int. J. Geogr. Inf. Sci. 2005, 19, 1–11. [Google Scholar]
  10. INSPIRE Thematic Working Group. INSPIRE Data Specification on Transport Networks—Technical Guidelines; European Commission Joint Research Centre: Ispra, Italy, 2010. [Google Scholar]
  11. INSPIRE Maintenance and Implementation Group (MIG). INSPIRE Data Specification on Land Cover—Technical Guidelines (Version 3.0); European Commission: Brussels, Belgium, 2024. [Google Scholar]
  12. ISO 19110:2005; Geographic Information—Methodology for Feature Cataloguing. International Organization for Standardization: Geneva, Switzerland, 2005.
  13. Główny Urząd Geodezji i Kartografii (GUGiK). Baza Danych Obiektów Topograficznych BDOT10k—Specyfikacja Techniczna; GUGiK: Warszawa, Poland, 2021. [Google Scholar]
  14. Koc-San, D. Evaluation of different classification techniques for the detection of glass and plastic greenhouses from WorldView-2 satellite imagery. J. Appl. Remote Sens. 2013, 7, 073553. [Google Scholar] [CrossRef]
  15. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21. [Google Scholar] [CrossRef]
  16. Gharineiat, Z.; Kurdi, F.T.; Campbell, G. Review of Automatic Processing of Topography and Surface Feature Identification LIDAR Data Using Machine Learning Techniques. Remote Sens. 2022, 14, 4685. [Google Scholar] [CrossRef]
  17. Li, H.; Hu, B.; Li, Q.; Jing, L. CNN-Based Individual Tree Species Classification Using High-Resolution Satellite Imagery and Airborne LIDAR Data. Forests 2021, 12, 1697. [Google Scholar] [CrossRef]
  18. Kwaśnicka, H.; Markowska-Kaczmar, U. Sieci Neuronowe w Zastosowaniach; Oficyna Wydawnicza Politechniki Wrocławskiej: Wrocław, Poland, 2005. [Google Scholar]
  19. Tadeusiewicz, R. Odkrywanie Właściwości Sieci Neuronowych Przy Użyciu Programów w Języku C#; Polska Akademia Umiejętnosci: Kraków, Poland, 2007. [Google Scholar]
  20. Suzuki, K. Artificial Neural Networks—Architectures and Applications; InTech: London, UK, 2013; Available online: https://www.intechopen.com/books/3110#:~:text=DOI-,10.5772/3409,-ISBN (accessed on 25 October 2025).
  21. Pokonieczny, K. Wykorzystanie perceptronu wielowarstwowego do wyszczególniania obiektów o znaczeniu orientacyjnym na mapach topograficznych. Rocz. Geomatyki 2016, XIV, 397–405. [Google Scholar]
  22. Janaszek-Mańkowska, M.A.; Mańkowski, D.; Kozdrój, J. Sieci neuronowe typu MLP w prognozowaniu plonu jęczmienia jarego. Biul. Inst. Hod. i Aklim. Roślin 2011, 259, 93–112. [Google Scholar]
  23. Zborowicz, M.; Boniecki, P. MLP neural network as a tool for images computer analysis. J. Res. Appl. Agric. Eng. 2010, 55, 124–127. [Google Scholar]
  24. Boniecki, P. The neural networks of the type MLP and RBF as classifying tools in picture analysis. J. Res. Appl. Agric. Eng. 2006, 51, 34–39. [Google Scholar]
  25. Gil, J.; Mrówczyńska, M.; Gibowski, S. Model ciągły sieci neuronowej typu Hopfielda w zastosowaniu do oszacowania stabilności punktów sieci geodezyjnej pionowej pomiarowo-kontrolnej. Acta Sci. Pol. Geod. Descr. Terrarum 2007, 6, 39–50. [Google Scholar]
  26. Mandl, T.; Eibl, M. Topographic Maps Based on Kohonen Self-Organizing Maps: An Empirical Approach. In Proceedings of the European Symposium on Intelligent Technologies, Hybrid Systems and Their Implementation on Smart Adaptive Systems (EUNITE), Puerto de la Cruz, Tenerife, Spain, 13–14 December 2001; pp. 467–471. [Google Scholar]
  27. Astudillo, C.A.; Oommen, B.J. Topology-oriented self-organizing maps: A survey. Pattern Anal. Appl. 2014, 17, 223–248. [Google Scholar] [CrossRef]
  28. Cottrell, M.; Olteanu, M.; Rossi, F.; Vialaneix, N. Theoretical and applied aspects of the self-organizing maps. In Proceedings of the 11th International Workshop WSOM 2016, Houston, TX, USA, 6–8 January 2016; pp. 3–26. [Google Scholar]
  29. Wu, T.; Luo, J.; Gao, L.; Sun, Y.; Dong, W.; Zhou, Y.; Liu, W.; Hu, X.; Xi, J.; Wang, C.; et al. Geo-Object-Based Vegetation Mapping via Machine Learning Methods with an Intelligent Sample Collection Scheme: A Case Study of Taibai Mountain, China. Remote Sens. 2021, 13, 249. [Google Scholar] [CrossRef]
  30. Di, K.; Li, W.; Yue, Z.; Sun, Y.; Liu, Y. A machine learning approach to crater detection from topographic data. Adv. Space Res. 2014, 54, 2419–2429. [Google Scholar] [CrossRef]
  31. Jalalipour, S.; Ayyalasomayjula, S.; Damrah, H.; Lin, J.; Rekabdar, B.; Li, R. Deep Learning-Based Spatial Detection of Drainage Structures using Advanced Object Detection Methods. In Proceedings of the 2023 Fifth International Conference on Transdisciplinary AI (TransAI), Laguna Hills, CA, USA, 25–27 September 2023. [Google Scholar]
  32. Tiede, D.; Schwendemann, G.; Alobaidi, A.; Wendt, L.; Lang, S. Mask R-CNN-based building extraction from VHR satellite data in operational humanitarian action: An example related to Covid-19 response in Khartoum, Sudan. Trans. GIS 2021, 25, 1213–1227. [Google Scholar] [CrossRef]
  33. Boston, T.; Van Dijk, A.; Larraondo, P.R.; Thackway, R. Comparing CNNs and Random Forests for Landsat Image Segmentation Trained on a Large Proxy Land Cover Dataset. Remote Sens. 2022, 14, 3396. [Google Scholar] [CrossRef]
  34. Liu, T.; Amr, A.E.; Morton, J.; Wilhelm, V.L. Comparing Fully Convolutional Networks, Random Forest, Support Vector Machine, and Patch-based Deep Convolutional Neural Networks for Object-based Wetland Mapping using Images from small Unmanned Aircraft System. GIScience Remote Sens. 2017, 55, 243–264. [Google Scholar] [CrossRef]
  35. Schlosser, A.; Gergely, S.Z.; László, B.; Zsolt, V. Building Extraction Using Orthophotos and Dense Point Cloud Derived from Visual Band Aerial Imagery Based on Machine Learning and Segmentation. Remote Sens. 2020, 12, 2397. [Google Scholar] [CrossRef]
  36. Abbas, S.; Almadhor, A.; Sampedro, G.A.; Alsubai, S.; Al, H.A.; Strážovská, Ľ.; Zaidi, M.M. Efficient geospatial mapping of buildings, woodlands, water and roads from aerial imagery using deep learning. PeerJ Comput. Sci. 2024, 10, 2039. [Google Scholar] [CrossRef] [PubMed]
  37. Adamiak, M. Wykorzystanie Technik Uczenia Maszynowego i Teledetekcji do Wspomagania Interpretacji Przestrzeni Geograficznej. Ph.D. Thesis, Uniwersytet Łódzki, Łódź, Poland, 2022. Available online: https://www.researchgate.net/publication/364164811_Wykorzystanie_technik_uczenia_maszynowego_i_teledetekcji_do_wspomagania_interpretacji_przestrzeni_geograficznej (accessed on 25 October 2025).
  38. Krawiec, K.; Wyczałek, I. Nadzorowana detekcja tras komunikacyjnych z wykorzystaniem metod uczenia maszynowego. Arch. Fotogram. Kartogr. Teledetekcji 2006, 16, 361–371. [Google Scholar]
  39. Coulter, L.; Hall, T.; Guzman, L.; Kasahara, I. Satellite Image Building Detection using U-Net Convolutional Neural Network. Image Process. Appl. 2021, EE556, 1–9. Available online: https://www.luisjguzman.com/media/EE5561/building_detection.pdf (accessed on 25 October 2025).
  40. Clode, S.; Rottensteiner, F.; Kootsookos, F.; Zelniker, E. Detection and Vectorization of Roads from LIDAR Data. Photogramm. Eng. Remote Sens. 2007, 73, 517–535. [Google Scholar] [CrossRef]
  41. Awrangjeb, M.; Ravanbakhsh, M.; Fraser, C.S. Automatic detection of residential buildings using LIDAR data and multispectral imagery. ISPRS J. Photogramm. Remote Sens. 2010, 65, 457–467. [Google Scholar] [CrossRef]
  42. García, A.; Martínez, B.; Moroyoqui, Z.; Picos, K.; Orozco-Rosas, U. LIDAR-based classification of objects and terrain. In Optics and Photonics for Information Processing XVIII, Proceedings of the Optical Engineering + Applications, San Diego, CA, USA, 18–23 August 2024; Conference Poster; SPIE: Bellingham, WA, USA, 2024. [Google Scholar] [CrossRef]
  43. Lingli, Z.; Hattula, E.; Raninen, J. Enhancing Building Extraction with LIDAR and Aerial Image Data: Examining Data Combinations for Improved Accuracy in Rural and Urban Environments; GIM International: Lemmer, The Netherlands, 28 November 2024. [Google Scholar]
  44. Buján, S.; Guerra-Hernández, J.; González-Ferreiro, E.; Miranda, D. Forest Road Detection Using LIDAR Data and Hybrid Classification. Remote Sens. 2021, 13, 393. [Google Scholar] [CrossRef]
  45. Nahhas, F.H.; Shafri, H.Z.M.; Sameen, M.I.; Pradhan, B.; Mansor, S. Deep learning approach for building detection using LIDAR–orthophoto fusion. J. Sens. 2018, 2018, 1–12. [Google Scholar] [CrossRef]
  46. Morsy, S.; Shaker, A.; El-Rabbany, A. Multispectral LIDAR Data for Land Cover Classification of Urban Areas. Sensors 2017, 17, 958. [Google Scholar] [CrossRef] [PubMed]
  47. Candan, A.T.; Kalkan, H. U-Net-based RGB and LIDAR image fusion for road segmentation. Signal Image Video Process. 2023, 17, 2837–2843. [Google Scholar] [CrossRef]
  48. Mahphood, A.; Arefi, H. Density-Based Method for Building Detection from LIDAR Point Cloud. In Proceedings of the ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences X-4/W1-2022, GeoSpatial Conference 2022—Joint 6th SMPR and 4th GIResearch Conferences, Tehran, Iran, 19–22 February 2023. [Google Scholar]
  49. Desnos, K.; Pertuz, S. Design and Architecture for Signal and Image Processing. In Proceedings of the 15th International Workshop, DASIP 2022, Budapest, Hungary, 20–22 June 2022; Available online: https://link.springer.com/book/10.1007/978-3-031-12748-9 (accessed on 25 October 2025).
  50. Bhatti, M.A.; Syam, M.; Chen, H.; Hu, Y.; Keung, L.W.; Zeeshan, Z.; Ali, Y.A.; Sarhan, N. Utilizing convolutional neural networks (CNN) and U-Net architecture for precise crop and weed segmentation in agricultural imagery: A deep learning approach. Big Data Res. 2024, 36, 100465. [Google Scholar] [CrossRef]
  51. Huang, W.; Sun, Q.; Yu, A.; Guo, W.; Xu, Q.; Wen, B.; Xu, L. Leveraging Deep Convolutional Neural Network for Point Symbol Recognition in Scanned Topographic Maps. ISPRS Int. J. Geo-Inf. 2023, 12, 128. [Google Scholar] [CrossRef]
  52. Hashmi, H.; Dwivedi, R.; Kumar, A. YOLO-RS: An Efficient YOLO-Based Approach for Remote Sensing Object Detection. In Proceedings of the 2023 12th International Conference on System Modeling & Advancement in Research Trends (SMART), Moradabad, India, 22–23 December 2023; pp. 50–56. [Google Scholar] [CrossRef]
  53. Zhang, X.; Li, Y.; Wang, X.; Liu, F.; Wu, Z.; Cheng, X.; Jiao, L. Multisource interactive stair attention for remote sensing image captioning. Remote Sens. 2023, 15, 579. [Google Scholar] [CrossRef]
  54. He, Z.; He, H.; Li, J.; Chapman, M.A.; Ding, H. A short-cut connections-based neural network for building extraction from high resolution orthoimagery. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, XLIII-B1-2022, 39–44. [Google Scholar] [CrossRef]
  55. Smith, M.J.; Fleming, L.; Geach, J.E. Earthpt: A time series foundation model for earth observation. arXiv 2023, arXiv:2309.07207. [Google Scholar]
  56. Diab, A.; Kashef, R.; Shaker, A. Deep Learning for LiDAR Point Cloud Classification in Remote Sensing. Sensors 2022, 22, 7868. [Google Scholar] [CrossRef]
  57. Zhang, Y.; Wang, T.; Lin, X.; Zhao, Z.; Wang, X. Building Extraction from LiDAR Point Clouds Based on Revised RandLA-Net. ISPRS-Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2024, XLVIII-1-2024, 943–948. [Google Scholar] [CrossRef]
  58. Zhou, G.; Qian, L.; Gamba, P. Advances on Multimodal Remote Sensing Foundation Models. Remote Sens. 2025, 17, 3532. [Google Scholar] [CrossRef]
  59. Zhu, X.; Zhu, J.; Li, H.; Wu, X.; Li, H.; Wang, X.; Dai, J. Uniperceiver: Pre-training unified architecture for generic perception for zero-shot and few-shot tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 16804–16815. [Google Scholar]
  60. Guo, Z.; Hu, X.; Wang, J.; Miao, X.Y.; Sun, M.T.; Wang, H.W.; Ma, X.Y. A duplex transform heterogeneous feature fusion network for road segmentation. Sci. Rep. 2024, 14, 17438. [Google Scholar] [CrossRef]
  61. Lu, X.; Weng, Q. Deep learning-based road extraction from remote sensing imagery: Progress, problems, and perspectives. ISPRS J. Photogramm. Remote Sens. 2025, 228, 122–140. [Google Scholar] [CrossRef]
  62. Luo, Z.; Gao, L.; Xiang, H.; Li, J. Road object detection for HD map: Full-element survey, analysis and perspectives. ISPRS J. Photogramm. Remote Sens. 2023, 197, 122–144. [Google Scholar] [CrossRef]
  63. Wu, L.; Zhu, X.; Lawes, R.; Dunkerley, D.; Zhang, H. Comparison of machine learning algorithms for classification of LiDAR points for characterization of canola canopy structure. Int. J. Remote Sens. 2019, 40, 5973–5991. [Google Scholar] [CrossRef]
  64. Mohamed, M.; Morsy, S.; El-Shazly, A. Machine learning for mobile lidar data classification of 3d road environment. ISPRS-Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2021, XLIV-M-3-2021, 113–117. [Google Scholar] [CrossRef]
  65. Wu, H.; Xie, J.; Deng, W.; Lin, A.; Shariff, A.R.M.; Akmalov, S.; Wu, W.; Li, Z.; Yu, Q.; Wang, Q.; et al. CT-HiffNet: A contour-texture hierarchical feature fusion network for cropland field parcel extraction from high-resolution remote sensing images. Comput. Electron. Agric. 2025, 239, 111010. [Google Scholar] [CrossRef]
  66. Chen, N.; Yang, R.; Zhao, Y.; Dai, Q.; Wang, L. Remote Sensing Image Segmentation Network That Integrates Global–Local Multi-Scale Information with Deep and Shallow Features. Remote Sens. 2025, 17, 1880. [Google Scholar] [CrossRef]
  67. Huang, W.; Deng, F.; Liu, H.; Ding, M.; Yao, Q. Multiscale semantic segmentation of remote sensing images based on edge optimization (MSEONet). IEEE Trans. Geosci. Remote Sens. 2025, 63, 5616813. [Google Scholar] [CrossRef]
  68. Gao, T.; Xia, S.; Liu, M.; Zhang, J.; Chen, T.; Li, Z. MSNet: Multi-Scale Network for Object Detection in Remote Sensing Images. Pattern Recognit. 2025, 158, 110983. [Google Scholar] [CrossRef]
  69. Tao, C.; Meng, Y.; Li, J.; Yang, B.; Hu, F.; Li, Y.; Cui, C.; Zhang, W. MSNet: Multispectral semantic segmentation network for remote sensing images. GIScience Remote Sens. 2022, 59, 1177–1198. [Google Scholar] [CrossRef]
  70. Guo, Y.; Zhank, Z.; Shang, Y.; Zhao, T.; Deng, S.; Yang, Y.; Yin, J. SRMF: A Data Augmentation and Multimodal Fusion Approach for Long-Tail UHR Satellite Image Segmentation. arXiv 2025. [Google Scholar] [CrossRef]
  71. Qu, Y.; Zhang, Z.; Xu, C.; Wan, Q.; Xie, M.; Chen, Y.; Liu, Z.; Zhong, Y. UniRSCD: A Unified Novel Architectural Paradigm for Remote Sensing Change Detection. arXiv 2025. [Google Scholar] [CrossRef]
  72. Wang, R.; Ma, L.; He, G.; Johnson, B.A.; Yan, Z.; Chang, M.; Liang, Y. Transformers for Remote Sensing: A Systematic Review and Analysis. Sensors 2024, 24, 3495. [Google Scholar] [CrossRef]
  73. Huang, Z.; Yan, H.; Zhan, Q.; Yang, S.; Zhang, M.; Zhang, C.; Lei, Y.; Liu, Z.; Liu, Q.; Wang, Y. A Survey on Remote Sensing Foundation Models: From Vision to Multimodality. arXiv 2025. [Google Scholar] [CrossRef]
  74. Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging Properties in Self-Supervised Vision Transformers (DINO). arXiv 2021. [Google Scholar] [CrossRef]
  75. Liang, D.; Peng, F.; Wu, B.; Cui, X.; Zhuang, H.; Zhang, G. FFSNet: Adaptive features fusion of foundation models and self-supervised models for remote sensing image segmentation. Digit. Signal Process. 2026, 168, 105634. [Google Scholar] [CrossRef]
Table 2. Decision-oriented comparison of methods for topographic object extraction.

| Data Modality | Object Class | Method Family | Typical Input | Typical Metrics | Performance Range (Examples) | Annotation Effort | Computational Cost | Generalization Ability |
|---|---|---|---|---|---|---|---|---|
| RGB/VHR imagery | Buildings | CNN (U-Net, FCN) | 2D images | IoU, F1 | 0.80–0.90 | High | Medium | Low–Medium |
| RGB + DSM | Buildings | CNN + height fusion | 2D + height | IoU, F1 | 0.85–0.93 | High | High | Medium |
| LiDAR point cloud | Buildings | PointNet/PointCNN | 3D points | F1, completeness | 0.88–0.95 | Medium | High | High |
| RGB | Roads | CNN | 2D images | F1, IoU | 0.75–0.90 | High | Medium | Medium |
| LiDAR | Roads | ML (RF, SVM) | Point features | Precision, recall | 0.70–0.85 | Medium | Low | Medium |
| Multimodal | Forest roads | DL fusion models | 2D + 3D | F1, IoU | 0.70–0.85 | High | High | High |
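The performance ranges in Table 2 are reported as IoU and F1 scores. For readers unfamiliar with these segmentation metrics, the following sketch shows their standard definitions computed from pixel-level confusion counts; the numbers in the example are illustrative and not taken from the reviewed studies.

```python
# Standard definitions of the segmentation metrics used in Table 2,
# computed from pixel-level confusion counts (TP, FP, FN).

def iou(tp: int, fp: int, fn: int) -> float:
    """Intersection over Union: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative building-extraction result: 8000 true-positive pixels,
# 1000 false positives, 1000 false negatives.
print(round(iou(8000, 1000, 1000), 3))  # 0.8
print(round(f1(8000, 1000, 1000), 3))   # 0.889
```

Note that F1 (equivalently, the Dice coefficient) is always at least as large as IoU for the same prediction, via F1 = 2·IoU/(1 + IoU), which is worth keeping in mind when comparing ranges reported under different metrics.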
Table 3. Method selection guide under practical constraints.

| Data & Project Conditions | Recommended Method Family | Rationale | Risks/Limitations |
|---|---|---|---|
| Only RGB images, small training set | Classical ML (RF, SVM) | Lower data demand, interpretable | Lower accuracy ceiling |
| Only RGB images, large labeled dataset | CNN (U-Net, DeepLab) | Strong performance on 2D patterns | Sensitive to domain shift |
| RGB + DSM available | CNN + height fusion | Height helps separate objects | DSM errors propagate |
| LiDAR point cloud available | Point-based DL (PointNet++, KPConv) | Best for 3D geometry | High computational cost |
| Strong domain variability (urban + rural) | Transformers/foundation models | Better generalization | Requires large pretraining |
| Lack of labeled data | Self-supervised/pretrained models | Reduces annotation effort | Transfer learning needed |
| Need for fast operational mapping | Classical ML or lightweight CNN | Lower computational cost | Lower accuracy |
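The decision logic of Table 3 can be encoded directly as a lookup, which is convenient when embedding such a guide in a processing pipeline. The condition keys, function name, and structure below are illustrative choices, not part of the article; only the recommendations and rationales mirror the table.

```python
# Minimal sketch encoding Table 3 as a lookup table. Keys are shorthand
# labels for the "Data & Project Conditions" column (our naming, not
# the article's); values are (recommended method family, rationale).

RECOMMENDATIONS = {
    "rgb_small_labels":        ("Classical ML (RF, SVM)", "lower data demand, interpretable"),
    "rgb_large_labels":        ("CNN (U-Net, DeepLab)", "strong performance on 2D patterns"),
    "rgb_plus_dsm":            ("CNN + height fusion", "height helps separate objects"),
    "lidar_point_cloud":       ("Point-based DL (PointNet++, KPConv)", "best for 3D geometry"),
    "high_domain_variability": ("Transformers/foundation models", "better generalization"),
    "no_labels":               ("Self-supervised/pretrained models", "reduces annotation effort"),
    "fast_operational":        ("Classical ML or lightweight CNN", "lower computational cost"),
}

def recommend(condition: str) -> str:
    """Return the recommended method family with its rationale."""
    method, rationale = RECOMMENDATIONS[condition]
    return f"{method} ({rationale})"

print(recommend("lidar_point_cloud"))
# Point-based DL (PointNet++, KPConv) (best for 3D geometry)
```

In practice, the risks column of Table 3 (domain shift, DSM error propagation, pretraining cost) still requires human judgment; a lookup like this only captures the first-pass recommendation.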
Share and Cite

MDPI and ACS Style

Kryzia, K.; Radziejowska, A.; Adamczyk, J.; Kryzia, D. Analysis of Using Machine Learning Application Possibilities for the Detection and Classification of Topographic Objects. ISPRS Int. J. Geo-Inf. 2026, 15, 59. https://doi.org/10.3390/ijgi15020059