The proposed pipeline, as illustrated in
Figure 1, aims to reconstruct the geometry of the bridge cross-section from pixel-based input data. It begins with a cross-sectional view in pixel format, segmented from a larger construction drawing. This segmentation step lies outside the scope of the pipeline, as it has already been addressed in prior work by Mafipour et al. [
33] and Peng et al. [
76]; alternatively, the cross-sectional view can be cropped manually. Within the segmented view, an object detection model localizes bridge cross-sections and identifies the corresponding cross-section types, as described in
Section 3.1. The resulting bounding box then guides a segmentation step, explained in
Section 3.2, where a pre-trained segmentation network generates a binary mask indicating which pixels belong to the cross-section. During post-processing, detailed in
Section 3.3, this mask is converted into a polygon representation. Next, a parametric template is fitted to the polygon to reconstruct the cross-section geometry. A global optimization algorithm adjusts the template parameters by minimizing a loss function, as discussed in
Section 3.4. These parameters can then be used in modeling tools such as Allplan Bridge to generate the cross-section geometry, as outlined in
Section 3.5.
3.1. Cross-Section Detection
A single cross-sectional view may contain multiple bridge cross-sections, as is often the case in road bridges with separate superstructures for each direction of travel. Each cross-section requires separate parameter extraction. Because their positions can vary across the view, accurate detection is essential to ensure reliable processing in subsequent steps.
To enable individual detection, this study employs the object detection network YOLOv8 (
https://github.com/ultralytics/ultralytics (accessed on 15 December 2025)), which was selected for its high detection accuracy, well-maintained documentation, and ease of use. As one of the latest versions in the YOLO model family, YOLOv8 builds on several architectural improvements. The original model, introduced by Redmon et al. [
78], pioneered real-time object detection using a fully Convolutional Neural Network (CNN) that simultaneously predicted bounding boxes and class labels. Over time, the model has been refined through several key innovations: Anchor boxes were introduced to improve localization [
79], mosaic data augmentation and Self-Adversarial Training (SAT) enhanced training efficiency [
80], and compound scaling provided more effective network scaling [
81].
The architecture of YOLOv8 is illustrated in
Figure 2 and follows the conventional design of single-stage detection networks, consisting of three main components [
82]: backbone, neck, and head. The backbone, based on a modified Darknet CNN, extracts visual features from the input image. These features are then processed by the neck, which fuses and refines them, acting as a bridge between the backbone and the head. In YOLOv8, this is implemented using a Path Aggregation Network (PANet), which enhances detection across different object scales. Finally, the head receives the fused features and makes predictions using two branches: one for object classification and one for bounding box regression. To support multi-scale detection, YOLOv8 employs three parallel detection heads, each operating at a different scale.
The model processes each drawing view as a complete image. Although patch-based processing is commonly applied to large-format drawings to improve the detection of small-scale objects [
83,
84,
85], this strategy is not required in the present study. This is because the target cross-sections occupy a substantial portion of each drawing view, enabling reliable detection at full-view scale without spatial segmentation.
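As a sketch of this detection step: with the Ultralytics package, full-view inference would look roughly as below. The weight and image file names are hypothetical, and the ultralytics-specific calls are shown only as comments so the snippet stays self-contained; a small helper for clamping predicted boxes to the view extent is included, since the boxes are used downstream as crop regions and prompts.

```python
# With the Ultralytics package, full-view detection would look roughly like
# this (not executed here; it requires ultralytics plus trained weights, and
# the file names are hypothetical):
#
#   from ultralytics import YOLO
#   model = YOLO("cross_sections.pt")
#   result = model("cross_section_view.png")[0]
#   boxes = result.boxes.xyxy.tolist()                 # (x1, y1, x2, y2)
#   types = [result.names[int(c)] for c in result.boxes.cls]

def clip_boxes(boxes, width, height):
    """Clamp (x1, y1, x2, y2) boxes to the image extent.

    Clipping is a common safeguard before the boxes are used as crop
    regions or as segmentation prompts in later steps.
    """
    return [
        (max(0, x1), max(0, y1), min(width, x2), min(height, y2))
        for x1, y1, x2, y2 in boxes
    ]
```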
3.2. Cross-Section Segmentation
Using the bounding boxes predicted by the YOLOv8 model, semantic segmentation is applied to derive pixel-level masks of the cross-sections. Traditional segmentation networks such as Mask R-CNN [
86] or YOLOv7-Mask [
81] can detect and segment objects in a single pass. However, they require training datasets containing both bounding boxes and corresponding pixel-level masks. While bounding boxes can be annotated efficiently, producing accurate segmentation masks remains significantly more labor-intensive [
26]. To overcome this limitation, this study adopts an alternative method that eliminates the need for time-consuming mask annotations by leveraging the pre-trained Segment Anything Model (SAM) [
87]. SAM’s zero-shot capability enables accurate segmentation of bridge cross-sections without any additional training. This approach significantly reduces development time and facilitates rapid adaptation to other cross-section types.
The architecture of the SAM model is depicted in
Figure 3. SAM processes an image to predict object masks based on additional guidance. Because the model is agnostic to object classes, it requires an additional input, referred to as a prompt, to indicate which object in the image should be segmented. This prompt is processed alongside the image and can take various forms, including sparse inputs such as keypoints, bounding boxes, or text, as well as dense inputs like masks. In this study, the bounding boxes predicted by the YOLOv8 model serve as prompts. Preliminary tests using keypoints, specifically the center point of each bounding box, were also conducted, but they did not yield satisfactory results (cf.
Section 4.5.1).
SAM consists of three main components: an image encoder, a prompt encoder, and a mask decoder. The image encoder generates feature embeddings from the input image using a pre-trained backbone based on the masked autoencoder framework [
88], implemented with a Vision Transformer (ViT) architecture [
89]. Notably, image encoding is performed only once per image, and the resulting embeddings can be reused for multiple object prompts.
The prompt encoder processes these input prompts using techniques tailored to their type. Geometric prompts, i.e., keypoints or bounding boxes, are encoded through positional encoding combined with learned embeddings. Bounding boxes are defined by the coordinates of their top-left and bottom-right corners. Each keypoint prompt includes coordinates and an associated label indicating whether the point belongs to the object. Text prompts are handled by the CLIP encoder [
90], while mask prompts are processed using a custom CNN in combination with learned embeddings. Since this study does not involve text or mask prompts, these components are omitted in
Figure 3.
Lastly, the mask decoder combines the image and prompt embeddings to generate the segmentation mask. It employs a transformer-based architecture with cross-attention mechanisms to fuse the two inputs and produce accurate predictions. However, prompt ambiguity can occur. For instance, a bounding box around an object may refer to the entire object or only a part of it. To account for such cases, SAM generates three candidate masks per prompt, each with an associated confidence score. In this study, only the mask with the highest confidence is processed.
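Under these assumptions, prompting SAM with a YOLOv8 bounding box and keeping only the highest-confidence of the three candidate masks can be sketched as follows. The segment-anything calls are shown as comments (they require the package and a downloaded checkpoint, and the file names are hypothetical), while the selection helper itself is runnable:

```python
import numpy as np

# With the official segment-anything package, box-prompted prediction would
# look roughly like this (not executed here):
#
#   from segment_anything import sam_model_registry, SamPredictor
#   sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
#   predictor = SamPredictor(sam)
#   predictor.set_image(image)                  # image encoded only once
#   masks, scores, _ = predictor.predict(
#       box=np.array([x1, y1, x2, y2]),         # YOLOv8 box as prompt
#       multimask_output=True,                  # three candidate masks
#   )

def best_mask(masks, scores):
    """Keep only the candidate mask with the highest confidence score."""
    return masks[int(np.argmax(scores))]
```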
3.4. Template-Based Parameter Extraction
To extract cross-section parameters, a parametric template is fitted to the polygon obtained from the segmentation mask. Since cross-sections can exhibit a wide variety of shapes, reflecting differences in structural function and design, multiple templates are defined, each tailored to approximate a specific cross-section type. Specifically, analysis of the dataset introduced in
Section 4.1 identified three common cross-section types: a simple slab girder, a T-girder, and a tapered T-girder. However, the proposed set of cross-sections can be readily extended to include additional types, increasing the method’s flexibility. Each template is defined by a set of parameters listed in
Table 1. Parameters are reused across templates to simplify design and maintenance without affecting the overall process. The cross-section types and their corresponding parametric templates are illustrated in
Figure 5.
Because the templates can be freely positioned within the two-dimensional pixel space, parameters P1 and P2 define their horizontal and vertical placement, respectively, relative to the top-left corner. Parameters P3 to P5 control the heights of structural members such as the flange and web, while P6 to P8 specify their widths, thereby defining the shape of the cross-section. This study assumes that cross-sections are vertically symmetric and aligned without rotation. This assumption holds for most cases in the dataset; however, the approach does not depend on this restriction. The parameter set and associated templates can be extended to represent asymmetric or rotated cross-sections without requiring fundamental changes to the method.
To extract the true parameters of the cross-section depicted in the view, the template is adjusted to best match the shape derived from the segmentation mask. For any given configuration $(P_1, P_2, \ldots, P_n)$, the corresponding template polygon $T$ is calculated as
$$T = f(P_1, P_2, \ldots, P_n)$$
where $f$ is the generator function that produces the polygonal template. The specific form of $f$, along with the number of parameters $n$ and the number of vertices $m$ of $T$, is determined by the chosen cross-section type. The geometric similarity between this polygon $T$ and the simplified segmentation polygon $S$ is then quantified to assess how well they match. Based on this, the direction of optimization can be inferred, that is, how the parameters should be adjusted to improve the match. A well-designed geometric similarity metric is therefore essential for achieving both efficient optimization and accurate reconstruction.
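As an illustration, a generator function f for the T-girder type might be realized as below. The vertex layout is an assumption made for demonstration only (a full-width top flange of height P3 + P4 over a centred web of height P5); the authors' actual templates follow Figure 5:

```python
def t_girder_template(p):
    """One plausible realization of the generator f for a T-girder template.

    The vertex layout here is assumed for illustration: a full-width flange
    of height P3 + P4 above a centred web of height P5. `p` maps parameter
    names "P1".."P7" to pixel values; P1/P2 place the top-left corner.
    """
    x0, y0 = p["P1"], p["P2"]              # placement in pixel space
    w_total = 2 * p["P6"] + p["P7"]        # two flange widths plus web width
    y_flange = y0 + p["P3"] + p["P4"]      # underside of the flange
    y_web = y_flange + p["P5"]             # underside of the web
    return [
        (x0, y0), (x0 + w_total, y0),           # top edge
        (x0 + w_total, y_flange),               # right flange side
        (x0 + p["P6"] + p["P7"], y_flange),     # right flange underside
        (x0 + p["P6"] + p["P7"], y_web),        # right web side
        (x0 + p["P6"], y_web),                  # web underside
        (x0 + p["P6"], y_flange),               # left web side
        (x0, y_flange),                         # left flange side
    ]
```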
In this work, the Complete IoU (CIoU) loss [
93] is used, as it captures relevant geometric factors: the overlap between the polygons, the distance between their centroids, and the ratio of their widths and heights. Furthermore, CIoU is invariant to resolution, ensuring consistent evaluation across cross-sectional views of varying scales. A perfectly aligned polygon pair yields a CIoU loss of zero, indicating complete geometric agreement. The CIoU loss is computed as
$$\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{d^2}{c^2} + \alpha V \tag{3}$$
In this formula, the Intersection over Union (IoU) measures the overlap between the two polygons, while the variables $d$, $c$, $\alpha$, and $V$ capture additional geometric relationships, each term addressing a distinct aspect of geometric similarity.
Figure 6 provides an overview of these factors.
The first factor, overlap, is measured using the IoU:
$$\mathrm{IoU} = \frac{A_{T \cap S}}{A_T + A_S - A_{T \cap S}}$$
Here, $A_{T \cap S}$ denotes the intersection area between the two polygons, $A_T$ is the area of the parametric template polygon, and $A_S$ is the area of the polygon derived from the segmentation mask. The IoU quantifies the extent to which the two polygons overlap, as visualized in
Figure 6a. However, when the polygons do not intersect, the IoU returns a score of zero, failing to distinguish between polygons that are closely positioned and those that are far apart. As a result, it provides no information on how the parameters should be adjusted to improve geometric similarity.
To address this limitation, the CIoU loss incorporates the distance between the polygons’ centroids, as initially introduced in the Distance IoU (DIoU) loss by Zheng et al. [
94]. In Equation (3), the variables $d$ and $c$ represent this spatial relationship: $d$ denotes the Euclidean distance between the centroids of the two polygons, and, to normalize this distance, $c$ is defined as the length of the diagonal of the smallest enclosing axis-aligned bounding box containing both polygons, as illustrated in
Figure 6b.
Despite the improvements introduced by the distance term, one limitation remains: when one polygon fully encloses the other and their centroids coincide, the centroid distance is near zero, and the IoU only suggests increasing the area of the enclosing polygon without indicating how to adjust its geometry, such as making it wider or taller. As a result, the optimization process lacks a clear direction for modifying the parameters to improve geometric similarity. To address this, Zheng et al. [
93] introduced an additional penalty term based on the aspect ratio of the bounding boxes (cf.
Figure 6c). This refinement is included in Equation (3) through the parameters $\alpha$ and $V$, defined in Equations (5) and (6), respectively:
$$\alpha = \begin{cases} 0, & \mathrm{IoU} < 0.5 \\ \dfrac{V}{(1 - \mathrm{IoU}) + V}, & \mathrm{IoU} \ge 0.5 \end{cases} \tag{5}$$
$$V = \frac{4}{\pi^2}\left(\arctan\frac{w_S}{h_S} - \arctan\frac{w_T}{h_T}\right)^2 \tag{6}$$
Here, $w_S$, $h_S$ and $w_T$, $h_T$ refer to the widths and heights of the minimum enclosing bounding boxes of the polygon derived from the segmentation mask and of the template polygon, respectively.
The parameter $\alpha$ controls the influence of the aspect ratio penalty based on the current IoU score. As proposed by Zheng et al. [
94], this penalty is applied only when the IoU exceeds a threshold of 0.5, reflecting the rationale that proportion differences become meaningful only when the polygons already share sufficient overlap. When they are far apart, spatial proximity is the more relevant factor; in low-overlap cases, the CIoU loss therefore effectively reduces to the DIoU loss.
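A minimal sketch of this loss, written for axis-aligned boxes so that it stays self-contained (the paper evaluates the same terms on polygons, using polygon areas and centroids):

```python
import math

def ciou_loss(box_a, box_b):
    """CIoU loss between two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap term: Intersection over Union of the two boxes.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # Distance term: squared centroid distance d^2, normalized by the squared
    # diagonal c^2 of the smallest box enclosing both inputs.
    d2 = ((ax1 + ax2 - bx1 - bx2) / 2) ** 2 + ((ay1 + ay2 - by1 - by2) / 2) ** 2
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    # Aspect-ratio term V, gated by alpha: it contributes only once the
    # overlap is sufficient (IoU >= 0.5); otherwise the loss reduces to DIoU.
    v = (4 / math.pi ** 2) * (
        math.atan((ax2 - ax1) / (ay2 - ay1)) - math.atan((bx2 - bx1) / (by2 - by1))
    ) ** 2
    alpha = v / ((1.0 - iou) + v + 1e-9) if iou >= 0.5 else 0.0
    return 1.0 - iou + d2 / c2 + alpha * v
```

For identical boxes the loss is zero, and for disjoint boxes it exceeds one, with the normalized centroid distance supplying the optimization direction that the IoU alone cannot provide.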
To reduce the parameter search space during optimization, providing an initial guess within a plausible range is beneficial. This improves computational efficiency by reducing the effort needed to locate the loss minimum. In this study, the axis-aligned bounding box of the segmentation polygon is used to initialize the parameters. A more advanced method for estimating the initial parameters was also evaluated; however, it did not improve convergence speed or result quality. A comparison of both approaches is presented in
Section 4.5.2.
The bounding box-based approach was therefore adopted, with its coordinates used to estimate the template parameters as follows. The top-left corner of the bounding box provides the initial estimate for the offset parameters P1 and P2. For the slab girder, the height parameters P3 and P4 are assumed equal, each set to half the bounding box height. The flange width parameter P6 is initialized as one-eighth of the bounding box width, assuming that the total flange width accounts for one-quarter and is symmetrically distributed. The web width P7 is assumed to occupy the remaining three-quarters of the width.
For the T-girder, the web height P5 is set to half the bounding box height, with the total flange height accounting for the other half. Therefore, the flange height parameters, P3 and P4, are each set to one-quarter of the height. The web width P7 is estimated as half the bounding box width, while the remaining half is allocated to the flanges, making each flange width parameter P6 one-quarter of the total width.
The tapered T-girder builds on the assumptions used for the T-girder, maintaining the same height proportions for
P3,
P4, and
P5. However, the width is divided differently: the web width
P7 and flange width
P6 each occupy one-quarter of the bounding box width, while the tapered web parameter
P8 is assigned the remaining one-quarter, resulting in a value of one-eighth of the total width. An overview of the initial parameter estimates is presented in
Table 2.
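The initialization rules above can be collected into a small helper. The type labels ("slab", "t_girder", "tapered_t_girder") are illustrative names for this sketch, not the class labels used in the paper:

```python
def initial_params(cs_type, bbox):
    """Initial template parameters from a detection box (x1, y1, x2, y2),
    following the width/height fractions described in the text (cf. Table 2).
    """
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    p = {"P1": x1, "P2": y1}              # top-left corner as offset
    if cs_type == "slab":
        p.update(P3=h / 2, P4=h / 2, P6=w / 8, P7=3 * w / 4)
    elif cs_type == "t_girder":
        p.update(P3=h / 4, P4=h / 4, P5=h / 2, P6=w / 4, P7=w / 2)
    elif cs_type == "tapered_t_girder":
        p.update(P3=h / 4, P4=h / 4, P5=h / 2, P6=w / 4, P7=w / 4, P8=w / 8)
    else:
        raise ValueError(f"unknown cross-section type: {cs_type}")
    return p
```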
For each initial parameter guess, a search range of ±500 pixels is defined, assuming that the optimal configuration lies within these bounds. These ranges are passed to the optimization algorithm, which searches for the parameter values that minimize the loss function defined in Equation (
3). The optimization is performed using the Dual Annealing optimization algorithm [
95], as implemented in the SciPy package (
https://scipy.org/ (accessed on 15 December 2025)). The template used for optimization is selected based on the cross-section classification produced by the YOLOv8 model. Consequently, introducing new cross-section template types requires retraining the YOLOv8 classifier. An alternative approach would fit all available template types to the segmentation mask and select the configuration that yields the best fit. In this approach, the YOLOv8 model would only be used to localize cross-sections, making the method independent of the number of supported template types. This design may simplify training and improve robustness, particularly when the set of templates is extended.
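A sketch of the fitting step with SciPy's `dual_annealing` is shown below. Since the CIoU objective requires the full polygon machinery, a stand-in quadratic loss with a known minimum is used so the snippet remains self-contained; the bounds follow the ±500-pixel search range described above:

```python
import numpy as np
from scipy.optimize import dual_annealing

# The real objective is the CIoU loss between the generated template polygon
# and the segmentation polygon; a stand-in quadratic loss with a known
# minimum is used here purely for illustration.
target = np.array([120.0, 80.0, 40.0, 40.0])   # "true" parameter values

def loss(params):
    return float(np.sum((params - target) ** 2))

# Initial guess (in practice derived from the bounding box, cf. Table 2)
# and the +/-500 px search range around it.
x0 = np.array([100.0, 100.0, 50.0, 50.0])
bounds = [(g - 500.0, g + 500.0) for g in x0]

result = dual_annealing(loss, bounds=bounds, x0=x0, maxiter=200)
fitted_params = result.x                       # optimized parameter values
```

Because Dual Annealing combines stochastic global search with local refinement, the minimum of this smooth stand-in loss is located essentially exactly.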