Using Convolutional Neural Networks and Pattern Matching for Digitization of Printed Circuit Diagrams
Abstract
1. Introduction
2. Description of Data
3. Methods
3.1. Text Removal and Dashed Line Connection
3.2. Object Detection
3.3. Line Detection and Clustering
- Line detection algorithm. For detecting horizontal lines, we use a horizontal opening operation [33], i.e., any foreground pixel that is not included in the union of all sets of $k$ horizontally consecutive foreground pixels is replaced with a background pixel. For vertical lines, a similar vertical opening is performed, where the number $k$ must be chosen as the pixel length of the shortest line to be detected. We set $k = \lfloor c\,w \rfloor$ for horizontal lines and $k = \lfloor c\,h \rfloor$ for vertical lines for some fixed fraction $c \in (0,1)$, where $w$ and $h$ denote the width and height of the input image, respectively, and $\lfloor x \rfloor$ is the largest integer smaller than or equal to $x$. While this method is not suitable for detecting curved or significantly inclined lines, thus possibly requiring additional preprocessing, it prevents the detection of false positives and is efficiently computable using convolutions. As will be shown, this approach performs reasonably well on synthetic and measured validation data.
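The following is a minimal sketch of this opening operation using OpenCV, assuming a binary image with foreground encoded as 255; the fraction used to derive the kernel length k is an illustrative placeholder, not the value chosen in the paper.

```python
import cv2

def detect_lines(binary, frac=0.05):
    """Extract horizontal and vertical line pixels via morphological opening.

    binary: 2D uint8 array with foreground pixels set to 255.
    frac:   illustrative fraction of the image size used to derive the
            minimum line length k (the paper's exact choice may differ).
    """
    h, w = binary.shape
    k_h = max(2, int(frac * w))  # minimum length of a horizontal line
    k_v = max(2, int(frac * h))  # minimum length of a vertical line

    # A k_h x 1 rectangular structuring element keeps only runs of at
    # least k_h horizontally consecutive foreground pixels; analogously
    # for vertical lines.
    horizontal = cv2.morphologyEx(
        binary, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_RECT, (k_h, 1)))
    vertical = cv2.morphologyEx(
        binary, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_RECT, (1, k_v)))
    return horizontal, vertical
```

Since the structuring elements are one-dimensional line segments, both openings reduce to one-dimensional erosions and dilations, which is what makes the operation efficiently computable.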
- Classification of line crossings. To derive the connectivity information of wires, it remains to cluster the family of detected lines into connected components. For this, it is important to note that there is a variety of junction styles, with different meanings, see Figure 2 and Figure 5. Thus, in the following, we employ another (small) CNN to determine the type of line crossings.
- Layers of the classification network. The first convolutional layer in Equation (4) consists of 32 filters, whereas the second convolutional layer consists of 64 filters. Both convolutional layers utilize a stride of 1 and no padding. Formally, for any integers $m, n \geq 1$, a convolutional layer with an $m \times n$ filter $w$ and bias $b$ maps an image $x$ to the feature map given by $(C(x))_{i,j} = \sigma\bigl(b + \sum_{u=1}^{m}\sum_{v=1}^{n} w_{u,v}\, x_{i+u-1,\, j+v-1}\bigr)$, where $\sigma$ denotes the activation function.
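A sketch of such a network in PyTorch is given below. The numbers of filters (32, then 64), the stride of 1, and the absence of padding follow the text; the 3 x 3 filter size, the 64 x 64 input cutout, the ReLU activations, and the pooling layers are illustrative assumptions.

```python
import torch.nn as nn

class JunctionClassifier(nn.Module):
    """Small CNN for classifying line-crossing cutouts (a sketch)."""

    def __init__(self, num_classes=10):  # 9 connection classes + undefined
        super().__init__()
        self.features = nn.Sequential(
            # 32 filters, stride 1, no padding (as in the text);
            # the 3 x 3 kernel size is an assumption.
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(2),
            # 64 filters, stride 1, no padding.
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Assuming 64 x 64 cutouts, the feature maps are 14 x 14 here.
            nn.Linear(64 * 14 * 14, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```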
- Training of network parameters. To train the parameters of the classification network S, i.e., the parameters of the convolutional layers and the linear layer, we generate synthetic training data consisting of pairs $(c, y)$ of image data and junction label. Thereby, $c$ is a (binary) image of a junction, and $y$ represents the corresponding intersection class as a one-hot encoding, i.e., $y_i = 1$ if and only if the intersection class of $c$ is the $i$-th class. In total, we generate 500 samples for each of the 9 connection classes by applying random augmentations to the line crossings. More precisely, we apply random translations to the line coordinates by up to 3 pixels, and we modify the size of dot markings by uniformly drawing their radius from a fixed interval. Furthermore, for all intersection classes, we add noise close to the foreground by switching any background pixel at a pixel distance of less than one from the foreground into a foreground pixel with a probability of 0.03. Finally, we generate 1000 samples for the undefined class by randomly placing rectangles, triangles, and off-center lines to capture ambiguous cases. The network is then trained for multi-class classification using the categorical cross-entropy loss function given by $L(y, \hat{y}) = -\sum_{i} y_i \log \hat{y}_i$, where $\hat{y}$ denotes the (softmax) output of the network.
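The boundary-noise augmentation can be sketched as follows, interpreting "pixel distance of less than one to the foreground" as the 8-neighbourhood of foreground pixels (an assumption on our part); binary is a 2D uint8 array with foreground set to 255.

```python
import numpy as np
import cv2

def add_boundary_noise(binary, p=0.03, rng=None):
    """Flip background pixels adjacent to the foreground with probability p."""
    rng = rng or np.random.default_rng()
    # Background pixels inside the dilated foreground are the candidates,
    # i.e., background pixels with a foreground pixel in their
    # 8-neighbourhood (assumed interpretation of the distance criterion).
    dilated = cv2.dilate(binary, np.ones((3, 3), np.uint8))
    near_foreground = (dilated > 0) & (binary == 0)
    flip = near_foreground & (rng.random(binary.shape) < p)
    noisy = binary.copy()
    noisy[flip] = 255
    return noisy
```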
- Clustering of detected lines. First, cutouts containing pairs of possibly intersecting lines are determined, which are then classified by the network S given in Equation (4). To this end, for each pair of detected lines, where A is a horizontal line and B is a vertical line, the pair of closest pixel positions $a \in A$ and $b \in B$ is computed. If the distance between $a$ and $b$ is larger than 25 pixels, the lines A and B are considered as non-intersecting. Otherwise, a cutout $c$ centered at the midpoint $m$ of $a$ and $b$ is computed, i.e., $c$ is a fixed-size square patch of the image around $(x, y)$, where $(x, y)$ are the coordinates of $m$. The network S is then used to determine whether the wires depicted in $c$ are connected or not. For the remainder of this paper, by slight abuse of terminology, the term wire will denote a connected component of lines.
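A sketch of this clustering step is shown below, assuming each detected line is given as an array of its pixel coordinates and that classify_crossing is a hypothetical helper that cuts a patch around a given center out of the image and applies the network S.

```python
import numpy as np

def closest_points(line_a, line_b):
    """Pair of closest pixel positions of two lines, plus their distance.

    line_a, line_b: (n, 2) arrays of pixel coordinates (brute-force
    search for illustration; sufficient for short line segments).
    """
    d = np.linalg.norm(line_a[:, None, :] - line_b[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    return line_a[i], line_b[j], d[i, j]

def cluster_lines(h_lines, v_lines, image, classify_crossing, max_dist=25):
    """Union-find clustering of detected lines into wires."""
    n = len(h_lines) + len(v_lines)
    parent = list(range(n))  # horizontal lines first, then vertical

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for ia, line_a in enumerate(h_lines):
        for jb, line_b in enumerate(v_lines):
            a, b, dist = closest_points(line_a, line_b)
            if dist > max_dist:  # considered non-intersecting
                continue
            center = ((a + b) / 2).astype(int)
            # Hypothetical helper: cut out a patch around `center`
            # and classify it with the network S.
            if classify_crossing(image, center) == "connected":
                parent[find(ia)] = find(len(h_lines) + jb)

    # Lines sharing a root belong to the same wire.
    return [find(i) for i in range(n)]
```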
3.4. Connectivity Graph
4. Results
4.1. Quality of the U-Net Output
4.2. Synthetic Distortion of Input Images
- Varying magnitude of noise. To analyze robustness to noise, the test data were modified by changing each background pixel to a foreground pixel with a certain probability $p$, while keeping all other parameters (scaling and gap size) constant. Figure 8a presents the resulting pipeline precision. Specifically, the metrics include the mean fraction of correctly classified foreground pixels in the ground truth image (orange), the mean fraction of correctly detected lines in the line detection step (blue), and the mean fraction of devices accurately identified in the pattern matching step (green; red). To evaluate the number of correctly identified lines and devices, these elements were manually labeled. A detected line was considered correct if its start and end points matched those of a labeled line within a tolerance of 10 pixels with respect to the maximum metric. Likewise, a device was classified as correctly detected if the intersection over union (see Equation (3)) between a detected template and the corresponding (pre-labeled) bounding box exceeded 0.8; see Figure 6c,d.
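For reference, a sketch of the intersection-over-union criterion (cf. Equation (3)) used for the device-detection metric, with boxes assumed to be (x_min, y_min, x_max, y_max) tuples:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned bounding boxes."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A device counts as correctly detected if
# iou(detected_box, labeled_box) > 0.8.
```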
- Effect of scaling. Similarly, to evaluate the effect of scaling, the same validation image was rendered at different resolutions ranging from 75 to 500 dpi, while keeping the other distortion magnitudes unchanged. The results obtained in this case are shown in Figure 8b, where the performance of the pipeline decreases significantly at very low resolutions. For example, at 75 dpi the pixel-wise error is still low; however, since lines are rendered with a width of just one pixel, device and line detection show a strong decrease in accuracy. Although 75 dpi is not a realistic resolution for most practical applications, we deliberately included these low-resolution cases to stress-test the robustness of our pipeline and demonstrate its limits under challenging conditions. Note that by synthetically increasing the resolution of such images before applying the U-Net-based preprocessing (see the sketch below), the quality of the results can be improved.
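The upscaling mentioned at the end of the previous paragraph can be sketched as follows; the scaling factor of 4 and the interpolation method are illustrative choices.

```python
import cv2

def upscale_before_segmentation(image, factor=4):
    """Upscale a low-resolution scan before the U-Net preprocessing.

    Cubic interpolation widens one-pixel-wide lines so that the
    subsequent line and device detection have enough pixels to work with.
    """
    h, w = image.shape[:2]
    return cv2.resize(image, (w * factor, h * factor),
                      interpolation=cv2.INTER_CUBIC)
```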
- Impact of gap size. Finally, the impact of gap size in dashed lines was studied. The validation image was modified to include dashed lines, where each line had straight segments of 0.15 cm and gaps of size $g$ cm for some $g > 0$, with a probability of 0.5 for any given line being dashed. The parameter $g$ was varied over a range of values; see Figure 8c for the results obtained in this case. Note that the green line, corresponding to device detection, is omitted here because, as expected, variations in gap size do not have any notable effect on pattern matching quality. It can be observed that, as long as the gap size is less than four times the length of the straight segments, the line detection quality remains unaffected.
4.3. Similarity of Graphs
4.4. Validation on Scanned Images
5. Discussion
6. Conclusions and Outlook
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Larick, S. Functional schematic diagrams. Proc. IRE 1946, 34, 1005–1007. [Google Scholar] [CrossRef]
- IEEE Std 315-1975 (Reaffirmed 1993); IEEE Standard for Graphic Symbols for Electrical and Electronics Diagrams (Including Reference Designation Letters). IEEE-SASB Coordinating Committees: Piscataway, NJ, USA, 1975; pp. 1–176. Available online: https://ieeexplore.ieee.org/document/8996120 (accessed on 16 July 2025).
- Garland, D.J.; Stainer, F.W. Modern Electronic Maintenance Principles; Elsevier: Amsterdam, The Netherlands, 2016. [Google Scholar]
- IEC 60617-DB; Graphical Symbols for Diagrams. International Electrotechnical Commission: Geneva, Switzerland, 2025.
- Mani, S.; Haddad, M.A.; Constantini, D.; Douhard, W.; Li, Q.; Poirier, L. Automatic digitization of engineering diagrams using deep learning and graph search. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 673–679. [Google Scholar]
- Li, X.; Liu, X. Optimizing parameter extraction in grid information models based on improved convolutional neural networks. Electronics 2024, 13, 2717. [Google Scholar] [CrossRef]
- Kelly, C.R.; Cole, J.M. Digitizing images of electrical-circuit schematics. APL Mach. Learn. 2024, 2, 016109. [Google Scholar] [CrossRef]
- Bayer, J.; Roy, A.K.; Dengel, A. Instance segmentation based graph extraction for handwritten circuit diagram images. arXiv 2023, arXiv:2301.03155. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III 18; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
- Du, G.; Cao, X.; Liang, J.; Chen, X.; Zhan, Y. Medical image segmentation based on U-net: A review. J. Imaging Sci. Technol. 2020, 64, 0710. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Apostolico, A.; Galil, Z. Pattern Matching Algorithms; Oxford University Press: Oxford, UK, 1997. [Google Scholar]
- Diestel, R. Graph Theory; Springer: Berlin/Heidelberg, Germany, 2024. [Google Scholar]
- Open Source Hardware Park. Available online: https://oshpark.com/shared_projects/ (accessed on 16 July 2025).
- Charras, J.P.; Tappero, F.; Stambaugh, W. KiCad Complete Reference Manual; 12th Media Services: Suwanee, GA, USA, 2018. [Google Scholar]
- Zhang, Q.; Huang, V.S.J.; Wang, B.; Zhang, J.; Wang, Z.; Liang, H.; Wang, S.; Lin, M.; He, C.; Zhang, W. Document parsing unveiled: Techniques, challenges, and prospects for structured information extraction. arXiv 2024, arXiv:2410.21169. [Google Scholar]
- Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual U-net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753. [Google Scholar] [CrossRef]
- Furat, O.; Kirstein, T.; Leißner, T.; Bachmann, K.; Gutzmer, J.; Peuker, U.A.; Schmidt, V. Multidimensional characterization of particle morphology and mineralogical composition using CT data and R-vine copulas. Miner. Eng. 2024, 206, 108520. [Google Scholar] [CrossRef]
- Alomar, K.; Aysel, H.I.; Cai, X. Data augmentation in classification and segmentation: A survey and new strategies. J. Imaging 2023, 9, 46. [Google Scholar] [CrossRef] [PubMed]
- Bock, S.; Goppold, J.; Weiß, M. An improvement of the convergence proof of the ADAM-Optimizer. arXiv 2018, arXiv:1804.10587. [Google Scholar]
- Brunelli, R. Template Matching Techniques in Computer Vision: Theory and Practice; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
- Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Basu, J.K.; Bhattacharyya, D.; Kim, T. Use of artificial neural network in pattern recognition. Int. J. Softw. Eng. Appl. 2010, 4, 23–34. Available online: https://khaneprozhe.ir/wp-content/uploads/2024/02/Use_of_Artificial_Neural_Network_in_Pattern_Recogn.pdf (accessed on 16 July 2025).
- ANSI/IEEE Std 315A-1986; American National Standard—Supplement to Graphic Symbols for Electrical and Electronics Diagrams. IEEE: New York, NY, USA, 1986; pp. 1–64. [CrossRef]
- Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 3rd ed.; Prentice Hall: Hoboken, NJ, USA, 2021. [Google Scholar]
- Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; Volume 3, pp. 850–855. [Google Scholar]
- Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 25, 120–125. [Google Scholar]
- Kim, H.Y.; de Araújo, S.A. Grayscale template-matching invariant to rotation, scale, translation, brightness and contrast. In Advances in Image and Video Technology; Mery, D., Rueda, L., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 100–113. [Google Scholar]
- Kim, J.; Kim, J.; Choi, S.; Hasan, M.A.; Kim, C. Robust template matching using scale-adaptive deep convolutional features. In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 708–711. [Google Scholar]
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
- KiCad Symbols. Available online: https://gitlab.com/kicad/libraries/kicad-symbols/ (accessed on 18 July 2025).
- Dougherty, E. Mathematical Morphology in Image Processing; CRC Press: Boca Raton, FL, USA, 1992. [Google Scholar]
- Wu, Y.; Kirillov, A.; Massa, F.; Lo, W.Y.; Girshick, R. Detectron2. 2019. Available online: https://github.com/facebookresearch/detectron2 (accessed on 18 July 2025).
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Szeliski, R. Computer Vision: Algorithms and Applications; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
- Lee, T.C.; Kashyap, R.L.; Chu, C.N. Building skeleton models via 3-D medial surface/axis thinning algorithms. CVGIP Graph. Model. Image Process. 1994, 56, 462–478. [Google Scholar] [CrossRef]
- Zhang, T.Y.; Suen, C.Y. A fast parallel algorithm for thinning digital patterns. Commun. ACM 1984, 27, 236–239. [Google Scholar] [CrossRef]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- 4lex4. ScanTailor Advanced. 2015. Available online: https://github.com/4lex4/scantailor-advanced (accessed on 3 July 2025).