Research on an Automatic Solution Method for Plane Frames Based on Computer Vision
Highlights
- Proposed a structured reconstruction method to bridge visual semantics and mechanics, converting image recognition data into precise inputs for the matrix displacement method.
- Established a novel rapid analysis method based on visual perception that integrates deep learning with traditional mechanics to automatically generate internal force diagrams.
- This approach automates the workflow from image input to structural solving, significantly reducing analysis time to seconds by avoiding complex manual modeling.
- It provides a new technical path for intelligent structural analysis, proving highly effective for teaching demonstrations and quick engineering estimations.
Abstract
1. Introduction
2. Related Work
2.1. Intelligent Recognition of Engineering Images
2.2. Automated Structural Mechanics Analysis
3. Methodology
3.1. Image Acquisition
3.2. Image Pre-Processing
3.2.1. Interactive ROI Extraction and Geometric Distortion Rectification
3.2.2. Image Denoising and Binarization
1. Grayscale Conversion and Gaussian Filtering: First, the RGB color image obtained after perspective transformation is converted into a grayscale image using the standard luminance-weighted formula:
   $$\mathrm{Gray} = 0.299R + 0.587G + 0.114B$$
   To suppress high-frequency noise and smooth details in the image, Gaussian filtering is applied to the grayscale image for denoising. The Gaussian filter is a linear smoothing filter whose convolution kernel weights follow a two-dimensional Gaussian distribution [31]:
   $$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)$$
   where $\sigma$ is the standard deviation, controlling the degree of smoothing. In this section, a 3 × 3 Gaussian kernel is used, with the standard deviation computed automatically by the OpenCV library from the kernel size, which removes noise while preserving edge information.
2. Adaptive Binarization: To convert the image into a black-and-white binary image that highlights the contours of the frame structure, Otsu’s adaptive thresholding method is adopted. Otsu’s method assumes that the image consists of two classes of pixels: foreground and background. For each candidate threshold, the inter-class variance is calculated; a larger inter-class variance indicates a more distinct grayscale difference between the two classes and a better segmentation. Otsu’s method therefore determines the optimal threshold $T^{*}$ automatically by maximizing the inter-class variance [32]:
   $$\sigma_B^{2}(T) = \omega_0(\mu_0 - \mu)^{2} + \omega_1(\mu_1 - \mu)^{2}, \qquad T^{*} = \arg\max_{T}\, \sigma_B^{2}(T)$$
   where $\mu_0$ and $\mu_1$ are the average gray levels of the background and foreground pixels, respectively; $\omega_0$ and $\omega_1$ are the proportions of background and foreground pixels; and $\mu$ is the global average gray level. After determining the optimal threshold $T^{*}$, the inverted binarization mode (THRESH_BINARY_INV) is employed, which sets pixels with values at or below $T^{*}$ (the dark strokes of the frame) to white foreground and those above $T^{*}$ to black background.
3. Morphological Processing: The binarized image may still contain defects such as isolated noise points and broken lines, which are repaired through morphological operations. A combined strategy of closing and opening operations is adopted [33].
   Closing operation: dilation followed by erosion, used to fill small holes in the image and connect broken lines:
   $$A \bullet B = (A \oplus B) \ominus B$$
   Opening operation: erosion followed by dilation, used to remove small isolated noise points:
   $$A \circ B = (A \ominus B) \oplus B$$
   where $A$ is the input image, $B$ is the structuring element, and $\oplus$ and $\ominus$ denote the dilation and erosion operations, respectively. This section employs a 3 × 3 rectangular structuring element, first executing the closing operation to fill lines and then the opening operation to remove isolated noise. To eliminate the influence of boundary noise, a 2-pixel wide black border is drawn on the image edges after morphological processing. Finally, the processed result is inverted to render the frame structure as black lines against a white background, yielding a clear binarized frame image that provides high-quality input for subsequent object detection.
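The threshold search in step 2 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (the paper relies on OpenCV); the function names `otsu_threshold` and `binarize_inv` and the toy pixel values are assumptions made for illustration only. The sketch scans all 256 candidate thresholds, evaluates the inter-class variance for each, and then applies the inverted binarization so that dark strokes become white.

```python
def otsu_threshold(pixels):
    """Return the threshold that maximizes the inter-class variance (8-bit input)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    mu_total = sum(i * h for i, h in enumerate(hist)) / total  # global mean gray level

    best_t, best_var = 0, -1.0
    count0, sum0 = 0, 0.0  # running count and intensity sum of the class with values <= t
    for t in range(256):
        count0 += hist[t]
        sum0 += t * hist[t]
        if count0 == 0 or count0 == total:
            continue  # one class is empty, variance undefined
        w0 = count0 / total
        w1 = 1.0 - w0
        mu0 = sum0 / count0
        mu1 = (mu_total * total - sum0) / (total - count0)
        var = w0 * (mu0 - mu_total) ** 2 + w1 * (mu1 - mu_total) ** 2
        if var > best_var:  # strict comparison keeps the first maximizer
            best_var, best_t = var, t
    return best_t


def binarize_inv(pixels, t):
    """Inverted binarization: values <= t become 255 (white), values > t become 0."""
    return [255 if p <= t else 0 for p in pixels]


# Toy example (hypothetical values): dark strokes on bright paper.
gray = [20, 25, 30, 200, 210, 220, 205, 215]
t_star = otsu_threshold(gray)
binary = binarize_inv(gray, t_star)
```

Any threshold between the darkest paper pixel and the brightest stroke pixel yields the same segmentation here, which is why the strict comparison simply keeps the first maximizer.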
3.3. Object Detection Based on YOLOv8
3.3.1. Dataset Construction
3.3.2. Model Training and Parameter Configuration
3.3.3. Detection Workflow and Metric Analysis
1. Global Preliminary Screening: The raw frame images are uniformly rescaled to a resolution of 1024 × 1024 and input into four detection models: “Node_area”, “Dimension”, “Load”, and “Value”. This step aims to rapidly locate the spatial coordinates of four types of primitives: node regions, dimension symbols, load regions, and numerical value regions.
2. Region Cropping: The detected node regions and load regions are cropped using image processing algorithms and rescaled to a resolution of 640 × 640 for input into subsequent detection models. Simultaneously, the support node segments within the node regions are selected and rescaled to a resolution of 320 × 320 for input into the subsequent classification model.
3. Detailed Fine-grained Classification: The cropped support node images are input into the “Support_Cls” model to determine the support type and mechanical boundary conditions. The node region images are input into the “Node” model to obtain precise node coordinates. The load region images are input into the “Load Vector” model to identify arrows for determining the direction of the loads.
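Because the fine-grained models operate on rescaled crops, a point detected inside a crop has to be mapped back to the coordinates of the full image before it can be used for reconstruction. The following sketch shows one way to do this; the function name `crop_to_global`, the bounding-box layout, and the assumption that the crop is stretched to a square without padding are illustrative and not taken from the paper.

```python
def crop_to_global(pt_crop, box, crop_size):
    """Map a point found in a rescaled square crop back to full-image pixels.

    pt_crop   -- (x, y) in the rescaled crop, e.g. a 640 x 640 image
    box       -- (x_min, y_min, x_max, y_max) of the ROI in the full image
    crop_size -- side length of the rescaled crop in pixels
    Assumes the ROI was stretched to the square crop without letterbox padding.
    """
    x_min, y_min, x_max, y_max = box
    scale_x = (x_max - x_min) / crop_size
    scale_y = (y_max - y_min) / crop_size
    return (x_min + pt_crop[0] * scale_x, y_min + pt_crop[1] * scale_y)


# Hypothetical node region of 160 x 160 px, rescaled to 640 x 640 for the "Node" model.
node_box = (100, 200, 260, 360)
center = crop_to_global((320, 320), node_box, 640)
```

If the crops were letterboxed instead of stretched, the padding offsets would have to be subtracted before scaling.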
- High Precision: With the exception of the support classification model, the mAP@0.5 for all detection models exceeded 0.98. This is primarily attributed to two factors: First, the processed plane frame images exhibit distinct features and clean backgrounds. Second, the multiple ROI cropping method employed in this study effectively eliminated irrelevant features, enabling subsequent models to focus on local feature extraction.
- High Recall: The recall rates for all detection models were above 0.99. Such an exceptionally high recall rate ensures the reliability of structural calculations, as in the task of extracting structural mechanics parameters, the omission of data is far more critical than false detections.
- Accurate Classification: The “Support_Cls” model achieved a classification accuracy of 97.2%, ensuring the establishment of correct boundary conditions for the frame. Meanwhile, the “Load Vector” model achieved both P = 1 and R = 1, perfectly resolving the issue of load direction determination.
- Inference Efficiency: In terms of inference efficiency, the inference time for macro-models processing large-resolution inputs was controlled at approximately 28 ms, whereas models processing local small images (such as “Support_Cls”) required only 4 ms. The inference speed satisfies the requirements for real-time processing.
3.4. Frame Data Recognition
- Basic Topology and Material Parameters: Total number of frame nodes, total number of elements, number of loaded nodes, number of support nodes, and material elastic modulus;
- Nodal Geometric Information: Node ID, X-coordinate, and Y-coordinate;
- Element Connectivity and Sectional Properties: Element ID, node IDs at both ends of the element, cross-sectional area, and moment of inertia;
- Nodal Load Vectors: Node ID, load in the X-direction, load in the Y-direction, and bending moment;
- Boundary Conditions: Support node ID, indicator information regarding the displacement status of the node in the X, Y, and rotational directions, and the known displacement values in these three directions.
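The five groups of data listed above can be pictured as a small set of record types. The sketch below is only an illustration of that layout: all class and field names are hypothetical, and the boolean `known` flags stand in for the indicator codes, whose concrete values are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    node_id: int
    x: float
    y: float


@dataclass
class Element:
    elem_id: int
    node_i: int
    node_j: int
    area: float
    inertia: float


@dataclass
class NodalLoad:
    node_id: int
    fx: float
    fy: float
    moment: float


@dataclass
class Support:
    node_id: int
    known: Tuple[bool, bool, bool]  # displacement prescribed in X, Y, rotation?
    values: Tuple[float, float, float] = (0.0, 0.0, 0.0)


@dataclass
class FrameModel:
    elastic_modulus: float
    nodes: List[Node] = field(default_factory=list)
    elements: List[Element] = field(default_factory=list)
    loads: List[NodalLoad] = field(default_factory=list)
    supports: List[Support] = field(default_factory=list)

    def header(self):
        """Basic topology and material parameters, in the order listed above."""
        loaded_nodes = len({load.node_id for load in self.loads})
        return (len(self.nodes), len(self.elements), loaded_nodes,
                len(self.supports), self.elastic_modulus)


# Hypothetical single-column example: fixed base, horizontal load at the top.
model = FrameModel(
    elastic_modulus=2.0e8,
    nodes=[Node(1, 0.0, 0.0), Node(2, 0.0, 4.0)],
    elements=[Element(1, 1, 2, 0.01, 1.0e-4)],
    loads=[NodalLoad(2, 10.0, 0.0, 0.0)],
    supports=[Support(1, (True, True, True))],
)
header = model.header()
```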
3.4.1. Node Localization and Topological Reconstruction
1. Two-Stage Node Detection. A coarse-to-fine detection strategy is adopted. First, a regional object detection model is utilized to locate node regions (ROI) within the overall frame. Subsequently, a fine-grained detection model is employed to precisely locate the node centers within the cropped regions, thereby eliminating background noise interference and obtaining the pixel coordinate set $N$ of the nodes:
   $$N = \{(i, x_i, y_i) \mid i = 1, 2, \dots, n\}$$
   where $i$ is the node ID, $x_i$ and $y_i$ are the node pixel coordinates, and $n$ is the number of detected nodes.
2. Semantic Parsing of Dimensions. To establish the mapping between pixels and physical lengths, the study obtains positioning information by detecting the endpoints of dimension lines and acquires annotation values through Optical Character Recognition (OCR), constructing a dimension dataset $D$:
   $$D = \{(x_s, y_s, x_e, y_e, d, k)\}$$
   where $(x_s, y_s)$ and $(x_e, y_e)$ are the pixel coordinates of the start and end points of the dimension; $d$ is the dimension value; and $k$ is the direction indicator factor (0 for horizontal dimensions, 1 for vertical dimensions).
3. Node-Dimension Spatial Association. Since a single dimension annotation in an image often spatially corresponds to multiple nodes, association via coordinate projection is required. The dimension endpoints are therefore matched with structural nodes in the horizontal and vertical directions, respectively, using a tolerance threshold $\delta$. For horizontal dimensions ($k = 0$), nodes in set $N$ satisfying $|x_i - x_s| \le \delta$ are sought as start-point associated nodes; end-point associated nodes are matched similarly against $x_e$. Vertical dimensions follow the same logic with the $y$ coordinates. A node-dimension association array $R$ is constructed:
   $$R = \{(x_s, y_s, x_e, y_e, d, k, n_{xs}, n_{xe}, n_{ys}, n_{ye})\}$$
   where $(x_s, y_s)$ and $(x_e, y_e)$ are the pixel coordinates of the start and end points of the dimension; $d$ is the dimension value; $k$ is the direction indicator factor (0 for horizontal dimensions, 1 for vertical dimensions); $n_{xs}$ and $n_{xe}$ are the start and end node IDs matched in the X-direction; and $n_{ys}$ and $n_{ye}$ are the start and end node IDs matched in the Y-direction.
4. Coordinate Calculation Based on Dimension Chains and Proportions. After obtaining the association array $R$, a hybrid solution method of “dimension chain recurrence + proportional interpolation” is adopted to calculate the actual node coordinates. First, base establishment and recurrence: the bottom-left node is set as the origin (0, 0), and the association array $R$ is traversed; if node $i$ is known and connected to node $j$ by a dimension in $D$, the coordinates of $j$ are deduced from those of $i$ and the dimension value. Second, proportional interpolation: for special nodes not directly annotated by dimensions (such as the mid-point load position, as shown in Figure 5), the actual physical coordinates are calculated via interpolation based on the proportional pixel distance between the point and adjacent known nodes on the image, using the approximate affine invariance of ratios under perspective projection.
5. Element Connectivity Identification and Topology Construction. Upon completing the solution for actual node coordinates, the element connection relationships must be determined. All node pairs (i, j) are traversed, and a detection band with a width of 5 pixels is constructed along the line connecting the two nodes. The proportion of effective structural pixels within the band is calculated; if it exceeds a set threshold, the node pair is judged to be connected, constituting a structural element. Following the “Minimum Element Principle,” if three or more nodes are collinear, they are segmented into multiple continuous elements, ultimately generating an element connection list.
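The band test in step 5 can be sketched as follows. This is an illustrative interpretation rather than the authors' code: the function name `is_connected` is hypothetical, the binary image is a plain list of rows with 1 marking a structural pixel, and a sample along the line is assumed to count as covered when any pixel across the band width is structural. The 80% default follows the threshold quoted in Section 5.1.

```python
import math


def is_connected(img, p, q, band=5, ratio=0.8):
    """Decide whether nodes p and q (pixel coordinates) are joined by a drawn member."""
    (x0, y0), (x1, y1) = p, q
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return False
    # Unit normal to the segment, used to sweep across the band width.
    nx, ny = -(y1 - y0) / length, (x1 - x0) / length
    half = band // 2
    steps = int(length) + 1  # roughly one sample per pixel along the segment
    hits = 0
    for s in range(steps):
        t = s / (steps - 1) if steps > 1 else 0.0
        cx, cy = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        for k in range(-half, half + 1):
            px, py = int(round(cx + k * nx)), int(round(cy + k * ny))
            if 0 <= py < len(img) and 0 <= px < len(img[0]) and img[py][px]:
                hits += 1
                break  # this sample is covered, move on to the next one
    return hits / steps >= ratio


# Toy 20 x 20 image with one horizontal member from x = 2 to x = 17 at y = 10.
img = [[0] * 20 for _ in range(20)]
for x in range(2, 18):
    img[10][x] = 1

a, b, c = (2, 10), (17, 10), (2, 2)
ab_connected = is_connected(img, a, b)  # drawn member
ac_connected = is_connected(img, a, c)  # no line between these two nodes
```

Splitting collinear nodes into consecutive elements (the “Minimum Element Principle”) is not covered by this sketch; it would be applied after the pairwise test.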
3.4.2. Load Recognition and Numerical Association
1. Recognition of Concentrated Forces and Nodal Association. Concentrated forces typically appear as straight arrows pointing towards nodes in drawings. First, the load detection model is used to detect and locate the overall region of the concentrated force, obtaining its center point $P_1$. The load arrow detection model is then employed to detect arrow features and obtain the arrow center point $P_2$. A direction vector $\boldsymbol{\upsilon} = \overrightarrow{P_1P_2}$ pointing from $P_1$ to $P_2$ is constructed. By analyzing the signs of the components of $\boldsymbol{\upsilon}$ on the X and Y axes, the direction of action of the concentrated force in the global coordinate system is determined, thereby establishing the sign of the load. Second, OCR character recognition is performed on the load annotation region. Combined with regular expression processing, the numerical magnitude is extracted, and the final sign of the concentrated force is determined based on the direction information. Finally, based on the nearest-neighbor principle, the arrow center point $P_2$ is matched against the set of node pixel coordinates, and the node with the minimum distance is judged as the point of application for the concentrated force.
2. Recognition of Bending Moments and Nodal Association. Bending moments are usually represented by curved arrows, where the core challenge lies in determining the direction of rotation (clockwise/counter-clockwise). First, the load detection model locates the overall center point $P_c$ of the bending moment. The load arrow detection model detects the bending moment arrow, and edge detection algorithms are used to extract the arrow’s contour. The intersection of the two main boundary lines is calculated as the arrow tip point $P_t$, and the midpoint of the line connecting the tail endpoints of the boundary lines is taken as the arrow tail point $P_e$. The rotation direction is then determined with the vector cross product: vectors $\overrightarrow{P_cP_e}$ and $\overrightarrow{P_cP_t}$ are constructed to calculate the 2D vector cross product $\theta = \overrightarrow{P_cP_e} \times \overrightarrow{P_cP_t}$. The sign of $\theta$ determines whether the moment is counter-clockwise or clockwise, thereby establishing the sign of the moment value. The processes for numerical extraction and matching the point of application (spatial proximity principle) remain consistent with those for concentrated forces.
3. Recognition of Distributed Loads and Equivalent Conversion. Distributed loads appear as arrays of arrows distributed along structural members. The processing difficulty lies in the mechanical conversion from “element loads” to “nodal loads.” The direction of the distributed load can be determined by extracting just a single prominent arrow within the array; its numerical information is extracted via OCR technology and matched with load primitives using the spatial proximity principle. The objects acted upon by distributed loads are structural elements rather than nodes. In the Matrix Displacement Method, non-nodal loads cannot be directly incorporated into the nodal equilibrium equations. Therefore, by spatially matching the distributed load region with identified elements, the corresponding element is determined. Then, based on the element length, boundary conditions, and load magnitude identified by the program, the fixed-end reaction forces generated at both ends of the element are calculated. These forces are inverted in sign to be converted into equivalent nodal loads and superimposed onto the load array of the corresponding nodes, thus completing the final assembly of the computational model.
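The conversion in step 3 can be made concrete for the simplest case. The sketch below assumes a uniformly distributed load acting perpendicular to an element whose two ends are both rigidly connected; other end conditions would need different fixed-end formulas. The function name and the sign convention (load positive when pointing along the negative local y-axis, moments positive counter-clockwise) are assumptions for illustration.

```python
def equivalent_nodal_loads(q, length):
    """Equivalent nodal loads of a uniform load q on a fixed-fixed element.

    q      -- load intensity (force per length), positive along the -y local axis
    length -- element length
    Returns {"i": (Fy, M), "j": (Fy, M)} in local coordinates,
    with moments positive counter-clockwise.
    """
    # Fixed-end reactions exerted by the end restraints on the element.
    reaction_i = (q * length / 2, q * length ** 2 / 12)
    reaction_j = (q * length / 2, -q * length ** 2 / 12)
    # Equivalent nodal loads are the fixed-end reactions with reversed sign.
    return {
        "i": (-reaction_i[0], -reaction_i[1]),
        "j": (-reaction_j[0], -reaction_j[1]),
    }


# Hypothetical example: q = 12 kN/m on a 6 m member.
loads = equivalent_nodal_loads(q=12.0, length=6.0)
```

For this example each end receives a transverse force of 36 kN against the positive y-axis, together with end moments of 36 kN·m of opposite sense.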
3.4.3. Support Classification and Boundary Extraction
1. State Indicator Code (α): Used to mark whether the displacement of a DOF is unknown. One value of α indicates that the displacement in this direction is unknown and must be solved from the equations; the other indicates that the displacement in this direction is known.
2. Known Displacement Value (u): When α marks the displacement as known, the specific prescribed displacement value is entered; otherwise this value is not used (defaulting to 0.0).
3.5. Static Analysis Calculation Program
3.5.1. Program Design and Algorithm Principles
- Half-Bandwidth Storage: Utilizing matrix symmetry, only the half-bandwidth elements of the global stiffness matrix are stored, effectively reducing memory usage.
- Variable-Bandwidth Optimized Solver: A linear equation solver optimized for the storage structure is designed. Adopting an improved algorithm based on Gaussian elimination, elimination and back-substitution operations are performed only on elements within the effective bandwidth, thereby enhancing solution efficiency.
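To illustrate the idea behind band storage and a band-limited solver, the following pure-Python sketch stores only the diagonal and the upper band of a symmetric matrix and restricts elimination and back-substitution to that band. It uses a constant half-bandwidth for simplicity, whereas the program described above uses a variable bandwidth; the function names are hypothetical and no pivoting is performed, which is acceptable only for well-conditioned symmetric positive definite systems.

```python
def to_half_band(K, hbw):
    """Store row i as [K[i][i], K[i][i+1], ..., K[i][i+hbw-1]] (zero padded)."""
    n = len(K)
    return [[K[i][i + k] if i + k < n else 0.0 for k in range(hbw)]
            for i in range(n)]


def solve_half_band(band, rhs):
    """Gaussian elimination on a symmetric matrix in upper half-band storage."""
    n, hbw = len(band), len(band[0])
    a = [row[:] for row in band]  # work on copies
    b = rhs[:]
    # Forward elimination, touching only entries inside the band.
    for i in range(n):
        for k in range(1, hbw):
            j = i + k  # row to be reduced
            if j >= n or a[i][k] == 0.0:
                continue
            factor = a[i][k] / a[i][0]  # equals K[j][i] / K[i][i] by symmetry
            for m in range(hbw - k):
                a[j][m] -= factor * a[i][k + m]  # K[j][j+m] -= factor * K[i][j+m]
            b[j] -= factor * b[i]
    # Back-substitution with the upper-triangular band left in `a`.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i]
        for k in range(1, hbw):
            if i + k < n:
                s -= a[i][k] * x[i + k]
        x[i] = s / a[i][0]
    return x


# Small tridiagonal test system (half-bandwidth 2) with known solution [1, 1, 1].
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
band = to_half_band(K, hbw=2)
x = solve_half_band(band, [1.0, 0.0, 1.0])
```

For an n × n matrix with half-bandwidth b, this layout needs n·b stored values instead of n², and the elimination cost drops from the order of n³ to the order of n·b².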
3.5.2. Calculation Workflow
- Data Parsing and Pre-processing: The program reads the standardized structural data file generated in Section 3.4 and parses the topological relationships of the frame. It initializes the total number of nodes, total number of elements, and material properties (Elastic Modulus E, Moment of Inertia I, etc.), and dynamically allocates the memory space required for calculation based on these parameters.
- Global Stiffness Matrix Assembly: Iterating through all elements, the program calculates the element stiffness matrix in the local coordinate system. It then utilizes the coordinate transformation matrix to convert it to the global coordinate system and superimposes it onto the global stiffness matrix according to node IDs.
- Application of Boundary Conditions: Based on the support types, the “Set-to-1 method” is employed to modify the global stiffness matrix and load vector, thereby eliminating rigid body displacements.
- Equation Solving: The optimized solver is invoked to solve the processed system of linear equations, obtaining the generalized displacement vectors of all nodes in the global coordinate system.
- Member End Force Calculation: Utilizing the obtained nodal displacements, the member end internal forces of each element in the local coordinate system are calculated via back-substitution.
- Result Output: The calculated nodal displacements, support reactions, and element internal forces are output in a structured format, serving as the data source for the subsequent automatic plotting of bending moment, shear force, and axial force diagrams.
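The assembly, boundary-condition, and solving steps above can be sketched compactly in Python. This is a minimal illustration under stated assumptions, not the authors' program: it uses dense storage and a plain Gaussian elimination in place of the banded solver, handles only zero prescribed displacements in the set-to-1 step, omits member end forces, and all function names are hypothetical.

```python
import math


def element_stiffness_global(xi, yi, xj, yj, E, A, I):
    """6x6 stiffness matrix of a plane frame element in global coordinates."""
    L = math.hypot(xj - xi, yj - yi)
    c, s = (xj - xi) / L, (yj - yi) / L
    a, b = E * A / L, 12 * E * I / L ** 3
    d, e4, e2 = 6 * E * I / L ** 2, 4 * E * I / L, 2 * E * I / L
    k = [  # local DOF order: u_i, v_i, theta_i, u_j, v_j, theta_j
        [a, 0, 0, -a, 0, 0],
        [0, b, d, 0, -b, d],
        [0, d, e4, 0, -d, e2],
        [-a, 0, 0, a, 0, 0],
        [0, -b, -d, 0, b, -d],
        [0, d, e2, 0, -d, e4],
    ]
    T = [[0.0] * 6 for _ in range(6)]  # local displacements = T * global displacements
    for o in (0, 3):
        T[o][o], T[o][o + 1] = c, s
        T[o + 1][o], T[o + 1][o + 1] = -s, c
        T[o + 2][o + 2] = 1.0
    kT = [[sum(k[r][m] * T[m][q] for m in range(6)) for q in range(6)] for r in range(6)]
    return [[sum(T[m][r] * kT[m][q] for m in range(6)) for q in range(6)] for r in range(6)]


def gauss_solve(A, b):
    """Dense Gaussian elimination with partial pivoting (stand-in for the banded solver)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for q in range(col, n + 1):
                M[r][q] -= f * M[col][q]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][q] * x[q] for q in range(i + 1, n))) / M[i][i]
    return x


def analyze(nodes, elements, loads, fixed_dofs, E):
    """nodes: [(x, y)]; elements: [(i, j, A, I)]; loads: {node: (Fx, Fy, M)};
    fixed_dofs: global DOF indices whose displacement is zero."""
    n = 3 * len(nodes)
    K = [[0.0] * n for _ in range(n)]
    F = [0.0] * n
    for i, j, A, I in elements:
        ke = element_stiffness_global(*nodes[i], *nodes[j], E, A, I)
        dofs = [3 * i, 3 * i + 1, 3 * i + 2, 3 * j, 3 * j + 1, 3 * j + 2]
        for r in range(6):
            for q in range(6):
                K[dofs[r]][dofs[q]] += ke[r][q]
    for node, (fx, fy, m) in loads.items():
        F[3 * node], F[3 * node + 1], F[3 * node + 2] = fx, fy, m
    for d in fixed_dofs:  # set-to-1: zero the row and column, put 1 on the diagonal
        for q in range(n):
            K[d][q] = K[q][d] = 0.0
        K[d][d] = 1.0
        F[d] = 0.0
    return gauss_solve(K, F)


# Cantilever check (hypothetical data): node 0 fixed, 10 kN downward at the tip, 3 m span.
E, A, I, L, P = 2.0e8, 0.01, 1.0e-4, 3.0, 10.0
u = analyze([(0.0, 0.0), (L, 0.0)], [(0, 1, A, I)], {1: (0.0, -P, 0.0)}, [0, 1, 2], E)
tip_deflection = u[4]  # closed form: -P * L**3 / (3 * E * I)
```

The cantilever makes a convenient check because its tip deflection and rotation have the closed forms $-PL^{3}/(3EI)$ and $-PL^{2}/(2EI)$.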
3.6. Visualization of Internal Force Diagrams
4. Case Study and Validation
4.1. Validation with Structural Mechanics Exercise
1. Global Object Detection: The YOLOv8 model successfully located all key elements within the frame. As shown in Figure 8, the model accurately detected 10 node regions, 4 concentrated loads, 2 distributed loads, and all dimension endpoints and numerical values, with no missed detections or false positives.
2. Local Refinement Detection and Support Classification (Figure 9): For the detected node regions (Figure 8a), cropping and secondary recognition were performed to output the pixel coordinates of the nodes. For the loads (Figure 8b), secondary detection was conducted to correctly judge the direction of action for both concentrated and distributed loads based on arrow direction. For support node regions, the classification model correctly identified them as “Fixed Supports,” determining the boundary conditions at the bottom of the frame to be fully constrained.
4.2. Analysis of Double-Span Portal Frame
1. Geometric Simplification: The centerlines of beam and column members were extracted to construct a wireframe model, and the column base supports were uniformly simplified as fixed supports. In drawing the simplified structural schematic, the proportional relationship between geometric line segment lengths and annotated numerical values was intentionally dissociated. That is, the ratio between pixel lengths in the diagram does not correspond to the ratio between the dimension annotations. This ensures that even in cases of scale disproportion or deformation in the structural diagram, the calculation results remain precise provided the annotations are correct.
2. Load Simplification: The characteristic value of the dead load (including member self-weight) is 0.5 kN/m² and the characteristic value of the roof live load (non-accessible) is 0.5 kN/m². The frame column spacing is $B$ = 8 m, and the canopy cantilever width is $L_C$ = 4 m. From these values, the design value of the area load is calculated and then converted into a line load over the column spacing $B$. Rounding to an integer, $q$ = 12 kN/m is applied vertically downward on the pitched beams. The canopy load is equivalently converted into a concentrated force $P$ and a concentrated moment $M$:
   $$P = qL_C = 12 \times 4 = 48\ \text{kN}, \qquad M = \tfrac{1}{2}qL_C^{2} = \tfrac{1}{2} \times 12 \times 4^{2} = 96\ \text{kN∙m}$$
   In the simplified structural schematic, a vertical concentrated force $P$ = 48 kN and a corresponding nodal moment $M$ = 96 kN∙m (counter-clockwise for the left column, clockwise for the right column) are applied to the column tops at Axis A and Axis C, respectively.
4.3. Comparative Analysis of Comprehensive Efficiency and Performance
- Traditional Manual Solution: This relies on manual geometric modeling, derivation of equilibrium equations, and numerical calculation. For statically indeterminate structures such as the double-span portal frame, a skilled engineer or student typically requires 20–30 min to complete the full calculation and plotting, and the process is highly prone to calculation errors.
- Commercial Software (SAP2000 v21): Although its numerical solution core requires only milliseconds, its “human–computer interaction cost” is extremely high. Users must undergo cumbersome pre-processing steps involving defining grids, drawing members, assigning cross-sections, applying loads, and setting boundaries. For non-batch simple frame problems, the entire modeling process for a skilled operator typically requires 10–15 min.
| Method Type | Avg. Total Time | Operational Complexity | Main Bottleneck | Usage Scenarios |
|---|---|---|---|---|
| Traditional Manual Calculation | 20–30 min | High (Error-prone) | Tedious matrix operations and equation derivation | Theoretical exams, learning basic principles |
| SAP2000 Modeling | 10–15 min | Medium (High entry barrier) | Complex GUI interaction and parameter settings | Large-scale complex structures, detailed design |
| Proposed Method | 20–30 s | Low (Automated) | Dependence on image clarity and shooting angle | Homework grading, rapid on-site verification |
5. Discussion
5.1. Error Propagation Analysis and Robustness Mechanism
- Node Alignment: Statistical analysis indicates that the Euclidean distance between distinct nodes in standard problem sets typically exceeds 100 pixels, while the detection jitter deviation of collinear nodes is generally within 10 pixels. Therefore, this paper sets a tolerance threshold of 30 pixels to align nodes along horizontal or vertical axes. This threshold setting is sufficient to cover detection noise yet far smaller than the minimum physical spacing between nodes, thereby eliminating coordinate drift during the reconstruction phase.
- Dimension Association: The mapping between dimension values and nodes follows the same spatial tolerance logic described above. By constructing a spatial adjacency matrix, the system ensures that dimensional parameters are accurately assigned to the corresponding geometric primitives, avoiding parameter matching errors caused by minor visual positional offsets.
- Topology Verification: To ensure the accuracy of member connections, pixel density detection (with a threshold set at 80%) is introduced between node pairs. This strategy guarantees that member elements are generated only when high-confidence connecting lines exist, effectively filtering out false topological connections caused by visual false detections.
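The node alignment rule can be illustrated with a short sketch that snaps detected coordinates along one axis. The 30-pixel default follows the tolerance quoted above; the function name `snap_axis`, the gap-based grouping, and the use of the group mean as the snapped value are assumptions for illustration.

```python
def snap_axis(values, tol=30):
    """Group 1-D pixel coordinates whose neighbouring gaps are within tol
    and replace every member of a group by the group mean."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    snapped = list(values)

    def flush(group):
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            snapped[i] = mean

    group = [order[0]]
    for idx in order[1:]:
        if values[idx] - values[group[-1]] <= tol:
            group.append(idx)  # still within tolerance of the previous value
        else:
            flush(group)       # gap too large: close the group, start a new one
            group = [idx]
    flush(group)
    return snapped


# Hypothetical x-coordinates of five detected nodes lying on two column lines.
xs = [100, 106, 97, 300, 304]
aligned = snap_axis(xs)
```

Because neighbouring values are chained, this grouping is only safe when the jitter (within 10 pixels here) is much smaller than the spacing between distinct nodes (over 100 pixels), which is the condition stated above.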
5.2. Analysis of Generalization, Scalability, and Efficiency
5.3. Limitations and Future Improvement Directions
- The recognition accuracy for complex and dense load conditions needs improvement. The current detection model primarily targets single or sparsely distributed standard load forms. When confronting complex combined loads or high-density load distributions, severe pixel overlap among primitives renders the Non-Maximum Suppression (NMS) mechanism of the detection model susceptible to missed detections or false positives, leading to either missing or redundant load information in the mechanical model.
- Absence of analysis capabilities for non-load factors. The current program logic strictly addresses static responses under the action of “external force loads.” For internal force changes induced by non-load factors—such as support displacement (prescribed displacement), temperature changes, or fabrication errors—the system is currently unable to calculate structural internal forces, as the present dataset does not encompass the relevant annotation symbols for these conditions.
- Limitations in generalization for hand-drawn sketches and precision of text parameter extraction. The current detection model was trained primarily on printed samples. When applied to uncontrolled hand-drawn sketches featuring extremely scribbled lines, severe scale distortion, or numerous alteration traces, the robustness of topological relationship recognition requires improvement. Furthermore, the extraction of load values and geometric dimension text currently relies on general-purpose OCR engines. To meet the high-precision requirements of engineering verification, future research needs to further integrate specialized OCR technologies optimized for engineering symbols and combine them with the association algorithms proposed in this paper to realize precise matching between numerical values and components.
6. Conclusions
- Validated the effectiveness of the YOLOv8-based recognition method for plane frame structural primitives. Addressing the issue that existing research lacks recognition of mechanical model components, a multi-category dataset comprising 7030 images was constructed. Experimental results indicate that this method effectively identifies structural primitives on printed plane frame schematics, providing precise data input for structural calculations.
- Proposed a structured reconstruction method from visual semantics to mechanical models. To bridge the “semantic gap” between computer vision data and structural mechanics computational data, a structured data conversion algorithm was designed. This algorithm successfully transforms image recognition results into structured data required by the Matrix Displacement Method, including nodal encoding, load inputs, and boundary conditions.
- Constructed a novel rapid analysis method for plane frames based on visual perception. This paper combines deep learning algorithms with a traditional Matrix Displacement Method program to realize the automation of structural analysis. Experimental validation demonstrates that this method significantly enhances the efficiency of structural mechanics teaching demonstrations and rapid engineering estimation, offering a new technical pathway for intelligent structural analysis.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Joffe, I.; Qian, Y.; Talebi-Kalaleh, M.; Mei, Q. A Computer Vision Framework for Structural Analysis of Hand-Drawn Engineering Sketches. Sensors 2024, 24, 2923. [Google Scholar] [CrossRef]
- Elyan, E.; Jamieson, L.; Ali-Gombe, A. Deep Learning for Symbols Detection and Classification in Engineering Drawings. Neural Netw. 2020, 129, 91–102. [Google Scholar] [CrossRef] [PubMed]
- Villena Toro, J.; Wiberg, A.; Tarkian, M. Optical Character Recognition on Engineering Drawings to Achieve Automation in Production Quality Control. Front. Manuf. Technol. 2023, 3, 1154132. [Google Scholar] [CrossRef]
- SAP2000; Integrated Software for Structural Analysis & Design. Computer and Structures, Inc.: Berkeley, CA, USA, 2023.
- Yuan, S.; Ye, K.; Yuan, Z. Algorithms and performance of ‘Structural Mechanics Solver’—Invited Report at the 10th National Conference on Structural Engineering. Eng. Mech. 2001, A01, 174–181. (In Chinese) [Google Scholar]
- Khan, M.T.; Yong, Z.; Chen, L.; Feng, W.; Tan, N.Y.J.; Moon, S.K. A Multi-Stage Hybrid Framework for Automated Interpretation of Multi-View Engineering Drawings Using Vision Language Model. arXiv 2025, arXiv:2510.21862. [Google Scholar] [CrossRef]
- Khan, M.T.; Yong, Z.; Chen, L.; Tan, J.M.; Feng, W.; Moon, S.K. Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-Tuned Document Understanding Transformer. arXiv 2025, arXiv:2505.01530. [Google Scholar] [CrossRef]
- Barki, H.; Fadli, F.; Shaat, A.; Boguslawski, P.; Mahdjoubi, L. BIM Models Generation from 2D CAD Drawings and 3D Scans: An Analysis of Challenges and Opportunities for AEC Practitioners. Build. Inf. Model. (BIM) Des. Constr. Oper. 2015, 149, 369–380. [Google Scholar]
- Yu, W.; Hsu, J. Content-Based Text Mining Technique for Retrieval of CAD Documents. Autom. Constr. 2013, 31, 65–74. [Google Scholar] [CrossRef]
- Li, L.; Yuhui, C.; Xiaoting, L. Engineering Drawing Recognition Model with Convolutional Neural Network. In Proceedings of the 2019 International Conference on Robotics, Intelligent Control and Artificial Intelligence, Shanghai, China, 20–22 September 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 112–116. [Google Scholar] [CrossRef]
- Duda, R.O.; Hart, P.E. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Commun. ACM 1972, 15, 11–15. [Google Scholar] [CrossRef]
- Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2009, PAMI-8, 679–698. [Google Scholar] [CrossRef]
- Soille, P. Morphological Image Analysis: Principles and Applications; Springer: Berlin/Heidelberg, Germany, 1999; Volume 2. [Google Scholar]
- Hilaire, X.; Tombre, K. Robust and Accurate Vectorization of Line Drawings. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 890–904. [Google Scholar] [CrossRef]
- Rezvanifar, A.; Cote, M.; Albu, A.B. Symbol Spotting on Digital Architectural Floor Plans Using a Deep Learning-Based Framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 568–569. [Google Scholar]
- Gao, W.; Zhao, Y.; Smidts, C. Component Detection in Piping and Instrumentation Diagrams of Nuclear Power Plants Based on Neural Networks. Prog. Nucl. Energy 2020, 128, 103491. [Google Scholar] [CrossRef]
- Theisen, M.F.; Flores, K.N.; Balhorn, L.S.; Schweidtmann, A.M. Digitization of Chemical Process Flow Diagrams Using Deep Convolutional Neural Networks. Digit. Chem. Eng. 2023, 6, 100072. [Google Scholar] [CrossRef]
- Jamieson, L.; Moreno-Garcia, C.F.; Elyan, E. Towards Fully Automated Processing and Analysis of Construction Diagrams: AI-Powered Symbol Detection. Int. J. Doc. Anal. Recognit. (IJDAR) 2025, 28, 71–84. [Google Scholar] [CrossRef]
- Nguyen, M.T.; Pham, V.L.; Nguyen, C.C.; Nguyen, V.V. Object Detection and Text Recognition in Large-Scale Technical Drawings. In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods-ICPRAM, Virtual, 4–6 February 2021; SciTePress: Setúbal, Portugal, 2021. [Google Scholar] [CrossRef]
- Toral, L.; Moreno-García, C.F.; Elyan, E.; Memon, S. A Deep Learning Digitisation Framework to Mark up Corrosion Circuits in Piping and Instrumentation Diagrams. In Document Analysis and Recognition—ICDAR 2021 Workshops; Barney Smith, E.H.; Pal, U., Translators; Springer International Publishing: Cham, Switzerland, 2021; pp. 268–276. [Google Scholar]
- Byun, J.; Kang, B.; Mun, D.; Lee, G.; Kim, H. Optimizing Image Format Piping and Instrumentation Diagram Recognition: Integrating Symbol and Text Recognition with a Single Backbone Architecture. J. Comput. Des. Eng. 2025, 12, 55–72.
- Xie, L.; Lu, Y.; Furuhata, T.; Yamakawa, S.; Zhang, W.; Regmi, A.; Kara, L.B.; Shimada, K. Graph Neural Network-Enabled Manufacturing Method Classification from Engineering Drawings. Comput. Ind. 2022, 142, 103697.
- Han, S.-T.; Moon, Y.; Kim, J.-B.; Lee, H.; Mun, D. Graph Neural Network-Based Method for Classifying Continuous Lines in Piping and Instrumentation Diagram. Adv. Eng. Inform. 2025, 66, 103457.
- Carrara, A.; Nousias, S.; Borrmann, A. Employing Graph Neural Networks for Construction Drawing Content Recognition. In Computing in Civil Engineering 2024; American Society of Civil Engineers: Reston, VA, USA, 2024; pp. 351–360.
- VanLehn, K.; Lynch, C.; Schulze, K.; Shapiro, J.A.; Shelby, R.; Taylor, L.; Treacy, D.; Weinstein, A.; Wintersgill, M. The Andes Physics Tutoring System: Lessons Learned. Int. J. Artif. Intell. Educ. 2005, 15, 147–204.
- Hutchinson, T.C.; Kuester, F.; Phair, M.E. Sketching Finite-Element Models within a Unified Two-Dimensional Framework. J. Comput. Civ. Eng. 2007, 21, 175–186.
- Murugappan, S.; Piya, C.; Yang, M.C.; Ramani, K. FEAsy: A Sketch-Based Tool for Finite Element Analysis. J. Comput. Inf. Sci. Eng. 2017, 17, 031009.
- Loong, C.N.; San Juan, J.D.Q.; Chang, C.-C. Image-based Structural Analysis for Education Purposes: A Proof-of-concept Study. Comput. Appl. Eng. Educ. 2023, 31, 1200–1218.
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003.
- Bradski, G.; Kaehler, A. Learning OpenCV: Computer Vision with the OpenCV Library; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2008.
- Gonzalez, R.C. Digital Image Processing; Pearson Education: Chennai, India, 2009.
- Otsu, N. A Threshold Selection Method from Gray-Level Histograms. Automatica 1975, 11, 23–27.
- Haralick, R.M.; Sternberg, S.R.; Zhuang, X. Image Analysis Using Mathematical Morphology. IEEE Trans. Pattern Anal. Mach. Intell. 1987, PAMI-9, 532–550.
- Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.

| Stage | Dataset | Target Classes | Sample Size | Training Purpose |
|---|---|---|---|---|
| 1 | Node Region Dataset | Load nodes, Support nodes, Normal nodes | 670 | To locate the approximate range of nodes and prepare for ROI cropping. |
| 2 | Fine-grained Node Dataset | Nodes | 1570 | To precisely locate node coordinates within the ROI for establishing the stiffness matrix. |
| 3 | Support Classification Dataset | Fixed supports, Pinned supports, Sliding supports, Roller supports | 1710 | To determine the types of boundary conditions. |
| 4 | Dimension Symbol Dataset | X/Y-direction dimension limit endpoints | 670 | To extract actual geometric dimensions. |
| 5 | Load Dataset | Concentrated force (F), Bending moment (M), Distributed load (q) | 670 | To identify the types of load actions. |
| 6 | Load Vector Dataset | Force arrows, Moment arrows | 1280 | To determine the direction of load actions. |
| 7 | Value Localization Dataset | Load values, Dimension values | 460 | To locate regions for OCR recognition. |

| Model Name | Input Size (px) | Epochs | Batch Size | Optimizer | Initial Learning Rate |
|---|---|---|---|---|---|
| Node_area | 1024 | 50 | 4 | SGD | 0.01 |
| Dimension | 1024 | 50 | 4 | SGD | 0.01 |
| Load | 1024 | 50 | 4 | SGD | 0.01 |
| Value | 1024 | 50 | 4 | SGD | 0.01 |
| Support_Cls | 320 | 50 | 16 | SGD | 0.01 |
| Node | 640 | 50 | 8 | SGD | 0.01 |
| Load Vector | 640 | 50 | 8 | SGD | 0.01 |
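
The hyperparameters in the table above can be kept in one place as a small configuration map. The following is a minimal sketch, not the authors' training script: the `TrainConfig` class and `train_kwargs` helper are hypothetical, and the field names (`imgsz`, `epochs`, `batch`, `optimizer`, `lr0`) are an assumption that mirrors the keyword style of common YOLO training interfaces.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for one row of the hyperparameter table."""
    imgsz: int               # input image size in pixels
    epochs: int              # number of training epochs
    batch: int               # batch size
    optimizer: str = "SGD"   # optimizer listed for every model
    lr0: float = 0.01        # initial learning rate listed for every model


# One entry per model; values copied from the table above.
TRAIN_CONFIGS = {
    "Node_area": TrainConfig(imgsz=1024, epochs=50, batch=4),
    "Dimension": TrainConfig(imgsz=1024, epochs=50, batch=4),
    "Load": TrainConfig(imgsz=1024, epochs=50, batch=4),
    "Value": TrainConfig(imgsz=1024, epochs=50, batch=4),
    "Support_Cls": TrainConfig(imgsz=320, epochs=50, batch=16),
    "Node": TrainConfig(imgsz=640, epochs=50, batch=8),
    "Load Vector": TrainConfig(imgsz=640, epochs=50, batch=8),
}


def train_kwargs(model_name: str) -> dict:
    """Return the settings for one model as a plain keyword dictionary."""
    return asdict(TRAIN_CONFIGS[model_name])


if __name__ == "__main__":
    for name in TRAIN_CONFIGS:
        print(name, train_kwargs(name))
```

Keeping the shared defaults (SGD, initial learning rate 0.01) in the class makes the per-model differences, input size and batch size, easy to see at a glance.
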

| Model | Precision | Recall | mAP@0.5 (Top-1 Accuracy for Support_Cls) | Inference Time (ms) |
|---|---|---|---|---|
| Node_area | 0.997 | 0.993 | 0.995 | 28.83 |
| Dimension | 0.998 | 0.998 | 0.995 | 28.37 |
| Load | 0.997 | 0.997 | 0.995 | 28.55 |
| Value | 0.975 | 1.000 | 0.987 | 19.15 |
| Support_Cls | / | / | 0.972 (Acc) | 4.05 |
| Node | 0.979 | 0.984 | 0.985 | 10.76 |
| Load Vector | 1.000 | 1.000 | 0.995 | 13.84 |
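
| Support Type | Constraint Code (u) | Constraint Code (v) | Constraint Code (θ) | Prescribed Value (u) | Prescribed Value (v) | Prescribed Value (θ) |
|---|---|---|---|---|---|---|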
| Fixed Support | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 |
| Pinned Support | 0 | 0 | 1 | 0.0 | 0.0 | 0.0 |
| Sliding Support | 0 | 1 | 0 | 0.0 | 0.0 | 0.0 |
| Roller Support | 1 | 0 | 1 | 0.0 | 0.0 | 0.0 |
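
The constraint codes in the support table above can be turned into restrained degree-of-freedom indices for the matrix displacement method. The sketch below rests on two assumptions that are inferred rather than stated: the three codes are ordered (u, v, θ), as in the boundary-condition record description, and a code of 0 means "restrained" while 1 means "free" (a fixed support is all zeros). The names `SUPPORT_CODES` and `constrained_dofs`, and the zero-based DOF numbering with three DOFs per node, are hypothetical.

```python
# Constraint codes (u, v, theta) per support type, copied from the table above.
# Assumption: 0 = restrained, 1 = free.
SUPPORT_CODES = {
    "fixed": (0, 0, 0),
    "pinned": (0, 0, 1),
    "sliding": (0, 1, 0),
    "roller": (1, 0, 1),
}


def constrained_dofs(node_id: int, support_type: str) -> list:
    """Return zero-based global DOF indices restrained at a support node.

    Assumes node IDs start at 1 and each node owns three consecutive
    DOFs in the order (u, v, theta).
    """
    codes = SUPPORT_CODES[support_type]
    base = 3 * (node_id - 1)
    return [base + offset for offset, code in enumerate(codes) if code == 0]


if __name__ == "__main__":
    for kind in SUPPORT_CODES:
        print(kind, constrained_dofs(1, kind))
```

Under these assumptions, the restrained indices are the rows and columns that would be eliminated (or assigned prescribed values) in the global stiffness equations.
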

| Group | Data Example (Standardized Format) | Description |
|---|---|---|
| I. Global Parameters | 10, 10, 7, 3, 3 × 10⁷ | Total Nodes, Elements, Loaded Nodes, Supports, E |
| II. Node Geometry | 1, 0.0, 0.0 | Node ID, X-coord, Y-coord |
| | 2, 0.0, 8.0 | |
| | …… | |
| III. Element Connection | 1, 1, 2, 2.4 × 10⁻¹, 3.88 × 10⁻² | Element ID, Start Node, End Node, Area, Inertia |
| | 2, 2, 3, 2.4 × 10⁻¹, 3.88 × 10⁻² | |
| | …… | |
| IV. Load Vector | 2, 10, −180, −360 | Node ID, FX, FY, M |
| | 3, 5, −15, −7.5 | |
| | …… | |
| V. Boundary Condition | 1, 0, 0, 0, 0.0, 0.0, 0.0 | Support Node ID, Constraints (u, v, θ), Prescribed Values |
| | 6, 0, 0, 0, 0.0, 0.0, 0.0 | |
| | 9, 0, 0, 0, 0.0, 0.0, 0.0 | |
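
To illustrate how the five record groups above could be consumed by a solver, the sketch below parses them into a simple in-memory model. It is an illustration under stated assumptions, not the authors' file format: it assumes one comma-separated record per line, groups in the order I to V, record counts taken from the first line, and scientific notation written as `3e7`. The `FrameModel` class, `parse_frame`, `element_length`, and the reduced two-node `SAMPLE` (built from the first rows of each group) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FrameModel:
    E: float         # elastic modulus
    nodes: dict      # node_id -> (x, y)
    elements: list   # (elem_id, start_node, end_node, area, inertia)
    loads: dict      # node_id -> (fx, fy, m)
    supports: dict   # node_id -> ((cu, cv, ctheta), (u, v, theta))


def parse_frame(text: str) -> FrameModel:
    """Parse the five record groups, assuming one record per line."""
    rows = [[tok.strip() for tok in line.split(",")]
            for line in text.strip().splitlines() if line.strip()]
    n_nodes, n_elems, n_loads, n_sups = (int(v) for v in rows[0][:4])
    E = float(rows[0][4])
    it = iter(rows[1:])

    nodes = {}
    for _ in range(n_nodes):
        r = next(it)
        nodes[int(r[0])] = (float(r[1]), float(r[2]))

    elements = []
    for _ in range(n_elems):
        r = next(it)
        elements.append((int(r[0]), int(r[1]), int(r[2]),
                         float(r[3]), float(r[4])))

    loads = {}
    for _ in range(n_loads):
        r = next(it)
        loads[int(r[0])] = tuple(float(v) for v in r[1:4])

    supports = {}
    for _ in range(n_sups):
        r = next(it)
        supports[int(r[0])] = (tuple(int(v) for v in r[1:4]),
                               tuple(float(v) for v in r[4:7]))

    return FrameModel(E, nodes, elements, loads, supports)


def element_length(model: FrameModel, element: tuple) -> float:
    """Length of an element from its end-node coordinates."""
    _, start, end, _, _ = element
    (x1, y1), (x2, y2) = model.nodes[start], model.nodes[end]
    return math.hypot(x2 - x1, y2 - y1)


# Hypothetical reduced sample: 2 nodes, 1 element, 1 loaded node, 1 support.
SAMPLE = """\
2, 1, 1, 1, 3e7
1, 0.0, 0.0
2, 0.0, 8.0
1, 1, 2, 2.4e-1, 3.88e-2
2, 10, -180, -360
1, 0, 0, 0, 0.0, 0.0, 0.0
"""

model = parse_frame(SAMPLE)

if __name__ == "__main__":
    print(model)
    print(element_length(model, model.elements[0]))
```

Reading the counts from the global-parameter record means the parser needs no section markers, which matches the compact layout of the standardized format.
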

| Stage | Time (Case 1) | Time (Case 2) | Remarks |
|---|---|---|---|
| Image Pre-processing | 5 s | 5 s | Includes manual ROI selection |
| YOLO Recognition | 3.02 s | 3.68 s | Model inference speed |
| Program Calculation | 15 s | 15 s | Includes material parameter input |
| Plotting Output | 0.5 s | 0.5 s | / |
| Total | 23.52 s | 24.18 s | / |
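
The totals in the timing table can be checked with a few lines of arithmetic. The sketch below only re-adds the stage times listed above; the `STAGES` dictionary and the `total` and `interactive_share` helpers are hypothetical names, and the "interactive share" simply groups the two stages whose remarks mention manual input.

```python
# Stage times in seconds for (Case 1, Case 2), copied from the table above.
STAGES = {
    "Image Pre-processing": (5.0, 5.0),   # includes manual ROI selection
    "YOLO Recognition": (3.02, 3.68),
    "Program Calculation": (15.0, 15.0),  # includes material parameter input
    "Plotting Output": (0.5, 0.5),
}

# Stages whose reported time includes manual interaction.
INTERACTIVE = ("Image Pre-processing", "Program Calculation")


def total(case: int) -> float:
    """Sum of all stage times for case 0 or 1, rounded to 2 decimals."""
    return round(sum(times[case] for times in STAGES.values()), 2)


def interactive_share(case: int) -> float:
    """Fraction of the total spent in stages that include manual input."""
    interactive = sum(STAGES[name][case] for name in INTERACTIVE)
    return interactive / total(case)


if __name__ == "__main__":
    print(total(0), total(1))  # 23.52 24.18
    print(interactive_share(0), interactive_share(1))
```

Both sums reproduce the reported totals, and the two stages that include manual input account for most of the elapsed time in either case.
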

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Wang, D.; Fan, S. Research on an Automatic Solution Method for Plane Frames Based on Computer Vision. Sensors 2026, 26, 1299. https://doi.org/10.3390/s26041299

