Article

Combining Instance Segmentation and Ontology for Assembly Sequence Planning Towards Complex Products

College of Mechanical Engineering and Automation, Liaoning University of Technology, Jinzhou 121001, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(9), 3958; https://doi.org/10.3390/su17093958
Submission received: 8 March 2025 / Revised: 17 April 2025 / Accepted: 24 April 2025 / Published: 28 April 2025

Abstract

Traditional assembly sequence planning over-relies on manual experience, creating efficiency bottlenecks and error risks, while complex products demand deep reuse of multi-source knowledge together with resource saving and sustainable development. Addressing the core problem that the assembly process of complex products lacks mechanisms for modeling and reasoning over empirical knowledge, this study proposes a three-phase intelligent assembly sequence planning method that integrates deep learning and ontology theory. First, we propose an instance segmentation model based on an improved Mask R-CNN architecture: a ResNet50 pre-training strategy enhances the generalization ability of the model, the Mask branch is reconstructed, and an attention mechanism is added to achieve high-precision recognition and extraction of the geometric features of assembly parts. Second, a multi-level assembly ontology semantic model is constructed on the basis of ontology theory, expressing knowledge in structured form along three dimensions: the product structure hierarchy (product–assembly–part), physical attributes (weight/precision/dimension), and the assembly process (number of fits/assembly direction). Combined with the SWRL language, a reasoning system of six assembly rules is built, covering core elements such as geometric constraints and process priority. Finally, experiments are carried out on an example gearbox, and the results show that the assembly sequence generated by the method meets the requirements of the process specification, verifying the validity of the technical path. By constructing a closed-loop technical path of “visual perception–knowledge reasoning–sequence generation”, this study effectively overcomes the subjective bias of manual planning, integrates multi-source knowledge to improve knowledge reuse, and provides a solution of both theoretical value and engineering feasibility for the intelligent assembly of complex electromechanical products, reducing R&D cost and contributing to sustainable development.

1. Introduction

In the field of complex equipment manufacturing, assembly process planning is a core production link, and its rationality is closely tied to cost control, production efficiency, and process reliability. Taking large machinery such as automobiles, ships, and shield machines as examples, assembly accounts for more than 40% of total manufacturing cost, and its labor time reaches 20–50% of total production time [1], highlighting the key role of optimized assembly planning in improving industrial competitiveness. However, traditional assembly planning methods face significant challenges in the context of industrial transformation and upgrading. Planning based on empirical rules is easy to implement but inefficient [2]; knowledge engineering methods are limited by the bottlenecks of expert experience acquisition and knowledge transfer; and the computational complexity of mathematical modeling methods (such as cut-set analysis and polychromatic set modeling) grows exponentially with the number of parts, leading to the “combinatorial explosion” problem [3].
The key difficulty of assembly process planning is the accurate acquisition of geometric constraint information. Limited by the complexity of CAD models and measurement technology, traditional methods struggle to automatically analyze the deep assembly relationships between parts, so the planning process relies heavily on manual experience. In recent years, advances in image analysis techniques have provided new ideas for breaking through this bottleneck. For example, instance segmentation [4] based on pixel-level classification can recognize the geometric features of parts and their assembly relationships, simulating the cognitive analysis of process experts. A structured knowledge base provides theoretical support for building a systematic assembly knowledge system by explicitly expressing assembly knowledge, formalizing logic rules, and consolidating reusable experience. Combining the two is expected to remedy the inherent shortcomings of traditional methods in environment perception, knowledge reasoning, and decision support.

1.1. Intelligent Assembly Sequence Planning

Ahmadi et al. [5] applied a genetic algorithm to the assembly planning problem, using a fitness function to construct the feasible solution space and then searching that space with the genetic algorithm. For task sequence planning, which affects the efficiency and stability of complex assembly systems, Zhang et al. [6] proposed an adaptive quantum genetic algorithm based on the artificial potential field and the gradient of the objective function. Lu et al. [7] balanced assembly sequences against assembly lines: after summing the assembly task time, the assembly-direction change time, and the tool change time, they derived the optimal assembly sequence for a given production cycle. Tsai et al. [8] introduced a unifying factor into particle swarm optimization to balance the influence of cognitive conditions, which is significantly more efficient than the standard particle swarm algorithm and offers better solution effectiveness and consistency.
The above research still faces some difficulties: (1) As the number of assembly parts grows, the solution space expands sharply, making manual handling impractical. (2) Evolutionary algorithms may consume substantial computational resources and time on large-scale problems, with slow convergence, difficult parameter selection, susceptibility to premature convergence, and dependence on problem characteristics.

1.2. Deep Learning Techniques for Assembly

At the same time, the application of deep learning technology [9] in the assembly field is gradually deepening. The Fully Convolutional Network (FCN) [10] realizes semantic segmentation through end-to-end training, and Xu et al. [11] further introduced a context information fusion mechanism to improve segmentation accuracy. Chen et al. [12] proposed the DeepLab series of algorithms, which combine dilated convolution and conditional random fields to optimize multi-scale feature extraction. The U-Net network designed by Ronneberger et al. [13] uses a lightweight encoder–decoder structure to achieve efficient segmentation on small sample data. The Mask R-CNN framework proposed by He et al. [14] raises instance segmentation accuracy to a new level by extending the Faster R-CNN architecture.
Zhao et al. [15] proposed hierarchical equalization losses (FHEL/CFHEL) to address the multi-level imbalance problem in long-tailed instance segmentation, achieving overall performance gains on benchmark datasets. Zhao et al. [16] developed the global–local feature refusion network (GLFRNet) to tackle the complexity and diversity of targets in remote sensing image instance segmentation, achieving a maximum mask AP improvement of 1.9 through a multi-modal feature fusion mechanism.
However, the complex structure of mechanical assemblies, the single texture of parts, and the lack of public datasets seriously restrict the practical application of existing results in industrial scenarios.

1.3. Ontology in the Field of Assembly

In terms of knowledge representation and reasoning, ontology technology demonstrates unique advantages. Sudarsan et al. [17] constructed an ontology-based product lifecycle management framework to achieve cross-domain knowledge integration; Bao et al. [18] proposed a twin workshop modeling method, which dynamically associates assembly resources and process procedures through ontology instantiation; Das et al. [19] developed an ontology-based decision support system that integrates CAD model information and rule reasoning; Sanya [20] proposed a knowledge-supported product design system for the aerospace industry, using an ontology-based approach for semantic knowledge management; Gao et al. [21] introduced the theory of fuzzy interval description logic to enhance the expression and reusability of process design knowledge. Chen et al. [22] proposed an ontology and CBR-based automated decision-making method for disassembling mechanical products. Zhong et al. [23] used the ontology approach for tolerance synthesis, main model similarity calculation, etc. These studies indicate that ontology models can effectively address core issues such as fragmented assembly knowledge, formalization of logical rules, and difficulties in experience transfer.
Considering the above technical bottlenecks and research progress, this study proposes an assembly sequence generation framework combining instance segmentation and ontology knowledge modeling. Its innovation is reflected in three aspects: (1) The improved Mask R-CNN algorithm is used to accurately extract the geometric features of assembly parts, breaking through the dependence of traditional information acquisition methods on artificial experience; (2) A multi-level assembly information ontology model is constructed to systematically express assembly process knowledge and support cross-scenario transfer; (3) The assembly semantic reasoning mechanism is constructed based on SWRL rules, and the feasible assembly sequences are generated by combining prior knowledge and ontology logic. Through the closed-loop architecture of “visual perception, knowledge modeling, and rule-based reasoning”, the framework provides assembly planning solutions with both efficiency and reliability for intelligent manufacturing of complex products.
The remainder of the paper is structured as follows. In Section 2, an overview of our approach is provided. In Section 3, Mask R-CNN is introduced into the assembly field and improved. In Section 4, ontology modeling and inference rule formulation of assembly products are carried out. In Section 5, an example is given to prove the rationality of the whole method proposed. Finally, we conclude the paper and present the future work.

2. Research Methods

Figure 1 shows the overall methodological framework of this paper. Guided by instance segmentation of the assembly model, it realizes assembly sequence planning from the semantic perspective of assembly process knowledge and mainly comprises three steps:
Step 1. Image data processing. Collect part and product images from different 3D product models, sharpen the image edges with the Laplace method for image enhancement, and denoise the edge-enhanced images to eliminate noise effects.
Step 2. Improve the Mask R-CNN network model. Based on the Mask R-CNN structure, replace the fully convolutional structure of the Mask branch with the UNet3+ network, which extracts features more finely, and add an attention mechanism to improve the performance of the improved neural network model.
Step 3. Ontology module. Model the assembly information, instantiate the part information obtained from instance segmentation, express empirical knowledge as rules, and reason out feasible assembly sequences.

3. An Improved Mask R-CNN Algorithm for Assembly Sequence Planning

3.1. Data Processing in the Field of Assembly Sequence Planning

In this paper, an improved instance segmentation algorithm based on Mask R-CNN is proposed for fine segmentation of parts to extract part information in the field of assembly sequence planning. Owing to the specificity of the industrial field, few relevant datasets are available for reference, so an exclusive dataset is constructed. The generalization ability of the improved instance segmentation model for complex assembly situations is improved by increasing image diversity through different poses, such as changing the viewing angle of the 3D models in the part library, changing textures, and resizing the images, as shown in Figure 2. The part/assembly model images come from a variety of sources and have high clarity, covering different part angles, lighting conditions, and background environments. Since mechanical products are usually silver-gray, the pixel values of the part model images are linearly converted from the original range (usually 0–255) to a standardized range (e.g., 0–1) to eliminate brightness differences between images and improve training efficiency and accuracy.
To enrich the assembly part image dataset and improve the adaptive generalization of the improved instance segmentation algorithm to actual assembly scenes, different data augmentation methods are used to increase the sample size. First, the images are processed by Laplace sharpening: a second-order derivative operation is performed on each part image with the Laplace operator to detect details and textures and to enhance the clarity and contrast of edge details (as shown in Figure 3b).
The Laplace operator is a second-order differential operator that applies second-order derivatives to the pixel values of the image in the x and y directions, as shown in Equation (1):

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \tag{1}$$
The image $g(x, y)$ after edge detection is shown in Equation (2):

$$g(x, y) = f(x, y) - K \nabla^2 f(x, y) \tag{2}$$
In the process of instance segmentation of assembly images, the presence of noise may degrade the performance of the segmentation algorithm. Isolated noise is removed by median filtering while most of the edge information of the part image is retained. Figure 3c shows the image with salt-and-pepper noise added; the noise is removed by median filtering while the contour information enhanced by the Laplace operator is effectively preserved. Figure 3d shows the image after median filtering. For 2D images, median filtering is given by Equation (3):

$$g(x, y) = \operatorname{median}\{\, f(x-k,\, y-l),\; (k, l) \in W \,\} \tag{3}$$

where $f(x, y)$ is the original image, $g(x, y)$ is the processed image, and $W$ is a two-dimensional template, usually a 3 × 3 or 5 × 5 region.
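For illustration, the following is a minimal preprocessing sketch of Equations (1)–(3) using OpenCV; the kernel size, sharpening weight K, and file handling are assumptions for demonstration rather than settings taken from the paper.

```python
import cv2
import numpy as np

def preprocess(path, k=1.0):
    """Laplacian edge sharpening followed by median filtering (Eqs. (1)-(3))."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float64)

    # Equation (1): second-order derivatives via the Laplace operator
    lap = cv2.Laplacian(img, cv2.CV_64F)

    # Equation (2): subtract the weighted Laplacian to enhance edges
    sharpened = np.clip(img - k * lap, 0, 255).astype(np.uint8)

    # Equation (3): 3x3 median filter removes salt-and-pepper noise
    denoised = cv2.medianBlur(sharpened, 3)

    # Normalize pixel values to [0, 1], as described for training
    return denoised.astype(np.float32) / 255.0
```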

3.2. Mask R-CNN Improvement in the Field of Assembly Sequence Planning

The basic structure of Mask R-CNN includes the backbone, FPN (feature pyramid network), RPN (region proposal network), ROI Align, and the prediction head (bounding box regression, category classification, and mask generation). Through the mask branch, Mask R-CNN can extract information such as the specific shape and location of each part target in the assembly model.
To improve the performance of the Mask R-CNN segmentation algorithm on part instances in assembly models in terms of feature extraction capability, segmentation accuracy, and computational efficiency, we improved the Mask branch as shown in Figure 4. Specifically, we adopted the UNet3+ network structure to replace the fully convolutional part of the Mask branch. First, the feature map output from ROI Align is up-sampled to a size of 14 × 14 by bilinear interpolation and then fed into the UNet3+ network for up- and down-sampling operations to extract more feature details. After processing by the UNet3+ network, the resulting mask feature map has size 14 × 14 with 256 channels representing different feature information. These feature maps then undergo a softmax classification operation to produce the final 2D mask image, in which each pixel represents the probability that the location belongs to a certain part class.
With its powerful capability, the feature extraction network can accurately extract rich feature information such as edges, textures, and shapes from assembly CAD images. To further improve the accuracy and efficiency of feature extraction, we introduce the feature pyramid network (FPN), which efficiently extracts features from assembly images by constructing multi-scale feature maps. When constructing the network model, we chose a ResNet50+FPN backbone; this combination not only has powerful feature extraction capability but also ensures the stability and robustness of the model. During training, we used a pre-trained model and fine-tuned only the last five layers while keeping the weights of the other layers frozen; this strategy improves training efficiency while preserving model performance. The network outputs feature maps of size 28 × 28 × 256, which provide strong support for the subsequent part instance segmentation task.
An attention mechanism helps the model focus on localized regions of the input, improving feature extraction performance. The CBAM (Convolutional Block Attention Module) attention mechanism is introduced to combine channel attention and spatial attention, capturing correlations between features in different dimensions and highlighting the most important parts of the feature map [24], further improving the model's ability to learn target features. As shown in Figure 5, the CBAM attention module is inserted after each BottleNeck layer in ResNet50.
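As an illustration, a minimal PyTorch sketch of the CBAM module is given below; the reduction ratio of 16 and the 7 × 7 spatial kernel follow the defaults in [24], while how the module is wired after each BottleNeck layer is left to the surrounding ResNet50 code.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both pooled descriptors
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # channel-wise average
        mx = x.amax(dim=1, keepdim=True)     # channel-wise max
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in Woo et al. [24]."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)                   # reweight channels first
        return x * self.sa(x)                # then reweight spatial positions
```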
The backbone network extracts the feature information of the assembly and part images and passes it to the region proposal network (RPN). The RPN defines three anchor scales (128, 256, 512) and three aspect ratios (0.5, 1.0, 2.0), yielding nine kinds of anchors that determine the position and shape of the generated candidate boxes; candidate boxes are then extracted through the RPN layers to obtain accurate positional information of the segmentation target. ROI Align traverses each candidate region, keeping the floating-point boundaries without quantization, which greatly improves segmentation accuracy. Finally, a feature map of size 14 × 14 is output.
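The nine anchors arise from combining the three scales with the three aspect ratios. Below is a small sketch of one common way to generate them; the ratio convention h/w and the area-preserving construction are implementation assumptions, not specified above.

```python
import numpy as np

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate the 3 x 3 = 9 base anchor shapes (w, h) used at each position."""
    anchors = []
    for s in scales:
        for r in ratios:                 # r interpreted as h / w
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append((w, h))       # w * h stays ~s^2 for every ratio
    return np.array(anchors)

print(make_anchors().round(1))           # nine (w, h) pairs
```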
To improve the segmentation quality of the assembly parts, we consider refining feature extraction and strengthening the fusion of deep and shallow features. A modified UNet3+ [25] is used instead of the FCN as the head of the mask branch of Mask R-CNN. Full-scale skip connections are introduced, which combine low-level and high-level semantics from full-scale feature maps with fewer parameters.
The network architecture of UNet3+ is shown in Figure 6; it consists of two core parts, the encoder (left side) and the decoder (right side). In the encoder, each layer $X_{En}^i$ $(i = 1, 2, \ldots, 5)$ captures and refines the feature information of the image at a different scale, and these feature maps contain both rich low-level details and high-level semantic understanding. Each encoder feature map $X_{En}^i$ is connected to the corresponding decoder layer $X_{De}^i$ through full-scale skip connections, indicated by dashed lines. This design allows the network to exploit the high-resolution features retained in the encoder during decoding, improving the efficiency and accuracy of feature fusion. For example, the encoder feature maps $X_{En}^1$, $X_{En}^2$, $X_{En}^3$, $X_{En}^4$, and $X_{En}^5$ jointly generate the decoder feature map $X_{De}^4$ through the dashed skip connections, computed as in Equation (4):

$$X_{De}^i = \begin{cases} X_{En}^i, & i = N \\[4pt] H\!\left(\left[\underbrace{C\big(D(X_{En}^k)\big)_{k=1}^{i-1},\; C\big(X_{En}^i\big)}_{\text{scales } 1\text{th}\sim i\text{th}},\; \underbrace{C\big(U(X_{De}^k)\big)_{k=i+1}^{N}}_{\text{scales } (i+1)\text{th}\sim N\text{th}}\right]\right), & i = 1, \ldots, N-1 \end{cases} \tag{4}$$

where $X_{En}^i$ denotes the feature map of the $i$th down-sampling layer in the encoding direction, $N$ is the number of encoder layers, $C(\cdot)$ denotes the convolution operation, $H(\cdot)$ the feature aggregation mechanism, $U(\cdot)$ the up-sampling operation, $D(\cdot)$ the down-sampling operation, and $[\cdot]$ the concatenation and fusion of channel dimensions.
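To make Equation (4) concrete, the sketch below assembles decoder layer $X_{De}^4$ for $N = 5$: shallower encoder maps are down-sampled by max pooling, the same-scale encoder map passes through directly, and the deeper decoder map is bilinearly up-sampled; every branch goes through a 3 × 3 convolution before concatenation and fusion. The channel counts are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleSkipDe4(nn.Module):
    """Decoder block X_De^4 of UNet3+ (Equation (4)) for N = 5 encoder layers."""
    def __init__(self, enc_channels=(64, 128, 256, 512, 1024), cat_channels=64):
        super().__init__()
        # C(.): one 3x3 convolution per incoming branch, all mapped to cat_channels
        self.convs = nn.ModuleList(
            [nn.Conv2d(c, cat_channels, 3, padding=1) for c in enc_channels]
        )
        # H(.): aggregation applied after channel concatenation [.]
        self.fuse = nn.Sequential(
            nn.Conv2d(5 * cat_channels, 5 * cat_channels, 3, padding=1),
            nn.BatchNorm2d(5 * cat_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x_en, x_de5):
        # x_en: [X_En^1, ..., X_En^5]; x_de5: X_De^5 (equal to X_En^5 since i = N)
        h, w = x_en[3].shape[2:]                       # target scale of X_De^4
        branches = [
            self.convs[0](F.adaptive_max_pool2d(x_en[0], (h, w))),  # D(X_En^1)
            self.convs[1](F.adaptive_max_pool2d(x_en[1], (h, w))),  # D(X_En^2)
            self.convs[2](F.adaptive_max_pool2d(x_en[2], (h, w))),  # D(X_En^3)
            self.convs[3](x_en[3]),                                 # X_En^4
            self.convs[4](F.interpolate(x_de5, (h, w), mode="bilinear",
                                        align_corners=False)),      # U(X_De^5)
        ]
        return self.fuse(torch.cat(branches, dim=1))   # H([ ... ])
```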
In a feature pyramid network (FPN), the loss function is usually defined as the sum of the classification loss and the bounding-box regression loss, as shown in Equation (5). The loss function of Mask R-CNN is defined as the sum of the classification, regression, and segmentation losses (Equations (6)–(9)): $L_{cls}$ and $L_{bbox}$ denote the classification loss and bounding-box regression loss, respectively, while $L_{mask}$ is the segmentation loss of the mask branch. Specifically, $L_{cls}$ is the loss of the binary classification task, computed as in Equation (7), where $p_i$ and $p_i^*$ are the predicted probability of anchor $i$ and its ground-truth label (0 or 1), respectively; for multi-class prediction, a multi-class cross-entropy loss (MCEL) is used. This loss quantifies the inconsistency between the probability distribution predicted by the model and the true distribution, thereby measuring classification performance.

$$L = L_{cls} + L_{bbox} \tag{5}$$

$$L = L_{cls} + L_{bbox} + L_{mask} \tag{6}$$

$$L_{cls} = -\sum_i \big[\, p_i^* \log p_i + (1 - p_i^*) \log (1 - p_i) \,\big] \tag{7}$$
In the Mask R-CNN architecture, the loss function serves as a measure of the performance of each branch of the network model. The loss function of the categorization branch aims to quantify the accuracy of the model’s prediction of the target category, and it is usually computed using a multi-category cross-entropy loss function.
Equation (8) gives the loss $L_{bbox}$ of the bounding-box regression branch, which uses a smoothed L1 loss. This function introduces a smoothing term on top of the L1 loss, enhancing the stability of model training. Here $N_{reg}$ denotes the number of anchors participating in the regression, $t_i$ and $t_i^*$ denote the coordinate transformation parameters of the predicted and true boxes, respectively, and $R(\cdot)$ denotes the specific form of the smoothed $L_1$ function.

$$L_{bbox} = \frac{1}{N_{reg}} \sum_i p_i^* \, R\big(t_i - t_i^*\big) \tag{8}$$
Equation (9) gives the loss $L_{mask}$ of the mask branch, which uses an average binary cross-entropy loss. This function evaluates the model's binary (foreground vs. background) prediction at each pixel, where $m \times m$ is the spatial dimension of the mask output and $y_{ij}$ and $y_{ij}^*$ denote the prediction and the true label at position $(i, j)$, respectively. By minimizing these loss functions, Mask R-CNN continuously improves its performance on target detection and instance segmentation tasks.

$$L_{mask} = -\frac{1}{m^2} \sum_{1 \le i, j \le m} \big[\, y_{ij}^* \log y_{ij} + (1 - y_{ij}^*) \log (1 - y_{ij}) \,\big] \tag{9}$$
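A minimal sketch of how the loss terms of Equations (6)–(9) might be computed in PyTorch; the tensor shapes, the reduction scheme, and the variable names are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def mask_rcnn_loss(cls_logits, labels,          # (R, num_classes), (R,)
                   box_deltas, box_targets,     # (R, 4), (R, 4)
                   mask_logits, mask_targets,   # (P, m, m) for positives only
                   positive):                   # (R,) bool, i.e. p_i* = 1
    # Equation (7): cross-entropy classification loss
    l_cls = F.cross_entropy(cls_logits, labels)

    # Equation (8): smooth-L1 box regression, counted only on positive anchors
    n_reg = positive.sum().clamp(min=1)
    l_bbox = F.smooth_l1_loss(box_deltas[positive], box_targets[positive],
                              reduction="sum") / n_reg

    # Equation (9): average binary cross-entropy over the m x m mask pixels
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_targets)

    # Equation (6): total loss
    return l_cls + l_bbox + l_mask
```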

4. Construction of Assembly Knowledge Ontology and Semantic Reasoning Rules

Assembly knowledge representation is the premise and foundation of assembly sequence planning, and assembly features are the carriers of the assembly-related information of parts and components. Among domain knowledge description methods, the ontology model has many advantages in expressing concept hierarchies and semantics, sharing and reusing knowledge, and reasoning over knowledge. Therefore, to meet the needs of assembly information modeling, basic classes, attributes, declarations, etc. are defined to describe assembly knowledge and enable reasoning over it.

4.1. Assembly Information Ontology Construction

In the field of assembly sequence planning, the rich engineering semantic knowledge embedded in the product forms the basis and prerequisite for the process [26]. This semantic knowledge, which has an impact on assembly sequence planning, consists mainly of the following aspects: the component hierarchy of the product, the characteristic attributes of the parts, and the physical properties of the parts. In view of this, the product semantic knowledge system consists of three layers of spatial objects, i.e., the product layer, the component layer, and the part layer. In addition, the system covers the attributes of assembly parts, the relationships between assembly parts, the types of assembly constraints, and the two types of core assembly relationships based on hierarchy and constraints. For the needs of assembly sequence planning, the constructed assembly semantic model contains the following key definitions:
Definition 1.
A formal description of a product.
First of all, we define Product as a set, denoted as $Product = \{C_1, C_2, \ldots, C_n, P_1, P_2, \ldots, P_m\}$, where $C_1, C_2, \ldots, C_n$ denote the $n$ components that make up the product and $P_1, P_2, \ldots, P_m$ denote the $m$ parts that make up the product. As a concept name, “Product” represents a specific product entity. In addition, we introduce productName as an attribute to identify the name of the product. In order to explicitly describe the assembly hierarchy between products and components, we define the following predicates: hasAssembly, hasAssemblyOf, hasPart, and isPartOf. Together, these predicates express the fact that a product is a complex system composed of multiple assemblies and parts by means of specific assembly relationships.
Definition 2.
Formal description of Component.
Similarly, we define Component as a set, denoted as $Component = \{C_1, C_2, \ldots, C_e, P_1, P_2, \ldots, P_f\}$, where $C_1, C_2, \ldots, C_e$ denote the $e$ subcomponents that constitute a component and $P_1, P_2, \ldots, P_f$ denote the $f$ parts that constitute a component. Here, “Component” is used as a concept name to refer to a component of the product, and the componentName attribute identifies the name of the component. In order to describe the assembly hierarchy between components and parts, we again use the predicates hasAssembly, hasAssemblyOf, hasPart, and isPartOf, which together reveal that an assembly is a complex structure composed of multiple subassemblies and parts.
Definition 3.
Formal description of a Part and its attributes.
We further define a Part as a collection of parts with physical and geometric properties, denoted as $Part = \{P_1, P_2, \ldots, P_k\}$, where $P_1, P_2, \ldots, P_k$ denote the $k$ parts that make up the component or product. “Part” is used as a concept name representing a basic building block of the component or product, and the partName attribute identifies the name of the part.
To distinguish the different attributes of a part, we introduce two concepts: hasPartNonGeometricAttributes and hasPartGeometricAttributes. hasPartNonGeometricAttributes represents the non-geometric attributes of a part, such as mass, precision, and dimensions, which are important information affecting part identification, while hasPartGeometricAttributes is used as a predicate name to indicate that the part has geometric structural attributes. Defining these attributes helps to describe and analyze more accurately the role and function of parts in products and components.
With the development of ontology concepts, the most widely used method for constructing ontologies is the seven-step approach proposed by Stanford University (a programmatic sketch follows the list):
1. Determine the application domain of the ontology, here the assembly sequence planning domain.
2. Consider reusing existing ontologies, which need to be re-modeled as proprietary ontologies.
3. List the important terms in the ontology; collect conceptual definitions related to assembly information as well as additional ontology knowledge definitions.
4. Define the hierarchical relationships between classes, as shown in Figure 7. Terms denoting monadic relationships are defined as classes according to the representation model, together with the hierarchical relationships between them. In the assembly information ontology, Component serves as a superclass defining the common attributes of its subclasses: Box_component represents box components, Drive_disk_component transmission components, Fastening_component fastening components, and Sealing_component sealing components.
5. Define attributes: terms that represent binary relationships can be defined as attributes, as shown in Figure 8; the specific value domains of the attributes defined for assembly are listed in Table 1 and Table 2. Their meanings are as follows:
  • Object properties: requiresAssemblyBefore indicates that one part must be assembled before another, e.g., Key1 requiresAssemblyBefore Gear1 (assuming the key must be assembled before the gear). connectedBy indicates what two parts are connected by, e.g., BoxBody1 connectedBy Box_cover1. connectedTo indicates the connection relationship between two parts, especially within a bolt group, e.g., Bolt_nut_group1 connectedTo Box1 and Box_Cover1. isMountedOn describes one part being mounted on another, e.g., Key1 isMountedOn Axle1 (assuming the key is mounted on a shaft). requiresAssemblyAfter indicates that one part must be assembled after another, e.g., Gear1 requiresAssemblyAfter Axle1 (assuming the gear is assembled on the shaft).
  • Data properties: hasWeight indicates the mass of a part; hasPosition its relative positional relationship; hasPrecision its precision; hasSize its size; and numAssemblyRelationships the number of assembly relationships of the part.
6. Define attribute restrictions: according to the needs of assembly sequence planning, restrict the domains and value ranges of the attributes.
7. Create instances: according to the actual application requirements, create instances for the assembly sequence planning of the given product. For example, part combinations need to be instantiated according to their mutual fits in the given assembly.
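The paper builds the ontology in Protégé; as a rough programmatic counterpart to steps 4–7, the sketch below uses the owlready2 library. The IRI, the instance values, and the choice of domains and ranges (taken from Definitions 1–3 rather than from Table 2) are illustrative assumptions.

```python
from owlready2 import (Thing, ObjectProperty, DataProperty,
                       FunctionalProperty, get_ontology)

onto = get_ontology("http://example.org/assembly.owl")  # illustrative IRI

with onto:
    # Step 4: classes and their hierarchy
    class Product(Thing): pass
    class Component(Thing): pass
    class Part(Thing): pass
    class Box_component(Component): pass
    class Drive_disk_component(Component): pass

    # Step 5: object properties
    class hasPart(ObjectProperty):
        domain = [Component]; range = [Part]
    class isMountedOn(ObjectProperty):
        domain = [Part]; range = [Part]
    class requiresAssemblyBefore(ObjectProperty):
        domain = [Part]; range = [Part]

    # Step 5: data properties (see Table 1)
    class hasWeight(DataProperty, FunctionalProperty):
        domain = [Part]; range = [float]
    class hasPrecision(DataProperty, FunctionalProperty):
        domain = [Part]; range = [float]

# Step 7: instantiate parts recognized by the segmentation stage (values assumed)
axle1, key1 = onto.Part("Axle1"), onto.Part("Key1")
key1.isMountedOn.append(axle1)
key1.hasWeight = 0.12
onto.save(file="assembly.owl")
```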

4.2. OWL and SWRL Based Assembly Information Model with Inference Rules

OWL is used to describe the assembly information model with strong representational and logical power, combined with the Semantic Web Rule Language (SWRL) to express assembly rules and realize reasoning over assembly information. Assembly sequence planning information is obtained by reasoning over part-specific dependencies and attribute relationships; for example, keys need to be assembled before gears.
In the complex mechanical assembly process, weighing the mass, precision, size, and relative positional relationships of the basic parts is a crucial task that directly affects the performance, reliability, and productivity of the final product.
First, as far as the weight factor is concerned, heavier parts tend to have greater inertia, so following the strategy of “the heavier one is prioritized for assembly” can effectively reduce the difficulty and complexity of subsequent assembly steps. This helps avoid the loss of assembly accuracy or assembly failures caused by handling excessive weight late in the assembly.
Second, precision is an important indicator of a part's manufacturing quality. The principle of “the higher the precision, the later the assembly” ensures that high-precision parts are installed in a more stable assembly environment, minimizing assembly errors. This principle reflects fine control of the assembly sequence and is key to guaranteeing the overall accuracy and performance of the product.
In addition, the size factor also plays a significant role in the assembly sequence. Following the logic of “the larger size is assembled first” ensures that large parts have enough space for positioning and fixing, helping to avoid assembly difficulties or quality problems caused by space constraints in the later stages.
Finally, from the viewpoint of structural hierarchy, the assembly sequence should follow the principle of “internal first, then external” with respect to the relative positions of parts: internal parts are installed before external ones. This ensures the correct positioning and fixing of internal parts and provides stable support and a foundation for the subsequent installation of external parts.
Matching the number of fit relationships: CAD software (SolidWorks, 2024 version) can be used to analyze the fit relationships of the corresponding parts from the 3D model; parts with more assembly relationships should be installed with higher priority. A sketch of how such heuristics can be encoded as SWRL rules follows.
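Continuing the owlready2 sketch above, such heuristics can be written as SWRL rules; the rule below encodes only the weight heuristic (“the heavier one is prioritized for assembly”) and is an illustrative assumption of how the paper's six rules might look, not their actual formulation.

```python
from owlready2 import Imp, sync_reasoner_pellet

with onto:
    # Weight rule: if p1 outweighs p2, then p1 requires assembly before p2
    rule = Imp()
    rule.set_as_rule(
        "Part(?p1), Part(?p2), hasWeight(?p1, ?w1), hasWeight(?p2, ?w2), "
        "greaterThan(?w1, ?w2) -> requiresAssemblyBefore(?p1, ?p2)"
    )

# Run the Pellet reasoner so the inferred requiresAssemblyBefore facts appear
sync_reasoner_pellet(infer_property_values=True,
                     infer_data_property_values=True)
```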

5. Case Study

To verify the feasibility of the proposed method, experimental validation and analysis are carried out using a speed reducer as an example. The experiments are implemented in Python; the experimental platform runs the Windows 11 operating system with an i9-13900KF CPU (3.00 GHz), an RTX 4080 GPU, CUDA 11.1, Python 3.8.13, and PyTorch 1.12.1. During training, the experimental environments of all algorithms are identical.

5.1. Task Configuration and Flow of Target Detection and Segmentation

In the task of target detection and segmentation, the definition of positive samples is crucial. In this paper, candidate regions with an IoU (Intersection over Union) greater than 0.5 are treated as positive samples, and the mask loss is calculated only on the positive samples.
For the configuration of an RPN (region proposal network), the ratio of positive and negative samples is set to 1:3, and five different scales (specifically 8, 16, 32, 64, and 128 pixels) and three different aspect ratios (specifically 1:1, 1:2, and 2:1) are used to generate candidate regions. These settings aim to improve the adaptability of RPN to the target size and shape.
For feature extraction, ResNet50 is employed as the backbone network, and its feature maps are used by the RPN to generate 300 candidate regions (ROIs) for classification and bounding-box regression. To further enhance the multi-scale nature of the feature representation, an FPN (feature pyramid network) is introduced, which provides the RPN with richer feature information by fusing feature maps from different layers. The RPN generates candidate regions from the multi-scale feature maps output by the FPN and then applies non-maximum suppression (NMS) to the candidate regions to reduce the number of overlapping boxes.
In the detection and segmentation stage, the candidate regions with the top 100 detection scores are selected for mask prediction. The mask branch predicts masks for the seven categories; according to the classification result, the mask of the corresponding category is selected and resized to the size of the ROI. Finally, the mask is binarized with a threshold of 0.5 to obtain the final mask image. This process achieves accurate detection and segmentation of the target.
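A minimal sketch of this post-processing stage; the score ordering relies on torchvision's NMS, while the tensor shapes and variable names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import nms

def postprocess(boxes, scores, labels, mask_logits, keep_top=100, iou_thr=0.5):
    # boxes: (R, 4); scores: (R,); labels: (R,); mask_logits: (R, 7, m, m)
    keep = nms(boxes, scores, iou_thr)      # indices sorted by decreasing score
    keep = keep[:keep_top]                  # keep the top 100 detections

    results = []
    for i in keep.tolist():
        # Select the mask of the predicted category (seven part classes)
        mask = mask_logits[i, labels[i]].sigmoid()
        h = max(int(boxes[i, 3] - boxes[i, 1]), 1)
        w = max(int(boxes[i, 2] - boxes[i, 0]), 1)
        mask = F.interpolate(mask[None, None], size=(h, w),
                             mode="bilinear", align_corners=False)[0, 0]
        results.append((boxes[i], labels[i], mask > 0.5))  # binarize at 0.5
    return results
```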

5.2. Performance Analysis of the Improved Model

The dataset contains 1910 original images, including 344 images of the complete gearbox assembly model and involving seven parts. After data augmentation, all processed images were divided into training, validation, and test sets in a ratio of 7:2:1; the training set provides the input images for model training, while the validation set, equally crucial during training, is used to evaluate and calibrate model performance in stages. To provide accurate supervision for the subsequent fine-grained part segmentation of the assembly model, the assembly part images were accurately labeled, as shown in Figure 9. The (Bbox)mAP50, (Seg)mAP50, and mAR metrics of the algorithm models measured in the experiments are shown in Table 3.
Bbox metrics identify and localize the target object in the image and represent the main metrics for target location detection; Seg metrics concern segmenting the target object from the background and represent the segmentation metrics for the target mask. The mAR (mean Average Recall) evaluates the performance of object detection models and reflects the overall recall capability of the model across categories and IoU requirements. The same dataset and parameters are used to control variables during training. An FCN (Fully Convolutional Network) serves as the Mask head of the original Mask R-CNN in the comparison with Mask-U3 and Mask-U3-CBAM.
Compared with the Mask R-CNN algorithm, the Mask-U3 algorithm, which replaces the Mask head with UNet3+, shows a significant improvement in the key metrics. Specifically, the (Bbox) mAP50 improves from 82.0% to 83.5%, an increase of 1.5 percentage points, while the (Seg) mAP50 improves from 81.8% to 83.1%, an increase of 1.3 percentage points. The mean recall likewise progresses, rising from 93.7% to 95.4%, an increase of 1.7 percentage points.
Further, when the CBAM attention module is embedded in the backbone of the Mask-U3 algorithm to form the Mask-U3-CBAM algorithm, performance improves again: the (Bbox)mAP50 and (Seg)mAP50 rise to 83.9% and 84.0%, respectively, compared with Mask-U3, and the average recall increases slightly from 95.4% to 95.8%, a gain of 0.4 percentage points.
Ultimately, the Mask-U3-CBAM algorithm, which combines the Mask head replacement with the CBAM attention module in the backbone, achieves a more significant leap over the original Mask R-CNN: the (Bbox) mAP50 climbs from 82.0% to 83.9%, an increase of 1.9 percentage points, while the (Seg) mAP50 jumps from 81.8% to 84.0%, an increase of 2.2 percentage points. Notably, the average recall also rises by 2.1 percentage points, from 93.7% to 95.8%.
The above results not only validate the effectiveness of Maskhead replacement with UNet3+, but also further reveal the potential of adding the CBAM attention mechanism module in improving the performance of target detection and instance segmentation.
We designed a set of experiments in the model training phase to explore the convergence of the mAP50 values and loss functions of the different network structure models involved in the ablation experiments, and to compare the mAP50 curves and loss function curves of different algorithms. The changes in the mAP50 values and overall loss functions of the different network structure models during the training process are shown in Figure 10.
Based on the trends of the mAP50 curves in Figure 10, the curves of the Mask-U3 and Mask-U3-CBAM algorithms lead the original Mask R-CNN from the initial epochs and stabilize around the 15th epoch, with peak values of 0.835 and 0.839, respectively, improvements of 0.015 (i.e., 1.5 percentage points) and 0.019 (i.e., 1.9 percentage points) over the original Mask R-CNN's 0.820. The curve with the attention mechanism rises faster and reaches a higher peak.
The overall loss functions of the Mask-U3 and Mask-U3-CBAM algorithms converge faster and to lower values than the original algorithm, settling at 0.1232 and 0.1208, respectively, decreases of 0.0170 and 0.0194 from the original Mask R-CNN loss of 0.1402.
Observing the loss function curves before and after embedding the CBAM attention module shows that, with the module embedded, the loss decreases further and the model converges to the fitted state faster.
The prediction results of the three network models on the reducer model image are shown in Figure 11, which demonstrates the effectiveness of Mask R-CNN as the baseline model for instance segmentation of the reducer image. The Mask R-CNN model shows a high degree of accuracy in part detection, successfully identifying the majority of the parts in the image; the only miss is a key that goes undetected at its designated location. At the segmentation mask level, the boundaries of a few parts were not well defined, especially in the End_plate region, where the mask coverage was somewhat incomplete.
Subsequently, we introduced the Mask-U3 model, which optimizes Mask R-CNN with the UNet3+ network. Compared with the segmentation results of the baseline model, the part detection accuracy of Mask-U3 is significantly improved, and the previously undetected key is successfully recognized. As Figure 11 clearly shows, the quality of the segmentation results improves substantially: in the End_plate region in particular, the problem of incomplete mask coverage is effectively solved, and the mask accurately and completely covers the parts to be segmented, further validating the effectiveness of the improved model.
On this basis, we further propose the Mask-U3-CBAM model, built on Mask-U3 by integrating the CBAM attention mechanism into the backbone network. The prediction results in Figure 11 visualize the significant improvement in segmentation: the Mask-U3-CBAM model not only segments all parts accurately but also reaches a very high level in the precise positioning of the detection boxes and the complete coverage of the segmentation masks.
As shown in Figure 11, the right part presents, for the baseline model and the improved models, the evaluation metrics for each part when segmenting the different parts of the speed reducer, namely precision, recall, and F1 score.
Figure 11A presents the precision metric, a key measure of predictive performance that reflects the proportion of actual positive samples among those predicted as positive by the model. The bar chart in Figure 11A compares the precision of the baseline Mask R-CNN with the improved Mask-U3 and Mask-U3-CBAM models when segmenting different parts, while the accompanying line graph reveals the improvement trend. Compared with the baseline Mask R-CNN, the Mask-U3 and Mask-U3-CBAM models exhibit higher precision in the instance segmentation task, indicating that the proposed improvement strategy is effective.
Figure 11B focuses on the recall metric, which quantifies the proportion of all actual positive samples that the model correctly predicts as positive. The bar chart in Figure 11B compares the recall of the Mask R-CNN, Mask-U3, and Mask-U3-CBAM models across the part segmentation tasks, and the accompanying line graph shows the magnitude of the improvement. The Mask-U3 and Mask-U3-CBAM models have a clear advantage over Mask R-CNN in instance segmentation recall, further validating the improvement strategy.
Figure 11C presents the F1 score, the harmonic mean of precision and recall, which balances the two. The bar charts in Figure 11C compare the F1 scores of the Mask R-CNN, Mask-U3, and Mask-U3-CBAM models on the different part segmentation tasks, and the corresponding line graph shows the improvement. Compared with the baseline Mask R-CNN, the Mask-U3 and Mask-U3-CBAM models achieve superior F1 scores, once again demonstrating the effectiveness and practicality of the improvement strategy.

5.3. Reducer Ontology Modeling and Inference Results

Based on the proposed knowledge model construction method for assembly sequence planning, we built the ontology model. It takes the assembly hierarchy and assembly structure as nodes and consists of hierarchical relationships, assembly structure classes, and assembly part classes, as shown in Figure 12. The focus is to apply the improved Mask R-CNN neural network model to instance segmentation of the assembly model so as to resolve the instances contained in the reducer assembly. The improved Mask R-CNN model for assembly sequence planning is trained on the training and validation sets composed of the reducer assembly model and its related parts, and the single-stage cylindrical gear reducer in the test set is used as an example. The assembly model is labeled by instance segmentation, with the result shown in Figure 13. The recognized part categories are shaft, box, end cover, gear, etc., and the corresponding instance nodes (e.g., Axle1, Axle2, Box_cover1) and the relationships between them are created.
In this study, after the instance composition of the assembly model is obtained, the extracted instances are modeled with attributes to form binary relationships between instances. The assembly process of the single-stage cylindrical gear reducer is thereby captured as an explicit knowledge model, and SWRL rules are used for knowledge reasoning to obtain industrially feasible assembly sequence planning, as shown in Figure 14. The combination of deep learning instance segmentation and knowledge modeling makes it easier for assembly process designers to develop new assembly processes for industrial products and can provide a knowledge base and theoretical support for project implementation.
Subsequently, the Pellet (Incremental) reasoner is enabled in Protégé and the SWRL rules are selected for reasoning; the collated reasoning results yield a series of feasible assembly sequences, as shown in Figure 15.

6. Conclusions

This research aims to propose an intelligent reasoning method for assembly sequences by combining instance segmentation techniques and ontology modeling methods to reduce the cost of assembly planning and improve the efficiency of generating assembly sequences. The specific research results are as follows:
Firstly, based on the reducer assembly products, a dataset of 1910 assembly part images is established, which makes up for the lack of public datasets for instance segmentation in this field and lays a foundation for the subsequent training of high-precision instance segmentation network models.
Secondly, the network model was optimized on the basis of the most widely used instance segmentation algorithm (Mask R-CNN). The improved Mask-U3 and Mask-U3-CBAM reach 83.1% and 84.0%, respectively, on the network performance evaluation metric (Seg) mAP50, exceeding the original baseline model's 81.8%, so the assembly parts in an image can be segmented more accurately and detailed part information obtained.
Furthermore, on the basis of collecting and organizing assembly semantic knowledge, a complete assembly information ontology model is constructed so that the ontology semantic model comprehensively and accurately describes the parts, assembly relations, and assembly constraints in the assembly, and six SWRL semantic rules are formulated to reason about assembly sequences. This effectively reduces dependence on subjective human experience, improves the reusability of knowledge, reduces research and development costs, and contributes to research on sustainable development.
Finally, the reducer example verifies the feasibility of the assembly sequence planning method based on instance segmentation and ontology, realizing efficient generation of product assembly sequences. The results show that the proposed method can not only accurately segment the parts in the assembly but also efficiently process and utilize the assembly information through the ontology model, thereby generating efficient and feasible assembly sequences.
This research will continue to deepen the integration of technologies and innovative applications in the future, with a focus on the following directions: exploring advanced architectures such as Transformer and graph neural networks to enhance the model’s generalization ability for various types of assemblies and establish universal solutions; integrating point cloud data and 3D geometric information to build a multimodal fusion segmentation framework and solve the problem of precise recognition of complex surfaces and small parts; introducing reinforcement learning mechanisms to dynamically optimize assembly sequences, establish a digital twin system for virtual verification, and improve the efficiency and quality of sequence generation; developing an AR-assisted assembly system to achieve real-time closed-loop from segmentation to planning and execution, supporting collaborative operations of manual intervention and intelligent recommendation; and building lightweight models for different fields such as automobiles and precision instruments to adapt to edge computing devices, conducting production line deployment and performance evaluation.

Author Contributions

Conceptualization, X.S.; methodology, X.S. and X.W.; software, X.W. and X.X.; validation, X.S., X.W. and H.Z.; formal analysis, X.S. and X.W.; investigation, X.W.; resources, X.S. and X.W.; data curation, H.Z.; writing—original draft preparation, X.W.; writing—review and editing, X.S. and X.W.; visualization, X.W.; supervision, X.S.; project administration, X.S.; funding acquisition, X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Doctoral Start-up Fund of Liaoning University of Technology (Grant No. XB2022015) and the Science Research Fund Project of the Education Department of Liaoning Province (Grant Nos. JYTMS20230840 and LJKMZ20220983).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank the editors and anonymous referees for their insightful comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lin, T.; Lee, J.-W.; Bohez, E. New integrated model to estimate the manufacturing cost and production system performance at the conceptual design stage of helicopter blade assembly. Int. J. Prod. Res. 2012, 50, 7210–7228. [Google Scholar] [CrossRef]
  2. Wang, Z.B.; Ng, L.X.; Ong, S.K.; Nee, A.Y.C. Assembly planning and evaluation in an augmented reality environment. Int. J. Prod. Res. 2013, 51, 7388–7404. [Google Scholar] [CrossRef]
  3. Gao, X.; Wang, X.; Li, Y.; Yang, M.; Liu, Y.; Guo, W. Workflow dynamic change and instance migration approach based on polychromatic sets theory. Int. J. Comput. Integr. Manuf. 2016, 29, 386–405. [Google Scholar] [CrossRef]
  4. Zhu, Z.; Wang, Y.; Jiang, G. Statistical image modeling for semantic segmentation. IEEE Trans. Consum. Electron. 2010, 56, 777–782. [Google Scholar] [CrossRef]
  5. Ahmadi, M.H.; Ahmadi, M.A.; Bayat, R.; Ashouri, M.; Feidt, M. Thermo-economic optimization of Stirling heat pump by using non-dominated sorting genetic algorithm. Energy Convers. Manag. 2015, 91, 315–322. [Google Scholar] [CrossRef]
  6. Zhang, L.; Lv, H.; Tan, D.; Xu, F.; Chen, J.; Bao, G.; Cai, S. An adaptive quantum genetic algorithm for task sequence planning of complex assembly systems. Electron. Lett. 2018, 54, 870–872. [Google Scholar] [CrossRef]
  7. Lu, C.; Yang, Z. Integrated assembly sequence planning and assembly line balancing with ant colony optimization approach. Int. J. Adv. Manuf. Technol. 2016, 83, 243–256. [Google Scholar] [CrossRef]
  8. Tsai, H.-C. Unified particle swarm delivers high efficiency to particle swarm optimization. Appl. Soft Comput. 2017, 55, 371–383. [Google Scholar] [CrossRef]
  9. Hu, A.; Sun, J.; Xiang, L.; Xu, Y. Rotating machinery fault diagnosis based on impact feature extraction deep neural network. Meas. Sci. Technol. 2022, 33, 114004. [Google Scholar] [CrossRef]
  10. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 640–651. [Google Scholar]
  11. Xu, J.; Cai, Y.; Wu, X.; Lei, X.; Huang, Q.; Leung, H.-F.; Li, Q. Incorporating context-relevant concepts into convolutional neural networks for short text classification. Neurocomputing 2020, 386, 42–53. [Google Scholar] [CrossRef]
  12. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
  13. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–Miccai 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: New York, NY, USA, 2015; pp. 234–241. [Google Scholar]
  14. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference On Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  15. Zhao, Y.; Chen, S.; Liu, S.; Hu, Z.; Xia, J. Hierarchical equalization loss for long-tailed instance segmentation. IEEE Trans. Multimed. 2024, 26, 6943–6955. [Google Scholar] [CrossRef]
  16. Zhao, J.; Wang, Y.; Zhou, Y.; Du, W.-l.; Yao, R.; El Saddik, A. GLFRNet: Global-Local Feature Refusion Network for Remote Sensing Image Instance Segmentation. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1–12. [Google Scholar] [CrossRef]
  17. Sudarsan, R.; Fenves, S.J.; Sriram, R.D.; Wang, F. A product information modeling framework for product lifecycle management. Comput. Aided Des. 2005, 37, 1399–1411. [Google Scholar] [CrossRef]
  18. Bao, Q.; Zhao, G.; Yu, Y.; Dai, S.; Wang, W. The ontology-based modeling and evolution of digital twin for assembly workshop. Int. J. Adv. Manuf. Technol. 2021, 117, 1–17. [Google Scholar] [CrossRef]
  19. Das, S.K.; Swain, A.K. An Ontology-Based Framework for Decision Support in Assembly Variant Design. J. Comput. Inf. Sci. Eng. 2020, 21, 1–44. [Google Scholar] [CrossRef]
  20. Sanya, I.O.; Shehab, E.M. An ontology framework for developing platform-independent knowledge-based engineering systems in the aerospace industry. Int. J. Prod. Res. 2014, 52, 6192–6215. [Google Scholar] [CrossRef]
  21. Zhu, G.; Xiaomin, J.; Hun, G. An Ontology-Based Design Knowledge Model for the Construction Machinery. Key Eng. Mater. 2011, 458, 271–276. [Google Scholar]
  22. Chen, S.; Yi, J.; Jiang, H.; Zhu, X. Ontology and CBR based automated decision-making method for the disassembly of mechanical products. Adv. Eng. Inform. 2016, 30, 564–584. [Google Scholar] [CrossRef]
  23. Zhong, Y.; Qin, Y.; Huang, M.; Lu, W.; Gao, W.; Du, Y. Automatically generating assembly tolerance types with an ontology-based approach. Comput. Aided Des. 2013, 45, 1253–1275. [Google Scholar] [CrossRef]
  24. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  25. Cong, P.; Li, S.; Zhou, J.; Lv, K.; Feng, H. Research on instance segmentation algorithm of greenhouse sweet pepper detection based on improved mask RCNN. Agronomy 2023, 13, 196. [Google Scholar] [CrossRef]
  26. Gruhier, E.; Demoly, F.; Dutartre, O.; Abboudi, S.; Gomes, S. A formal ontology-based spatiotemporal mereotopology for integrated product design and assembly sequence planning. Adv. Eng. Inform. 2015, 29, 495–512. [Google Scholar] [CrossRef]
Figure 1. Overall methodological framework.
Figure 2. Assemblies and parts 3D modeling dataset presentation.
Figure 3. Data enhancement and noise processing. (a) The original unprocessed image. (b) The image after edge enhancement with the Laplace operator. (c) The image with salt-and-pepper noise added to (b). (d) The image after denoising (c) with median filtering.
Figure 4. Mask R-CNN model structure improvement.
Figure 5. CBAM attention mechanism module embedding and image size change.
Figure 6. UNet3+ network architecture.
Figure 7. Hierarchical structure of the assembly.
Figure 8. Ontology information model attribute definitions.
Figure 9. Fine labeling of reduction gear parts.
Figure 10. mAP50 and overall loss function change.
Figure 11. Prediction results of the three network models and model evaluation metrics. (A) Bar chart of part segmentation precision. (B) Bar chart of part segmentation recall. (C) Bar chart of part segmentation F1 scores.
Figure 12. Instance nodes for assemblies.
Figure 13. Exploded view of the instance-segmented reducer and its parts.
Figure 14. Ontology instances and results of rule-based reasoning.
Figure 15. Feasible assembly sequence schemes from rule-based reasoning.
Table 1. Definition of data properties for assembly semantic model.

Data Property | Domains | Ranges
hasWeight | Part | float
hasPrecision | Part | float
hasSize | Part | float
hasPosition | Part | float
hasStart | Part | boolean
hasEnd | Part | boolean
numAssemblyRelationships | Part | int
Table 2. Definition of object properties for assembly semantic model.

Object Property | Domains | Ranges
hasPart | Part | Component
hasPartOf | Component | Part
hasAssembly | Product | Component
hasAssemblyOf | Component | Product
isMountedOn | Part | Part
connectedBy | Part | Part
requiresAssemblyAfter | Part | Part
isStartInstall | Part | Component
isEndInstall | Part | Component
connectedTo | Part | Part
Table 3. Data on assessment indicators for different network models.

Algorithm | Backbone | Mask head | (Bbox)mAP50 | (Seg)mAP50 | mAR
Mask R-CNN | ResNet50 | FCN | 82.0% | 81.8% | 93.7%
Mask-U3 | ResNet50 | UNet3+ | 83.5% | 83.1% | 95.4%
Mask-U3-CBAM | ResNet50+CBAM | UNet3+ | 83.9% | 84.0% | 95.8%
