Requirement Dependency Extraction Based on Improved Stacking Ensemble Machine Learning

.


Introduction
Requirement dependency extraction is a branch of the requirements engineering field in software project development, where the inconsistency or incompleteness of requirement dependencies and error detection often lead to project and engineering development failure and the degradation of released software quality [1][2][3].The automatic extraction of requirement dependencies has become the focus of research in change propagation, requirement optimization and other fields [4,5].While requirement dependency extraction is important to project success, researchers have also found it difficult to manually extract requirement dependencies.According to a survey of software industry professionals on requirement dependency extraction, 90% of participants confirmed that they use manual methods to extract dependencies, and over 80% of participants agree that extracting requirement dependencies is difficult [6].The study involved 182 participants manually analysing 657 different dependency relationships, which proved to be time-consuming and posed a high risk for project failure due to the participants' need for prior domain knowledge [7].The automatic extraction method of requirement dependencies can take into account the scale and complexity of software systems, while also having the potential to improve cost control [8].Compared to timely corrective activities conducted during the requirements phase, delays in correcting requirements may result in up to 200-times-higher costs [9].Therefore, it is particularly urgent to achieve the accurate and speedy automatic extraction of requirement dependency relationships.
Currently, machine learning has been widely applied in various stages of software engineering.By using machine learning, we can solve the problems of incomplete modelling and algorithm defects encountered in software development [10,11].Machine learning can also perform data analysis tasks in software engineering, such as small dataset engineering problems [12], software requirements and code review problems [13][14][15].Another important role of machine learning is to reduce manual workloads in software engineering tasks [16][17][18][19][20], such as defect prediction, code suggestions, automatic program repair, feature localization and malware detection.Machine learning has also been widely applied in cost prediction, software testing and software quality assessment in the software development process, such as in consistency research between developers and tasks [21], integration testing [22], software development cost prediction [23] and software quality assessment [24].Meanwhile, requirements engineering has also applied a large number of machine learning methods [25][26][27][28][29][30][31][32][33][34][35][36][37][38][39], such as requirement acquisition, requirement formalization, requirement classification, the identification of software vulnerabilities from requirement specifications, requirement prioritization, requirement dependency extraction and requirement management.Previous studies have demonstrated that the automatic extraction of requirement dependency relationships is a feasible and effective task [32][33][34][35][36][37][38].However, dependency relation extraction based on traditional machine learning suffers from issues such as feature singularity and low adaptability, making it difficult to represent the informational value of requirements from multiple perspectives.Additionally, when selecting prediction models, standalone machine learning models are often utilized.The performance of these models is constrained by their inherent algorithms and parameter settings, limiting their ability to fully leverage the features of requirement pairs, and thus leading to performance bottlenecks.
Therefore, a method for extracting requirement dependencies is proposed in this paper, which is based on feature fusion and an improved stacking model (P-Stacking).The novelty and contribution of the paper are as follows.Firstly, in this paper, we innovatively introduce part-of-speech features for the task of extracting requirement dependencies.Using the particle swarm optimization algorithm to assign different weights to different parts of speech in requirement texts enhances the informational value of the feature vector.We further integrate part-of-speech features, TF-IDF features, and Word2Vec features, which makes the feature vector of the requirement text contain part-of-speech information, word frequency information, and contextual semantic information.Secondly, we improve the stacking ensemble learning model.The Low Correlation Algorithm and Grid Search Algorithm are proposed to select the base model combination for the stacking model, which improves the automation of determining the base model and the predictive ability of the stacking model.The summary of the above two innovations is as follows.
(1) Aiming at the singleness problem in the feature extraction process, the part-of-speech features, TF-IDF features, and Word2Vec features of the requirement texts will be extracted and gradually integrated.During the extraction of part-of-speech features, the main structure of the requirement texts will be extracted through dependency parsing.The core components of the requirement texts will be assigned corresponding part-of-speech tags.The particle swarm optimization algorithm is employed to assign weights to each part-of-speech tag, to emphasize the informational content of important parts of speech.During the fusion process for part-of-speech features and TF-IDF features, the weights of each part of speech are integrated into the TF-IDF values of the corresponding words.This enables the TF-IDF feature vector to not only contain word frequency information but also incorporate part-of-speech characteristics.To enrich the contextual information of the requirement texts, Word2Vec features [40,41] are subsequently integrated.
(2) Aiming at the limitations of standalone machine learning models in terms of prediction performance, this paper introduces an improved stacking ensemble machine learning model.Compared to other ensemble machine learning models, the stacking model exhibits a superior generalization ability and higher flexibility, resulting in its outstanding prediction performance in classification tasks [42][43][44].In this paper, based on the standard stacking ensemble machine learning model, the Low Correlation Algorithm and Grid Search Algorithm are proposed.The stacking model's base models are constructed based on multiple classifiers which have high complexity.Therefore, this paper proposes an algorithm with low correlation, which utilizes the Pearson correlation coefficient as a measurement criterion to eliminate some similar machine learning models.The remaining models with greater dissimilarity are selected as candidates for constructing the base model combination.Standard stacking models often rely on manual judgments when selecting base models, which causes a degree of subjectivity.Therefore, after excluding some machine learning models based on the Low Correlation Algorithm, the Grid Search Algorithm is used to automatically select the optimal combination of base models for the stacking model.The advantage of the method proposed in this paper is its ability to automatically allocate the optimal combination of base models, thereby eliminating the work for the manual analysis and determination of machine learning models when switching datasets.
When using machine learning to extract requirement dependency relationships, scholars choose different methods to extract the features of requirement texts.The literature [7,32,37,38,45] utilizes TF-IDF features to represent the feature information of requirement dependency pairs.Some studies [7,45] represent the features of requirement texts by using probabilistic features that can reflect the statistical correlation between words.In addition, POS-tag features are also used by some experts in requirement dependency extraction tasks [6,32,38].When using machine learning to extract requirement dependency relationships, some scholars also extract n-gram features from the requirement texts to construct feature vectors [6,38].In this article, part-of-speech features, TF-IDF features, and Word2Vec features are chosen as representations of the informational content of requirement texts.This decision is based on the following reasons.Firstly, previous studies have shown that adding part-of-speech features to word vectors as model inputs can effectively enhance the model's predictive capabilities [46][47][48].In the task of extracting dependency relations from requirement texts, verbs often reflect the action information of the requirements and determine the order in which two requirements occur.Subject nouns and object nouns can reflect the subjects of actions.Therefore, assigning higher weights to these three part-of-speech types can strengthen the informational expressiveness of important words in requirement sentences.Secondly, TF-IDF features can determine the importance of words based on their frequency in the current document and their frequency across the entire document collection.If a word appears frequently in the current document, its importance is higher.Conversely, if a word appears frequently across the entire document collection, its importance is lower.The TF-IDF feature-based extraction of dependency relations from requirement texts has been widely used.Thirdly, while part-of-speech features and TF-IDF features can capture certain aspects of word importance, they cannot connect the current word with its preceding and following words.Word2Vec features can capture the semantic information between words.Based on these reasons, the three types of features are selected in this article to represent the informational content of requirement texts.
The organizational structure of this paper is as follows.Section 1 provides a brief introduction to the importance of the automatic extraction of requirement dependency relationships and the research content of this paper.Section 2 introduces the relevant research on requirement dependency relationship extraction.Section 3 introduces the specific steps of extracting various features and feature fusion.Section 4 introduces the specific steps of improving the stacking model.Section 5 demonstrates the feasibility of the proposed method through a comparative analysis of experimental results.Section 6 provides a conclusion.

Related Work
In requirements engineering, machine learning methods are widely applied in various aspects of requirement research.Meanwhile, for the work of the automated extraction of requirement dependencies, there have been researchers analysing from the domains of ontology, active learning, deep learning and machine learning, and more research results have been obtained.
In the method of requirement acquisition, the following will introduce two aspects of requirement acquisition techniques, namely machine learning and natural language processing [25].The technology of machine learning-based requirement elicitation methods is divided into five parts, namely data cleaning and pre-processing, text feature extraction, learning, evaluation and tools.In the formal methods of requirement, the requirement formalization methods based on natural language processing and machine learning are investigated and classified [26], and researchers found that heuristic NLP methods are the most used technology for automated requirement formalization.In the requirement classification method, Rahimi et al. [27] proposed a new ensemble machine learning method to classify functional requirements.This ensemble learning method combines different machine learning models and uses a weighted set voting method for optimization.There are also articles [28] summarizing several machine learning methods and evaluating which ones are more effective in requirement classification.In the method of identifying software vulnerabilities from requirement specifications, in the requirement prioritization method, Talele et al. [29] extracted the TF-IDF and BOW features of a requirement odour text and used classification algorithms LR, NB, SVM, DT, and KNN to prioritize requirement odours.Vanamala et al. [30] mapped categories from the CWE repository to PROMISE_ In Exp, and machine learning methods were used to identify software vulnerabilities from requirement specifications.A new architecture [31] is proposed which utilizes software requirement specifications and user text comments to create a universal model.This model can be used to train the features of the model using a ML algorithm and prioritize requirement texts.In requirement management methods, Lucassen et al. [39] proposed a new and automated approach to visualize requirements by displaying concepts, text references and their relationships at different granularity levels.This method is based on two techniques, namely the clustering technique that groups elements into coherent sets and the state-ofthe-art semantic correlation technique.
In the ontology domain, a requirements dependency detection tool, OpenReq-DD, is introduced and summarized [6].The core of OpenReq-DD is the application of natural language processing (NLP) and machine learning (ML) techniques to automate the detection of requirement dependencies through the application of natural language processing (NLP) and machine learning (ML) techniques based on ontology that define the dependencies between specific terms related to the requirement domain.Deshpande et al. [32] proposed ensemble active learning (AL) variants with Ontology-Based Retrieval (OBR) to form two hybrid approaches for the extraction of three dependency types, where the role of OBR is to replace manual tagging and to extract the dependencies, respectively.Regarding requirement dependency relationship extraction, there is a method that combines semantic relations and syntax information by combining the semantic relations between the words in a requirement sentence and the context under domain-specific knowledge [33].
In the field of active learning, Deshpande et al. [32] proposed a method for extracting dependencies between requirements using an active learning (AL) variant and a further ensemble of this AL with an ontology-based retrieval (OBR) approach to form two hybrid methods.A method for the automatic extraction of requirement dependencies based on an ensemble active learning strategy is proposed [34], and this method uses the probability of uncertainty, text similarity, dissimilarity and active learning variant prediction divergence as a measure of the amount of sample value.
In the field of deep learning, Gräßler et al. [35] train BERT models using two types of training, pre-training and context-specific fine-tuning, to enable the automated requirement dependency analysis of complex technical systems.
In the field of machine learning, Samer et al. [7] proposed two content-based recommendation methods for identifying dependencies between requirements.The first one utilizes document classification techniques and uses four separate learners to identify the types of requirement dependencies defined at the text level.The second approach is based on latent semantics and uses real-world datasets to evaluate the defined baseline.A method is proposed to extract requirement dependencies using a two-phase formula [36].In the first phase, binary dependencies are identified using natural language processing (NLP) techniques and in the second phase, requirement dependency types are further analysed using three learners based on weakly supervised techniques.There are three main challenges in the area of requirement dependency acquisition [37].Firstly, studying natural language processing techniques to automatically extract dependencies from text documents and, further, using verb classifiers to automatically acquire and analyse different types of dependencies.Secondly, exploring the representation and maintenance of requirement dependency changes from designing graph theory algorithms.And finally, investigating the process of providing dependency recommendations.Atas et al. [38] proposed a method for recognizing the types of requirement dependencies through supervised classification techniques and trained and tested the proposed method using different learners.
The ontology construction and inference processes are mainly determined by the semantic dependencies between keywords, without considering the context of the requirement sentence.Rule-based ontology construction is a complex task which requires manually defining template rules to extract the ontology.The insufficiency of the rules and conflicts will affect the inference process.Although dependency extraction based on deep learning can achieve good prediction performance, training the model requires a large amount of sample data.For small sample datasets, deep learning models cannot accurately find feature information, which can lead to the overfitting of small samples and the underfitting of dependency extraction tasks.At present, when using machine learning for dependency extraction, it is mainly considered from the aspect of feature selection and classifier determination.However, the characterization of a single feature on the requirement sentence informativeness is not complete, and the standalone classifier will have a high prediction error rate due to having insufficient sample data.A method of feature fusion based on part-of-speech weight is proposed to address the problem of a single feature in the process of extracting information from requirement texts.A method for extracting requirement dependencies using an improved stacking ensemble learning model is proposed to address the problems of the low prediction accuracy of a standalone classifier.

Requirement Dependency Extraction Based on Feature Fusion and Standalone Machine Learning
This section will introduce an automatic extraction of requirement dependency based on part-of-speech features and a standalone machine learning model.The model diagram is shown as Figure 1.Firstly, the requirement terms are pre-processed by a series of operations like removing stop words, word segmentation, loading domain lexicons, partof-speech tagging, dependency syntax analysis, and feature engineering.Through a series of pre-processing processes, the feature vectors (V Rx , V Ry ) of requirement pairs (Rx, Ry) are generated.To assist machine learning algorithms in accurately performing the task of requirement dependency extraction, it is necessary to extract the most informative features from the requirement texts during the feature engineering stage.In this article, part-of-speech features, TF-IDF features, and Word2Vec features are utilized to represent the informational value of the requirement texts.By integrating these three types of features, they can be used as the inputs for standalone machine learning models.Secondly, the generated feature vectors from the pre-processing stage are input into each machine learning model for training and prediction, thereby achieving the automatic extraction of requirement dependency relationships.The standalone machine learning models selected in this article include K-Nearest Neighbors, decision trees, logistic regression, Random Forest, Support Vector Machine, Gaussian Naive Bayes, Multinomial Naive Bayes, Support Vector Regression, and Linear Regression.By comparing the prediction accuracy of each standalone machine learning model based on different features, the effectiveness and feasibility of the proposed part-of-speech features and feature fusion method in this article can be validated.
they can be used as the inputs for standalone machine learning models.Secondly, th generated feature vectors from the pre-processing stage are input into each machine learn ing model for training and prediction, thereby achieving the automatic extraction of re quirement dependency relationships.The standalone machine learning models selected in this article include K-Nearest Neighbors, decision trees, logistic regression, Random Forest, Support Vector Machine, Gaussian Naive Bayes, Multinomial Naive Bayes, Sup port Vector Regression, and Linear Regression.By comparing the prediction accuracy o each standalone machine learning model based on different features, the effectiveness and feasibility of the proposed part-of-speech features and feature fusion method in this articl can be validated.

Remove stop words and special symbols
Load the domain lexicon

Pre-processing
Word

Types of Requirement Dependencies
The meaning of the requirement dependency relationship is that one requirement Rx, acts on another requirement, Ry, and this relationship is not affected by other relation ships.For any set of requirement pairs (Rx, Ry), if no relationship exists between Rx and Ry, they are considered to be independent.If there are dependencies, six types of depend encies between requirements are defined based on the UML modelling language, which are the notification, arouse, call, conflict, aggregation, and similar tendencies.The specifi definitions of this six dependency relationships are as follows: (

Requirement Pre-Processing
Natural language processing (NLP) plays diverse roles in software development, in cluding those of improving development efficiency, enhancing the user experience and software functionality.The specific roles of natural language processing in software de velopment include: requirement analysis and specification, document automation and generation, serving as an intelligent code editor, a natural language interface, performing defect analysis and repair, collaborative development and team communication, auto mated testing, intelligent search and information retrieval, sentiment analysis and use feedback.Overall, the role of natural language processing in software development is to improve communication, understanding, and efficiency during the development proces

Types of Requirement Dependencies
The meaning of the requirement dependency relationship is that one requirement, Rx, acts on another requirement, Ry, and this relationship is not affected by other relationships.For any set of requirement pairs (Rx, Ry), if no relationship exists between Rx and Ry, they are considered to be independent.If there are dependencies, six types of dependencies between requirements are defined based on the UML modelling language, which are the notification, arouse, call, conflict, aggregation, and similar tendencies.The specific definitions of this six dependency relationships are as follows: (

Requirement Pre-Processing
Natural language processing (NLP) plays diverse roles in software development, including those of improving development efficiency, enhancing the user experience and software functionality.The specific roles of natural language processing in software development include: requirement analysis and specification, document automation and generation, serving as an intelligent code editor, a natural language interface, performing defect analysis and repair, collaborative development and team communication, automated testing, intelligent search and information retrieval, sentiment analysis and user feedback.Overall, the role of natural language processing in software development is to improve communication, understanding, and efficiency during the development process by understanding and processing natural language texts, thereby enhancing the quality of the software and the user experience.
In the pre-processing steps of natural language processing, there are many procedures where machine learning can be used, such as text cleaning, word segmentation, word embedding, part-of-speech tagging, named entity recognition, sentiment analysis, partof-speech restoration, stem extraction, and stop word removal.Although some of these processes can use traditional rule methods, accuracy and generalization performance can be improved well by using machine learning models.For example, in word segmentation and part-of-speech tagging, machine learning models can learn patterns from a large amount of text data to better adapt to different fields and contexts.
In this article, a large number of machine-learning-based tools are used for requirement text pre-processing, such as the Language Technology Platform (LTP), the NLTK (Natural Language Toolkit), StanfordNLP (Stanford's CoreNLP), and Word2Vec.The syntax analysis module in the LTP typically leverages machine learning methods such as Conditional Random Fields (CRFs) and neural networks.The NLTK is a Python library used for processing human language data, which utilizes machine learning techniques in some modules.For example, the 'PunktSentenceTokenizer' class in the 'nltk.tokenize'module utilizes the Punkt model.The Punkt model is an unsupervised sentence segmentation model that learns statistical rules of text to complete segmentation tasks.In the part-ofspeech tagging module, the default annotator used for the 'nltk.pos_tag'function is based on the maximum entropy classifier.StanfordNLP is a toolkit developed by the Natural Language Processing Group at Stanford University.The syntax analysis module uses deep learning methods to generate tree structures of sentences to represent the syntactic relationships between words.
The pre-processing flow of the requirement text is shown in Figure 2. If it is a Chinese requirement text, the JIEBA participle tool is used to participle the requirement sentence Rx.Domain lexicons are introduced to correct the errors in participle and part-of-speech tagging.The Language Technology Platform (LTP) is used to perform the dependency syntax analysis of the requirement sentence.In the case of an English requirement text, the NLTK tool is used for word segmentation, part-of-speech tagging, and word form reduction.StanfordNLP is used for dependency syntax analysis.The subject-predicateobject triplets are extracted in the dependency syntax analysis, which correspond to three parts of speech, which are the subject noun, predicate verb, and object noun, respectively.Corresponding weights are assigned to each part of speech, and they use its feature for weighing TF-IDF.The improved TF-IDF with Word2Vec are integrated to generate feature vectors for requirement pairs.Since the requirement text is often an elaboration of specific requirements in a domain, there will be wrong division when dividing the words and part-of-speech tagging of domain-specific vocabulary.For example, in the requirement sentence {The teaching assistants can assist students in complete projects in the system} in the Course Management System [49], the word "complete projects" is labelled as a gerund structure, whereas it should be a verb structure in this requirement domain.Therefore, in this paper, a lexicon module for the specific requirement domain will be added to the process of requirement analysing in the dependency syntax to correct the misclassified participle and part-of-speech-tagging results, and, thus, greatly improve the accuracy of requirement dependency extraction.The domain lexicon based on the requirement text of the course management system mainly includes the login password (n), completing projects (v), answering questions (v), the teaching assistant (n), and group members exchanging groups (n).

Dependency Syntax Analysis
When the requirement text is represented in Chinese, we select t guage technology open-source platform (https://cloud.itp.ac.cn, acce 2023) of the Harbin Institute of Technology as the tool for dependen The tool integrates the Chinese natural language analysis module to i grammar, semantics and the other five natural language processing Through the API web service provided by the platform, the tool can the performance of text analysis.A dependency grammar tree is a vi of dependency grammar analysis, which analyses the dependency each requirement sentence and selects the central verb of the sentence the syntax tree.Due to the existence of dependency type division b more suitable for keyword extraction to formalize the requirement.
As shown in Figure 3, the above process is illustrated by parsin ment sentence.For the requirement sentence R3 {The teaching assist dents in complete projects in the system}, the result of participle analy assistants, can, assist, students, in, complete projects, in, the, system part-of-speech tagging is {\def, \n, \c, \v, \n, \p, \v, \p, \def, \n}.Th ency syntax analysis is {6:SBV 6:ADV 6:ADV 5:ATT 3:POB 0:HED 8:A for the root node, for which the index number is 0, the index numbers from 1 in sequence.Each word (i.e., node) has a dependency type w

Dependency Syntax Analysis
When the requirement text is represented in Chinese, we select the LTP natural language technology open-source platform (https://cloud.itp.ac.cn, accessed on 27 October 2023) of the Harbin Institute of Technology as the tool for dependency syntax analysis.The tool integrates the Chinese natural language analysis module to include vocabulary, grammar, semantics and the other five natural language processing core technologies.Through the API web service provided by the platform, the tool can effectively improve the performance of text analysis.A dependency grammar tree is a visual representation of dependency grammar analysis, which analyses the dependency between words for each requirement sentence and selects the central verb of the sentence as the root node of the syntax tree.Due to the existence of dependency type division between nodes, it is more suitable for keyword extraction to formalize the requirement.
As shown in Figure 3, the above process is illustrated by parsing a simple requirement sentence.For the requirement sentence R3 {The teaching assistants can assist students in complete projects in the system}, the result of participle analysis is {The, teaching assistants, can, assist, students, in, complete projects, in, the, system}, and the result of part-of-speech tagging is {\def, \n, \c, \v, \n, \p, \v, \p, \def, \n}.The result of dependency syntax analysis is {6:SBV 6:ADV 6:ADV 5:ATT 3:POB 0:HED 8:ATT 6:POB}.Except for the root node, for which the index number is 0, the index numbers of each word start from 1 in sequence.Each word (i.e., node) has a dependency type with its parent node.For example, the dependency syntax analysis result corresponding to the "teaching assistants" node is "6: SBV", which means that the parent node of the "teaching assistants" node is the sixth node "assist" and the relationship between the two is SBV.Finally, through the dependency syntax analysis, the three backbone nodes (subject-predicate-object) of the requirement sentence are extracted to formalize the original requirement.When the requirement text is an English requirement text, StandfordNLP is a dependency syntax analysis tool.StandfordNLP is a deep-learning-based natu guage processing tool developed by Stanford University.When performing te cessing, StandfordNLP divides the text into basic units such as words, roots an phemes, and analyses the semantic and syntax relationships between them using network algorithms, so as to construct a dependency tree and extract the relations it.As shown in Figure 4, in English requirement sentence R4 {The system shall p static course information}, its participle result is {The, system, shall, provide, static, information}, and its part-of-speech tagging result is {\det, \noun, \verb, \verb, tive, \noun, \noun}, the analysis result of dependency syntax is {2:det 4:nsubj 4:au 7:amod 7:compound 4:dobj}, and by using dependency syntax, the main backbon requirement sentence can be extracted, which is the subject-predicate-object tripl tem, provide, information}.The dependency syntax analysis for the requirement s R5 {The system shall allow students to customize the notification behaviour} is sh Figure 5, and the participle results are {The, system, shall, allow, students, to, cus the, notification, behaviour}, the part-of-speech tagging result is {\det, \noun, \verb, \noun, \part, \verb, \det, \noun, \noun}, and the dependency syntax a result is {2:det 4:nsubj 4:aux 0:root 4:dobj 7:mark 4:xcomp 10:det 10:compound 7:o original extracted requirement sentence backbone is the subject-predicate-obje {system, allow, students}, but this requirement sentence's backbone cannot reflect formation contained in the requirement sentence.Therefore, in this paper, the ext method of the requirement sentence's backbone is improved.If the direct objec requirement sentence is a noun such as "students", "collectors", "administration" viduals" and so on, and affiliates an object complement, then the logical subject act subject noun, the verb in the object complement acts as the predicate verb, and th acts as the object noun.The improved method extracts the requirement sentence When the requirement text is an English requirement text, StandfordNLP is used as a dependency syntax analysis tool.StandfordNLP is a deep-learning-based natural language processing tool developed by Stanford University.When performing text processing, StandfordNLP divides the text into basic units such as words, roots and morphemes, and analyses the semantic and syntax relationships between them using neural network algorithms, so as to construct a dependency tree and extract the relationships in it.As shown in Figure 4, in English requirement sentence R4 {The system shall provide static course information}, its participle result is {The, system, shall, provide, static, course, information}, and its part-of-speech tagging result is {\det, \noun, \verb, \verb, \adjective, \noun, \noun}, the analysis result of dependency syntax is {2:det 4:nsubj 4:aux 0:root 7:amod 7:compound 4:dobj}, and by using dependency syntax, the main backbone of the requirement sentence can be extracted, which is the subject-predicate-object triplet {system, provide, information}.The dependency syntax analysis for the requirement sentence R5 {The system shall allow students to customize the notification behaviour} is shown in Figure 5, and the participle results are {The, system, shall, allow, students, to, customize, the, notification, behaviour}, the part-of-speech tagging result is {\det, \noun, \verb, \verb, \noun, \part, \verb, \det, \noun, \noun}, and the dependency syntax analysis result is {2:det 4:nsubj 4:aux 0:root 4:dobj 7:mark 4:xcomp 10:det 10:compound 7:obj}.The original extracted requirement sentence backbone is the subject-predicate-object triad {system, allow, students}, but this requirement sentence's backbone cannot reflect the information contained in the requirement sentence.Therefore, in this paper, the extraction method of the requirement sentence's backbone is improved.If the direct object of the requirement sentence is a noun such as "students", "collectors", "administration", "individuals" and so on, and affiliates an object complement, then the logical subject acts as the subject noun, the verb in the object complement acts as the predicate verb, and the object acts as the object noun.The improved method extracts the requirement sentence's backbone as {students, customize, notification behaviour}.

The TF-IDF Model
TF-IDF (Term Frequency-Inverse Document Frequency) evaluates the importance of a keyword in a text in terms of the frequency of its occurrence and the number of times it appears in the examined corpus to assess its importance in its document.If a word has a high number of occurrences in the text, its importance level rises, but if it has a high number of occurrences in the whole corpus, its importance level will decrease.TF denotes Term Frequency and IDF denotes Inverse Document Frequency.If a word occurs more times in the requirement document and less times in the examined corpus, it means that this word has a better distinguishing ability.The formula is shown as Formula (1), where tfij is the number of times word i appears in document j.N is the total number of documents.ni is the number of documents where word i appears.log ( log ) (1)

Improvement of TF-IDF
TF-IDF considers that if a word more frequently appears in a document, and, at the same time, it rarely appears in other documents, then it is probably a keyword.Although TF-IDF can reflect the importance of a word in the document to a certain extent, it does not consider the effect of different parts of speech on the classification of requirement dependencies.The degree of response from different parts of speech is diverse in a requirement text.Therefore, it is necessary to consider which part of speech has a greater impact on the type of requirement dependency to determine the weight of each part of speech in the requirement pair.
For example, in the Course Management System dataset [49].For the requirement pairs of "students' homework is corrected by the teaching assistant in the system" and "students need to receive good scores", it is necessary to simultaneously consider the relationship between the subject nouns "teaching assistant" and "students", the predicate verbs "correct" and "receive", and the object nouns "homework" and "score", because the semantics represented by each part of speech all have a decisive impact on the type of dependency.In the CMS dataset [50], there are more requirement sentences with the subject noun "system", so the relationship between the subject noun "system" can be ignored.In this paper, the particle swarm optimization (PSO) [51] algorithm is used to seek the optimal weight ratio for each part of speech.

The TF-IDF Model
TF-IDF (Term Frequency-Inverse Document Frequency) evaluates the importance of a keyword in a text in terms of the frequency of its occurrence and the number of times it appears in the examined corpus to assess its importance in its document.If a word has a high number of occurrences in the text, its importance level rises, but if it has a high number of occurrences in the whole corpus, its importance level will decrease.TF denotes Term Frequency and IDF denotes Inverse Document Frequency.If a word occurs more times in the requirement document and less times in the examined corpus, it means that this word has a better distinguishing ability.The formula is shown as Formula (1), where tfij is the number of times word i appears in document j.N is the total number of documents.ni is the number of documents where word i appears.log ( log ) (1)

Improvement of TF-IDF
TF-IDF considers that if a word more frequently appears in a document, and, at the same time, it rarely appears in other documents, then it is probably a keyword.Although TF-IDF can reflect the importance of a word in the document to a certain extent, it does not consider the effect of different parts of speech on the classification of requirement dependencies.The degree of response from different parts of speech is diverse in a requirement text.Therefore, it is necessary to consider which part of speech has a greater impact on the type of requirement dependency to determine the weight of each part of speech in the requirement pair.
For example, in the Course Management System dataset [49].For the requirement pairs of "students' homework is corrected by the teaching assistant in the system" and "students need to receive good scores", it is necessary to simultaneously consider the relationship between the subject nouns "teaching assistant" and "students", the predicate verbs "correct" and "receive", and the object nouns "homework" and "score", because the semantics represented by each part of speech all have a decisive impact on the type of dependency.In the CMS dataset [50], there are more requirement sentences with the subject noun "system", so the relationship between the subject noun "system" can be ignored.In this paper, the particle swarm optimization (PSO) [51] algorithm is used to seek the optimal weight ratio for each part of speech.

The TF-IDF Model
TF-IDF (Term Frequency-Inverse Document Frequency) evaluates the importance of a keyword in a text in terms of the frequency of its occurrence and the number of times it appears in the examined corpus to assess its importance in its document.If a word has a high number of occurrences in the text, its importance level rises, but if it has a high number of occurrences in the whole corpus, its importance level will decrease.TF denotes Term Frequency and IDF denotes Inverse Document Frequency.If a word occurs more times in the requirement document and less times in the examined corpus, it means that this word has a better distinguishing ability.The formula is shown as Formula (1), where tf ij is the number of times word i appears in document j.N is the total number of documents.n i is the number of documents where word i appears.

Improvement of TF-IDF
TF-IDF considers that if a word more frequently appears in a document, and, at the same time, it rarely appears in other documents, then it is probably a keyword.Although TF-IDF can reflect the importance of a word in the document to a certain extent, it does not consider the effect of different parts of speech on the classification of requirement dependencies.The degree of response from different parts of speech is diverse in a requirement text.Therefore, it is necessary to consider which part of speech has a greater impact on the type of requirement dependency to determine the weight of each part of speech in the requirement pair.
For example, in the Course Management System dataset [49].For the requirement pairs of "students' homework is corrected by the teaching assistant in the system" and "students need to receive good scores", it is necessary to simultaneously consider the relationship between the subject nouns "teaching assistant" and "students", the predicate verbs "correct" and "receive", and the object nouns "homework" and "score", because the semantics represented by each part of speech all have a decisive impact on the type of dependency.In the CMS dataset [50], there are more requirement sentences with the subject noun "system", so the relationship between the subject noun "system" can be ignored.In this paper, the particle swarm optimization (PSO) [51] algorithm is used to seek the optimal weight ratio for each part of speech.
The basic idea of the particle swarm optimization algorithm is to mimic the behaviour of bird flock foraging.Each particle represents a bird in space, and the position information of the particle is the solution to optimize the problem.The process of the algorithm is that the particles keep changing their speed and position, approaching the optimal solution, and finally finding the optimal solution.In particle swarm optimization (PSO), the search space refers to the set of all feasible solutions to the optimization problem.In this search space, each solution is regarded as the position of a particle.For example, in this paper, the search space for particles is a six-dimensional space, where each particle in this six-dimensional space will have position information.The set of all the possible position information that particles can have constitutes the search space.The position of a particle can be represented as x (x d1 , x d2 , x d3 , x d4 , x d5 , x d6 ), where x d1 indicates the position of the particle in the first dimension.The position x (x d1 , x d2 , x d3 , x d4 , x d5 , x d6 ) corresponds with six parts of speech (subject noun Rx , predicate verb Rx , object noun Rx , subject noun Ry , predicate verb Ry , object noun Ry ).The fitness function is a function used to evaluate the performance of each particle in the solution space.Each particle's position represents a solution, and the fitness function can be used to calculate the fitness value of the current solution.For example, in this paper, the fitness function is the F1 score, which is the evaluation metric used in the Random Forest model.The particle's position information is used to weight the TF-IDF feature vector.The Random Forest model employs this weighted vector for training and testing to obtain the F1 score.The F1 score obtained through this process serves as the fitness value for the current particle.The updating formula of particle's speed and position in the particle swarm optimization algorithm is shown as follows.
In Formulas ( 2) and ( 3), i is the number of particles, d is the dimension, t is the current iteration number, x id is the position of the ith particle, v id is the velocity of the ith particle, pbest id is the individual optimal solution of the ith particle, pgest id is the global optimal solution of the ith particle, c 1 is the individual learning factor, c 2 is the population learning factor, w is the inertia weight, and r 1 and r 2 are the random numbers in the interval [0, 1].In this paper, the particle swarm optimization algorithm is used to seek the optimal weight ratio for each part of speech, in which the optimal weight ratio is used to improve the TF-IDF value.The improved formula is as follows.
where x k is the weight of each part of speech in the requirement sentence.tf ij is the number of times word i appears in document j.N is the total number of documents.n i is the number of documents where word i appears.The diagram of the improved TF-IDF model is shown in Figure 6.
Step (1): A dependency syntax analysis of the requirement sentence Rx is performed to extract the subject-predicate-object triad and merge the requirement pairs (Rx, Ry) to form six parts of speech.In the particle swarm optimization algorithm, the position information of the particles is located in a space of dimension 6, which represents the six parts of speech of the requirement pairs.Step (2): An initial value of interval (0, 1) is assigned to each particle using a random function, the TF-IDF is weighted with this value, to train and test by the Random Forest classifier.F1 is used as the fitness value of the particle to search for individual and global optimal solutions.Step (3): The position and velocity of each particle are updated according to Formulas (2) and (3).The fitness value of each particle is calculated based on the F1 value of the Random Forest classifier.Step (4): If the fitness value of the current particle is greater than pbest, pbest is updated as the position of the current particle, and if the current particle's fitness value is greater than gbest, then gbest is updated as the position of the current particle.Step (5): When the maximum number of iterations is reached, the search program is terminated, the current global optimal solution gbest is the final solution.According to the position information of the current particles, the optimal weight of each part of speech in the requirement pair is obtained, and the TF-IDF value will also be improved with this weight.Step (6): If the maximum number of iterations is not reached, step (3) is returned to. (

Multi-Weighted TF-IDF
When improving the TF-IDF, the weights of each part of speech in the requirement sentence are used to weigh the TF-IDF value.The particle swarm optimization algorithm is used to update the weights of each part of speech.The unique weight of the part of speech will be determined by training and testing the original dataset.
Since the prediction results of each requirement dependency pair will affect the F1 value, some requirement dependency pairs that are far from the centre of the optimal solution will cause the part-of-speech weight to shift towards these requirement dependency pairs.Therefore, the above method cannot find the optimal part-of-speech weight for each requirement pair.So, to reduce the error problem caused by a single part-of-speech weight, the datasets are divided according to the type of requirement dependency, and the weight of each part of speech is determined separately in each dataset division.The specific steps are as follows.
In the first step, the dataset is divided according to the requirement dependency types, and the requirement pairs with the same dependency types are placed in the same

Multi-Weighted TF-IDF
When improving the TF-IDF, the weights of each part of speech in the requirement sentence are used to weigh the TF-IDF value.The particle swarm optimization algorithm is used to update the weights of each part of speech.The unique weight of the part of speech will be determined by training and testing the original dataset.
Since the prediction results of each requirement dependency pair will affect the F1 value, some requirement dependency pairs that are far from the centre of the optimal solution will cause the part-of-speech weight to shift towards these requirement dependency pairs.Therefore, the above method cannot find the optimal part-of-speech weight for each requirement pair.So, to reduce the error problem caused by a single part-of-speech weight, the datasets are divided according to the type of requirement dependency, and the weight of each part of speech is determined separately in each dataset division.The specific steps are as follows.
In the first step, the dataset is divided according to the requirement dependency types, and the requirement pairs with the same dependency types are placed in the same dataset.In the second step, the corresponding part-of-speech weights are separately determined in each dataset by the method in Section 3.2.3.In the third step, when testing the original dataset, for each requirement pair, each set of part-of-speech weights determined in the second step will be, respectively, weighted with the TF-IDF value.Finally, each set of feature vectors is inputted into the model for prediction, each set of predicted values is compared, and the dependency type with the highest predicted value is selected as the result.

Weighted Word2Vec
TF-IDF values can only characterize the semantic information of each requirement pair, but cannot extract the contextual semantic information.Word2Vec is a word embedding method based on machine learning.The core idea of Word2Vec is to learn the distributed representation of words by predicting the context or target vocabulary, thereby mapping each word to a continuous vector space.During the training process, Word2Vec employs optimization algorithms such as gradient descent to adjust word vectors to minimize the objective function.Through this approach, the model can acquire the distributed representation of each word, making it so that the words that are similar in semantics are also closer in the vector space.Therefore, in this paper, we use the Skip-Gram model from Word2Vec as a pre-training model.The Skip-Gram model is used to map words to vectors, which are represented in high-dimensional space.The model is designed to capture the semantic relationships between words.The Skip-Gram model uses a neural network to learn the vector representations of words, and then uses these vectors to compute the similarity between words.As shown in Figure 7, the Skip-Gram model can predict the context words by being given a target word, inputting the word w(t) into the model, and the model predicts the above words w(t − 2), w(t − 1), w(t + 1), and w(t + 2) related to w(t).In this paper, we generate a set of word vectors for each requirement Rx and requirement Ry based on this model, and the average of the set of word vectors is used as the sentence vector of requirements Rx and Ry.The semantic relation of context is obtained by the word vector, and the feature information of the requirement pair (Rx, Ry) is obtained by merging the average word vector, which enriches the feature information of the dependency pair.
compared, and the dependency type with the highest predicted value is selected result.

Weighted Word2Vec
TF-IDF values can only characterize the semantic information of each requir pair, but cannot extract the contextual semantic information.Word2Vec is a word e ding method based on machine learning.The core idea of Word2Vec is to learn t tributed representation of words by predicting the context or target vocabulary, t mapping each word to a continuous vector space.During the training process, Wo employs optimization algorithms such as gradient descent to adjust word vectors t imize the objective function.Through this approach, the model can acquire the distr representation of each word, making it so that the words that are similar in semant also closer in the vector space.Therefore, in this paper, we use the Skip-Gram mode Word2Vec as a pre-training model.The Skip-Gram model is used to map words to v which are represented in high-dimensional space.The model is designed to captu semantic relationships between words.The Skip-Gram model uses a neural netw learn the vector representations of words, and then uses these vectors to compute th ilarity between words.As shown in Figure 7, the Skip-Gram model can predict the c words by being given a target word, inputting the word w(t) into the model, and the predicts the above words w(t − 2), w(t − 1), w(t + 1), and w(t + 2) related to w(t).paper, we generate a set of word vectors for each requirement Rx and requirem based on this model, and the average of the set of word vectors is used as the se vector of requirements Rx and Ry.The semantic relation of context is obtained by th vector, and the feature information of the requirement pair (Rx, Ry) is obtained by ing the average word vector, which enriches the feature information of the depen pair.This section illustrates the integration process of improved TF-IDF featur Word2Vec features through an example.The example of the requirement pairs teaching assistants can assist students in complete projects in the system, the teach sistants can help students answer questions in the system}.The specific steps are lows.In Step 1, in Section 3.2.2,we have calculated the TF-IDF value for each word requirement text.The TF-IDF feature vector for the above requirement pairs is {0.9 1.08, 0.66, 0.84, 0.42, 0.64, 0.28, 1.04, 0.68, 0.94, 0.46}.In Step 2, in Section 3.2.3,we calc the weights of six parts of speech words.The TF-IDF values of six words were we by their corresponding part-of-speech weights.The improved TF-IDF feature vec the above requirement pairs is {1.32, 0.52, 1.48, 0.66, 1.24, 0.42, 0.85, 0.28, 1.36, 0.6 This section illustrates the integration process of improved TF-IDF features and Word2Vec features through an example.The example of the requirement pairs is {the teaching assistants can assist students in complete projects in the system, the teaching assistants can help students answer questions in the system}.The specific steps are as follows.In Step 1, in Section 3.2.2,we have calculated the TF-IDF value for each word in the requirement text.The TF-IDF feature vector for the above requirement pairs is {0.96, 0.52, 1.08, 0.66, 0.84, 0.42, 0.64, 0.28, 1.04, 0.68, 0.94, 0.46}.In Step 2, in Section 3.2.3,we calculated the weights of six parts of speech words.The TF-IDF values of six words were weighted by their corresponding part-of-speech weights.The improved TF-IDF feature vector for the above requirement pairs is {1.32, 0.52, 1.48, 0.66, 1.24, 0.42, 0.85, 0.28, 1.36, 0.68, 1.23, 0.46}.In Step 3, based on the method described in the first paragraph of Section 3.2.5, each word is mapped to a vector in the vector space.The Word2Vec feature vector for the above requirement pairs is {0.18, 0.31, 0.27, −0.23, 0.51, −0.05, 0.14, −0.93, 0.34, −0.24, 0.53, 0.00}.In Step 4, we multiply the corresponding position values of the TF-IDF feature vector with the Word2Vec feature vector.The final feature vector for the above requirement pairs is {0.24, 0.16, 0.41, −0.15, 0.63, −0.02, 0.12, −0.26, 0.46, −0.18, 0.65, 0.00}.

Requirement Dependency Extraction Based on Improved Stacking Model
This section introduces an automatic extraction of requirement dependency based on an improved stacking ensemble machine learning model.The model diagram is shown as Figure 8.In Section 3, the requirement terms are pre-processed by a series of operations including removing stop words, word segmentation, loading domain lexicons, part-of-speech tagging, dependency syntax analysis, and feature engineering.The information features of the requirement are extracted, and feature vectors are generated.However, when using machine learning models to extract the requirement dependency, standalone machine learning models are used.To further improve the accuracy, this section proposes a requirement dependency extraction method based on ensemble machine learning models.This model is based on the stacking model and incorporates algorithms with low correlations and a grid search.The Low Correlation Algorithm can select classifiers with high contrast and accuracy from numerous machine learning models.The Grid Search Algorithm can automatically select base classifiers for the stacking model, and select the best combination of base classifiers from the candidate classifiers for better prediction performance.0.46}.In Step 3, based on the method described in the first paragraph of Section 3.2.5, each word is mapped to a vector in the vector space.The Word2Vec feature vector for the above requirement pairs is {0.18, 0.31, 0.27, −0.23, 0.51, −0.05, 0.14, −0.93, 0.34, −0.24, 0.53, 0.00}.In Step 4, we multiply the corresponding position values of the TF-IDF feature vector with the Word2Vec feature vector.The final feature vector for the above requirement pairs is {0.24, 0.16, 0.41, −0.15, 0.63, −0.02, 0.12, −0.26, 0.46, −0.18, 0.65, 0.00}.

Requirement Dependency Extraction Based on Improved Stacking Model
This section introduces an automatic extraction of requirement dependency based on an improved stacking ensemble machine learning model.The model diagram is shown as Figure 8.In Section 3, the requirement terms are pre-processed by a series of operations including removing stop words, word segmentation, loading domain lexicons, part-ofspeech tagging, dependency syntax analysis, and feature engineering.The information features of the requirement are extracted, and feature vectors are generated.However, when using machine learning models to extract the requirement dependency, standalone machine learning models are used.To further improve the accuracy, this section proposes a requirement dependency extraction method based on ensemble machine learning models.This model is based on the stacking model and incorporates algorithms with low correlations and a grid search.The Low Correlation Algorithm can select classifiers with high contrast and accuracy from numerous machine learning models.The Grid Search Algorithm can automatically select base classifiers for the stacking model, and select the best combination of base classifiers from the candidate classifiers for better prediction performance.

Ensemble Machine Learning
Ensemble machine learning is a method of combining multiple independent machine learning models to achieve better prediction and generalization capabilities.Ensemble machine learning improves overall accuracy and stability by synthesizing the prediction results of multiple models.

Ensemble Machine Learning
Ensemble machine learning is a method of combining multiple independent machine learning models to achieve better prediction and generalization capabilities.Ensemble machine learning improves overall accuracy and stability by synthesizing the prediction results of multiple models.
Ensemble learning models have various applications in software engineering, mainly in areas such as software defect detection, software quality assessment, software project risk management, requirement analysis, software testing, software tool optimization, software measurement and measurement combination.These application areas can improve efficiency, quality, and maintainability in software engineering from different perspectives.The advantage of ensemble learning is its ability to integrate the advantages of multiple machine learning models, thereby providing more robust and generalizable solutions.For example, in software defect detection, ensemble learning models can integrate the outputs of multiple defect prediction models to improve the overall predictive performance.Basic models include decision tree, Support Vector Machine, Neural Network, etc.When selecting ensemble learning models, it is necessary to consider the application domains of different models.At the same time, it is necessary to choose the appropriate basic learners and ensure their diversity.
There are various forms of ensemble machine learning methods, among which the most common are voting-based methods such as major voting and weighted voting.These methods make democratic decisions or weight decisions based on the prediction results of multiple models, ultimately selecting the final prediction result.Another common integration method is based on Bagging and Boosting methods, which can obtain the final prediction by averaging and voting upon the predictions of multiple models.The Boosting method gradually improves the overall accuracy by iteratively training a series of weak classifiers.The weights of the next classifier are adjusted based on the errors of the previous classifier.
Ensemble machine learning can also combine multiple different algorithms, for example, Random Forest is an ensemble method that combines multiple decision tree models.Random Forest trains multiple decision trees by randomly selecting data samples and features.The prediction results of multiple decision trees are then either voted upon or averaged to obtain the final prediction result.The stacking model [52] is also an ensemble learning method that combines the prediction results of multiple different types of base models (also known as primary learners) as inputs, and then trains a higher-level meta-model (also known as a secondary learner) to make the final prediction.
The stacking model combines different types of base models, which it can fully utilize the advantages of.The stacking model can also weight or fuse the prediction results of different base models, thereby improving the overall prediction ability of the model.At the same time, secondary learners are introduced to further learn the features of the original data, and to improve the model's generalization ability.Therefore, in this article, the stacking ensemble machine learning model is selected to replace a standalone machine learning model for requirement dependency extraction.Due to the need to train multiple base models and construct a training set for the meta-model, the stacking model has higher complexity in both training and prediction.Additionally, the stacking model has a stronger dependency on data and models.It is also necessary to carefully select the base model and design the structure of the secondary learners.Therefore, this article has improved the standard stacking model to reduce subjectivity and uncertainty in building the base model, while also reducing the risk of overfitting.

Improving the Stacking Ensemble Model
The stacking model was first proposed by Wolpert [52] in 1992; its core idea is to model on a stacking of original data.As shown in Figure 9, the base classifier learns the original data and gets the prediction results which are stacked to build a new dataset, and then the new sample data are given to the meta-classifier for fitting to output the final prediction results.In this paper, we propose an algorithm that automatically assigns the app base classifiers when facing different datasets.The diagram of the improved m shown in Figure 10.In this paper, we propose an algorithm that automatically assigns the appropriate base classifiers when facing different datasets.The diagram of the improved model is shown in Figure 10.

Low Correlation Algorithm
In phase one, a suitable machine learning algorithm is selected to build a requirement dependency extraction model, and then a classifier that can realize the requirement dependency extraction task is added to the candidate classifier set.Due to the large number of classifier models that have been selected in this paper, there may be situations such as similar classification effects and unsatisfactory classification effects between classifiers.Before selecting a base classifier, it is possible to exclude some classifiers with similar classification effects by comparing the correlation between them.Therefore, the Low Correlation Algorithm has been proposed in this paper.According to the principle of the stacking model, if the prediction accuracy of a classifier in the base model is too low, the prediction performance of the meta classifier will be reduced when the prediction results of this classifier are used as a new dataset for the meta classifier.Therefore, according to the experimental results of a standalone classifier in Section 5, classifiers with F1 values below 60 will be removed from the candidate classifiers.Furthermore, from the constructed set of candidate classifiers, classifiers with higher correlations are excluded by the Low

Low Correlation Algorithm
In phase one, a suitable machine learning algorithm is selected to build a requirement dependency extraction model, and then a classifier that can realize the requirement dependency extraction task is added to the candidate classifier set.Due to the large number of classifier models that have been selected in this paper, there may be situations such as similar classification effects and unsatisfactory classification effects between classifiers.Before selecting a base classifier, it is possible to exclude some classifiers with similar classification effects by comparing the correlation between them.Therefore, the Low Correlation Algorithm has been proposed in this paper.According to the principle of the stacking model, if the prediction accuracy of a classifier in the base model is too low, the prediction performance of the meta classifier will be reduced when the prediction results of this classifier are used as a new dataset for the meta classifier.Therefore, according to the experimental results of a standalone classifier in Section 5, classifiers with F1 values below 60 will be removed from the candidate classifiers.Furthermore, from the constructed set of candidate classifiers, classifiers with higher correlations are excluded by the Low Correlation Algorithm, in which the Pearson correlation coefficient is used to measure the correlation between classifiers.The Pearson correlation coefficient is a commonly used statistical indicator to measure the linear correlation between two variables, which can be used to measure the strength and direction of the linear relationship between two variables.The range of the Pearson correlation coefficient's values is [−1, 1], and the closer the absolute value is to 0, the smaller the correlation.The formula is calculated as follows.
The pseudo-code of the Low Correlation Algorithm (Algorithm 1) is shown as follows.for t ← 2, 3, . .., t do 3:

Grid Search Algorithm
In the phase two, from the set of classifiers that have removed those with high correlations and F1 values below 60, the Grid Search Algorithm is used to select different classifiers in turn to build the base model.The Grid Search Algorithm is a method used for hyperparameter tuning, which searches for the best combination of parameters to optimize model performance by traversing a given parameter space.The basic idea of the Grid Search Algorithm is to exhaustively search for all possible combinations of the given hyperparameters and evaluate the performance of the model through cross-validation.In this article, the Grid Search Algorithm is used to search for the optimal combination of the given classifiers, to improve the predictive performance of the stacking model.Due to the high complexity of Grid Search Algorithms, especially when the parameter space is large, this can lead to longer search times.Therefore, before using the Grid Search Algorithm, phase one is to use a Low Correlation Algorithm to remove some candidate classifiers, thereby reducing the parameter space.At the same time, to prevent the risk of overfitting, a 5-fold cross-validation method is used for training.The training results will be stacked into a new dataset and sent to the meta-classifier for final training and testing.The F1 value of the prediction result and the base classifier combination of the current round are recorded.After exhausting all classifier combinations, the highest F1 value and the corresponding base classifier combination are selected.
The Grid Search Algorithm (Algorithm 2) in pseudo-code is shown as follows.

Five-Fold Cross-Validation
In cross-validation, the main purpose of the data being cut into different layers is to more reliably evaluate the performance of the model.It is common practice to divide the data into 3-folds, 5-folds, and 10-folds, etc.The choice of the divide layer number will affect the stability of the model's evaluation and calculation cost.The more layers, the more data will be used to train and test, which will result in a more stable performance evaluation of the model.However, this also means that more training and testing iterations are required, increasing the computational cost.Conversely, the fewer the layers, the lower the computational cost, but the stability of the performance evaluation may be worse.Therefore, in this paper, the data are divided into 5-folds based on comprehensive considerations of model performance and computational cost.
The reasons for choosing cross-validation in the stacking model in this paper are as follows.Firstly, the stacking model relies on the prediction results of multiple base models as inputs.By utilizing five-fold cross-validation, we can ensure that each base model is trained and tested on different subsets of data, thereby enabling a more comprehensive evaluation of its performance.Secondly, cross-validation can help avoid data bias issues.The stacking model needs to ensure that the predictions of the base model are made on previously unseen data in order to more accurately assess its generalization ability.
In this paper, the process of 5-fold cross-validation is as follows.The dataset is split into five folds, and each time four folds are taken for training, the other one fold is predicted.The predicted value is passed to the next layer of the model as a new dataset so that overfitting can be effectively avoided.The 5-fold cross-validation model diagram is shown in Figure 11, where the data are divided into 5-folds.Therefore, five sets of datasets are generated, four of which are used as the training set, and the other set is used as the validation set.The training set is given to the model for training.The output of the validation set is obtained by using the model to predict it.Because it is a 5-fold cross-validation, which will form five validation sets, the prediction results of each validation set will be stacked to get the prediction results of the complete dataset.The method in Figure 11 is only for a single model.For the other models, the same method is used to obtain the prediction results, based on which, a new dataset is constructed.
is only for a single model.For the other models, the same method is used to obtain the prediction results, based on which, a new dataset is constructed.

Requirement Dependency Extraction Based on Single Part-of-Speech Weights
For requirement dependency extraction based on the ensemble learning model in Figure 10, the Course Management System dataset is taken as an example.
Firstly, the requirement terms are combined into pairs to generate a requirement dependency pair, encoded by the TF-IDF features, and the TF-IDF features are improved based on Figure 6.Then, Word2Vec is fused to generate the final feature vector as input for the classification model.For the requirement pairs {the teaching assistants can assist students in complete projects in the system, the teaching assistants can help students answer questions in the system} in the Course Management System dataset, the feature vectors for different features are shown in Table 1.Then, the feature vectors of requirement pairs are transferred into the ensemble learning model and the final ensemble learning model is determined by using the F1 value as the evaluation criterion.Finally, the dependency relationship of the above requirement pairs is extracted as a similarity relationship.The ensemble learning model, TF-IDF feature vectors, and Word2Vec feature vectors used in this section are the same as those in Section 4.3.1.The requirement pair is {the teaching assistants can assist students in complete projects in the system, the teaching assistants can help students answer questions in the system}.However, the key difference lies in the improved TF-IDF features.Based on the method outlined in Section 3.2.4,the dataset is partitioned into distinct groups.Particles conduct searches within each group

Requirement Dependency Extraction Based on Single Part-of-Speech Weights
For requirement dependency extraction based on the ensemble learning model in Figure 10, the Course Management System dataset is taken as an example.
Firstly, the requirement terms are combined into pairs to generate a requirement dependency pair, encoded by the TF-IDF features, and the TF-IDF features are improved based on Figure 6.Then, Word2Vec is fused to generate the final feature vector as input for the classification model.For the requirement pairs {the teaching assistants can assist students in complete projects in the system, the teaching assistants can help students answer questions in the system} in the Course Management System dataset, the feature vectors for different features are shown in Table 1.Then, the feature vectors of requirement pairs are transferred into the ensemble learning model and the final ensemble learning model is determined by using the F1 value as the evaluation criterion.Finally, the dependency relationship of the above requirement pairs is extracted as a similarity relationship.The ensemble learning model, TF-IDF feature vectors, and Word2Vec feature vectors used in this section are the same as those in Section 4.3.1.The requirement pair is {the teaching assistants can assist students in complete projects in the system, the teaching assistants can help students answer questions in the system}.However, the key difference lies in the improved TF-IDF features.Based on the method outlined in Section 3.2.4,the dataset is partitioned into distinct groups.Particles conduct searches within each group independently, ultimately determining seven part-of-speech weights.For conflict relationships, the weights from Section 4.3.1 are adopted.Seven sets of part-of-speech weights are used to weight the TF-IDF, respectively.When predicting the dependency relationships between requirements, the seven weighted TF-IDF features are fed into the prediction model.By comparing the prediction values, the dependency type with the highest prediction value is chosen as the final result.The part-of-speech weights for each group are shown in Table 2.
The seven feature vectors for the requirement pair instance are shown in Tables 3-9.It is ultimately determined that the highest prediction probability is achieved when using part-of-speech weight W6.The highest prediction probability based on Mul-TFIDF is 80.14%, indicating a similarity relationship.The highest prediction probability based on Mul-TFIDF*Word2Vec is 85.56%, also indicating a similarity relationship.

Experiments
In this paper, we argue the feasibility of the proposed method in the following variations.Firstly, when the ensemble learning model is not used, the classification performance of a standalone classifier is under different features.Secondly, the classification performance of a standalone classifier and the P-Stacking model after adding part-of-speech features.Thirdly, the classification performance of a standalone classifier and the P-Stacking model after adding multiple part-of-speech weight features.Fourthly, after improving the method of extracting the backbone of requirement sentences, an experimental evaluation is conducted on the CMS system dataset.Finally, we analysed the running time of the model proposed in this paper.

Datasets
In this paper, three datasets are used to validate the feasibility of the proposed method in terms of different features and different models.The Course Management System dataset (dataset 1) [49] contains 17 requirement sentences and generates 272 dependency pairs.The Composition Evaluation dataset (dataset 2) [53] contains 19 requirement sentences and generates 342 dependency pairs.The CMS System dataset (dataset 3) [50] contains 27 requirement sentences and generates 702 dependency pairs.

Experimental Environment
The environment configuration of the experimental platform is shown in Table 10.Ten classifiers are selected in this paper, including the K-Nearest Neighbor (KNN), decision tree (DT), logistic regression (LGR), Random Forest (RF), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), Multinomial Naive Bayes (MNB), Support Vector Regression (SVR), Linear Regression (LR) and the improved stacking model (P-Stacking).

PSO Algorithm Parameter Settings
When determining the parameters of PSO, we did not adjust them based on a validation set.Instead, we adopted the following approach for parameter determination.In reference [54], the author suggests that the inertia weight w should be selected within the range of [0.4,0.9].The individual learning factor c1 and the social learning factor c2 should be set to equal values and chosen within the range of [0,4].Reference [54] found that when c1 = 2 and c2 = 2, particles can achieve a faster convergence speed.As the problem addressed in this paper is a six-dimensional one with a relatively high dimension, there is a risk of falling into local optima.Therefore, a set of parameter combinations with a larger c1 than c2 are selected as a comparative experiment.We conducted experiments on three datasets based on different values of w, c1, and c2.The evaluation index is the F1 value of the Random Forest.The F1 value represents the fitness value.The experimental results are shown in Figures 12-14.The highest fitness values for each parameter combination are shown in Table 11.When w = 0.5, c1 = 2.0, and c2 = 2.0, three datasets have the highest fitness value.Thus, the final parameters of the PSO algorithm are shown in Table 12.

PSO Algorithm Parameter Settings
When determining the parameters of PSO, we did not adjust them based on a validation set.Instead, we adopted the following approach for parameter determination.In reference [54], the author suggests that the inertia weight w should be selected within the range of [0.4,0.9].The individual learning factor c1 and the social learning factor c2 should be set to equal values and chosen within the range of [0,4].Reference [54] found that when c1 = 2 and c2 = 2, particles can achieve a faster convergence speed.As the problem addressed in this paper is a six-dimensional one with a relatively high dimension, there is a risk of falling into local optima.Therefore, a set of parameter combinations with a larger c1 than c2 are selected as a comparative experiment.We conducted experiments on three datasets based on different values of w, c1, and c2.The evaluation index is the F1 value of the Random Forest.The F1 value represents the fitness value.The experimental results are shown in Figures 12-14.The highest fitness values for each parameter combination are shown in Table 11.When w = 0.5, c1 = 2.0, and c2 = 2.0, three datasets have the highest fitness value.Thus, the final parameters of the PSO algorithm are shown in Table 12.(TIW2V), the improved TF-IDF*Word2Vec (P-TIW2V), and the Multi-Weighted TF-IDF*Word2Vec (Mul-TIW2V).The symbol "*" indicates the fusion of two features.

PSO Algorithm Parameter Settings
When determining the parameters of PSO, we did not adjust them based on a validation set.Instead, we adopted the following approach for parameter determination.In reference [54], the author suggests that the inertia weight w should be selected within the range of [0.4,0.9].The individual learning factor c1 and the social learning factor c2 should be set to equal values and chosen within the range of [0,4].Reference [54] found that when c1 = 2 and c2 = 2, particles can achieve a faster convergence speed.As the problem addressed in this paper is a six-dimensional one with a relatively high dimension, there is a risk of falling into local optima.Therefore, a set of parameter combinations with a larger c1 than c2 are selected as a comparative experiment.We conducted experiments on three datasets based on different values of w, c1, and c2.The evaluation index is the F1 value of the Random Forest.The F1 value represents the fitness value.The experimental results are shown in Figures 12-14.The highest fitness values for each parameter combination are shown in Table 11.When w = 0.5, c1 = 2.0, and c2 = 2.0, three datasets have the highest fitness value.Thus, the final parameters of the PSO algorithm are shown in Table 12.

Experimental Process and Data Statistics
Firstly, the requirement text is pre-processed to form requirement pairs, and the particle swarm optimization algorithm is used to match the optimal weights for each part of speech in the requirement pairs, and then the matched weights are used to improve the TF-IDF features.Secondly, the feature vectors are passed into the ensemble learning model, and the Low Correlation Algorithm, shown as Algorithm 1, is used to remove two classifiers with F1 values below 60 or high correlations.Finally, the Grid Search Algorithm is used to determine the optimal ensemble learning model, which is trained and tested to extract requirement dependencies as well as being evaluated according to its performance on different datasets.
The evaluation criteria used in this article are the Precision, Recall, and the F1 mean.Precision refers to the proportion of samples predicted or classified as positive that truly belong to positive examples.The calculation formula is shown in Formula ( 6), where TP represents True Positive samples and FP represents False Negative samples.Recall refers to the proportion of correctly predicted or retrieved positive cases in the total number of actual positive cases.The calculation formula is shown in Formula (7), where TP represents True Positive samples and FN represents False Negative samples.F1 is the harmonic mean of the Precision and Recall.The calculation formula is shown in Formula (8).

Experimental Results
This section will summarize the F1, Precision, and Recall scores of the three datasets for different features and different classifier models.The set of classifiers ζ = {KNN, DT, LGR, RF, SVM, GNB, MNB, LR}.Due to the plethora of ways to combine feature variables and base classifiers, each base classifier combination identified when using the Grid Search Algorithm is represented in rounds.

The Experimental Evaluation of the Standalone Classifiers for Three Datasets at Single Part-of-Speech Weights
In the experiments on the three datasets, each of the classifiers is trained and tested using the same experimental environment configuration.Five features are used for comparative evaluation.In this paper, the ratio of the training set to the test set is divided into 7:3 and experiments are conducted on nine standalone classifiers.In the particle swarm optimization algorithm, the number of particles is 50, the number of iterations is 30, and RF is used for training and testing to determine the fitness value of each particle.The part-of-speech weight results determined by the PSO algorithm are shown in Table 13.The F1 scores are shown in Tables 14-16.The Precision values and the Recall values are shown in Tables 17-19.After introducing part-of-speech weights for the TF-IDF features, the highest F1 scores for the requirement dependency extraction performed on the three datasets reached 89.12, 78.65 and 74.89, respectively.The highest Recall values for the requirement dependency extraction performed on the three datasets reached 89.89, 77.56, and 77.36, respectively.The highest Precision values for the requirement dependency extraction performed on the three datasets reached 90.03, 79.68, and 73.36, respectively.After using the improved TF-IDF weighted Word2Vec vector, the highest F1 values reached 90.01, 82.69 and 79.06, respectively.The highest Recall values for the requirement dependency extraction performed on the three datasets reached 90.32, 80.14, and 78.52, respectively.The highest Precision values for the requirement dependency extraction performed on the three datasets reached 92.70, 85.36, and 81.03, respectively.27-29.The multi-weighted TF-IDF features are trai tested in nine standalone classifier models.Although in some classifiers, the F1 s Mul-TFIDF feature is slightly lower than that of a P-TFIDF feature, the overal mental effect of multi part-of-speech weights is better than that of single part-o weights.The highest F1 scores for the requirement dependency extraction perfo     27-29.The multi-weighted TF-IDF features are trai tested in nine standalone classifier models.Although in some classifiers, the F1 s Mul-TFIDF feature is slightly lower than that of a P-TFIDF feature, the overal mental effect of multi part-of-speech weights is better than that of single part-o weights.The highest F1 scores for the requirement dependency extraction perfo    27-29.The multi-weighted TF-IDF features are trained and tested in nine standalone classifier models.Although in some classifiers, the F1 score of a Mul-TFIDF feature is slightly lower than that of a P-TFIDF feature, the overall experimental effect of multi part-of-speech weights is better than that of single part-of-speech              In this paper, when extracting the requirement sentence's backbone, the subject-predicate-object triad of the requirement sentence is extracted as the sentence backbone using dependency syntax analysis, and this triad is divided into three parts of speech for weight allocation.The above experiments are based on this method.However, in the CMS system dataset, for some requirement sentences, the extracted subject-verb-object triad cannot reflect the information contained the requirement sentences.Therefore, in Section 3.2.1, a method to improve the backbone extraction of requirement sentences is given, which is only used in the CMS System dataset.The experimental results are shown in Table 31.After improving the method for extracting the backbone of requirement sentences, Mul-TIW2V features are used for training and testing in nine standalone classifiers and P-Stacking models.Compared with the original method for extracting the backbone of requirement sentences, the experimental effect is significantly improved.The highest F1 score, Recall value, and Precision value reached 86.48, 88.57, and 89.53, respectively.In this paper, the original requirement sentence backbone extraction method is named OE, and the improved requirement sentence backbone extraction method is named PE.In this paper, when extracting the requirement sentence's backbone, the subjectpredicate-object triad of the requirement sentence is extracted as the sentence backbone using dependency syntax analysis, and this triad is divided into three parts of speech for weight allocation.The above experiments are based on this method.However, in the CMS system dataset, for some requirement sentences, the extracted subject-verb-object triad cannot reflect the information contained in the requirement sentences.Therefore, in Section 3.2.1, a method to improve the backbone extraction of requirement sentences is given, which is only used in the CMS System dataset.The experimental results are shown in Table 31.After improving the method for extracting the backbone of requirement sentences, Mul-TIW2V features are used for training and testing in nine standalone classifiers and P-Stacking models.Compared with the original method for extracting the backbone of requirement sentences, the experimental effect is significantly improved.The highest F1 score, Recall value, and Precision value reached 86.48, 88.57, and 89.53, respectively.In this paper, the original requirement sentence backbone extraction method is named OE, and the improved requirement sentence backbone extraction method is named PE.

Time Complexity Analysis
Due to the large number of particles and numerous iterations, the particle swarm optimization algorithm consumes significant computing resources during the process of finding the optimal solution.Additionally, the P-Stacking model necessitates an ergodic search to identify the optimal combination of base models.Therefore, it is necessary to analyse the time complexity of the method proposed in this paper.
This paper analyses the time complexity through the program running time.Under the same computer environment configuration, the program running times of the three datasets are shown in Table 32.To control the number of variables and to more intuitively display the time complexity of the method proposed in this paper, the running time of the program only includes the time spent in the training and testing of the data of the machine learning model.Time 1 is the time taken to calculate the fitness value using the RF model when using the PSO algorithm to find the optimal part-of-speech weight.Time 2 is the time it takes for the nine standalone machine models in the Low Correlation Algorithm to generate the predicted probability values.Time 3 is the time taken by the seven machine learning models in the Grid Search Algorithm to generate a new dataset using the method of 5-fold cross-validation.Time 4 is the time taken to determine the combination of base models using the Grid Search Algorithm.The time complexity of the proposed method is compared using nine standalone machine learning models and three deep learning models.The program run time results of the nine standalone models are shown in Tables 33-35.These nine models use the same feature vectors as Time 2. Although the method proposed in this paper is much higher than the standalone machine learning model in the proportion of time used, the actual running time is not high.The high accuracy rate is worth sacrificing a portion of the program run time.The standalone machine learning model uses the feature vectors extracted by the method in this paper, which makes the comparison unfair.Therefore, this paper uses a deep learning model to extract the requirement dependency without specifying the characteristics.That is, deep learning models need to learn features from data on their own.The experimental results are shown in Table 36.In dataset 1, the F1 score can be 30.42%higher when the program run time is 1.5459 times higher.In dataset 2, the F1 score can be 34.49%higher when the program run time is 1.0536 times higher.In dataset 3, the F1 score can be 37.69% higher when the program run time is 1.2664 times higher.

Discussion
There are two main objectives of this paper.First, the feature representation of a requirement text is enhanced by feature fusion.Second, the improved stacking ensemble learning model is used to improve the forecasting performance.Focusing on these two core objectives, this chapter conducted a detailed experimental comparison and analysis based on different datasets, various features, and different machine learning models.
The experimental results indicate that the part-of-speech feature extraction method and feature fusion method proposed in this paper can significantly improve the prediction performance of a standalone machine learning model.Compared with the method based on single features, the F1 score of each standalone machine learning model is significantly increased after the gradual fusion of part-of-speech features, TF-IDF features and Word2Verc features.In addition, after optimizing the part-of-speech feature extraction method, the F1 score of the machine learning model based on multiple part-of-speech weight features is also improved to a certain extent.
The experimental results show that compared with the two standalone machine learning models that exhibit the highest prediction accuracy, the improved stacking ensemble machine learning model can achieve the highest F1 score under each type of feature.This proves that the improved superposition model proposed in this paper is advanced.In addition, the improved stacking model does not need to manually adjust the basic model combination, and has a strong adaptability to experiments with different datasets.
The results of the time complexity analysis show that although the use of the particle swarm optimization algorithm to determine the part-of-speech weights will increase the program running time, it can improve the predictive performance of machine learning.When the time spent determining features is not considered, the time spent improving the stacking model is much less than that of the deep learning model.When considering all the time spent, the proposed method can also greatly improve the accuracy of requirement dependency extraction while sacrificing part of the time.

Conclusions
This paper proposes a requirement dependency extraction method based on feature fusion and an improved stacking ensemble machine learning model.This method can extract information and features from requirement texts from multiple dimensions.The improved stacking model can make full use of these features to better perform the task of requirement dependency extraction.Compared with the existing machine-learning-based methods, the proposed method can extract more information content, and give full play to the advantages of multiple standalone machine learning models, which makes requirement dependency extraction tasks have better results.
The main content of this study is divided into two parts.The first part is about extracting more information content from requirement texts.The second part is about finding a better machine learning model for requirement dependency extraction tasks.The extracted information features from the first part are the input into the second part of the machine learning model.The specific implementation steps of the proposed method are as follows.Firstly, the requirement texts are pre-processed by word segmentation and other steps.In the feature engineering stage, the part-of-speech features, TF-IDF features, and Word2Vec features are extracted from the requirement texts, and these three features are gradually fused into a feature.Secondly, all kinds of features are input into each standalone machine learning model, respectively.The results of each standalone model for the requirement dependency extraction tasks are evaluated based on different features.Thirdly, all kinds of features are input into the improved stacking ensemble learning model.The performance of the ensemble learning model is compared with that of the standalone model under the same feature.The experimental results show that the method of mixing three features and improving the stacking model can achieve the best performance in the experimental evaluation of three datasets.
The proposed method has the following advantages: (1) The proposed method of extracting part-of-speech features can assign high weights to important sentence components in the requirement text, and strengthen the information representation ability of important words.(2) The fusion of part-of-speech features, TF-IDF features and Word2Vec features can make the feature vector contain part-of-speech information, word frequency information and contextual semantic information at the same time, which enriches the content of the information contained in the feature vector.(3) The improved stacking ensemble learning model can reduce the possible deviations of a standalone model by combining the prediction results of multiple standalone models, so as to improve the overall prediction performance.(4) The improved stacking ensemble learning model can select the optimal combination of base learners to the greatest extent, and select the best collocation for each single model.When the dependency extraction task is changed, the optimal combination of base learners can be automatically assigned.
The shortcoming of this paper is that the use of supervised learning methods to train machine learning models requires a large number of labelled data.The manual labelling of the dependency types between requirement pairs is a task that requires a large amount of human resources.Therefore, the future work aims to consider combining neural networks and ontological reasoning to design an ontology tool that realizes dependency extraction and performs labelling automatically.We intend to construct an ontology with the subjectverb-object triples in requirement sentences as nodes and define inference rules based on temporal relations because there is a temporal relationship between the occurrence and termination of two requirements.For example, for notification relationships, only when one requirement is finished is another requirement allowed to begin.We intend to use neural networks to identify temporal relationships between node words.

Figure 1 .
Figure 1.A diagram of the multiple standalone machine learning requirement dependency extrac tion models used in this study.

Figure 1 .
Figure 1.A diagram of the multiple standalone machine learning requirement dependency extraction models used in this study.

Figure 8 .
Figure 8. Ensemble machine learning requirement dependency extraction model diagram.

Figure 8 .
Figure 8. Ensemble machine learning requirement dependency extraction model diagram.

5. 4 . 2 .
Experimental Evaluation of P-Stacking Models for Three Datasets at Single Part-of-Speech Weights In the P-Stacking model, a logistic regression model is selected for the meta-classifiers.When determining the combination of base classifiers, the Low Correlation Algorithm is used to remove two classifiers from the nine candidate classifiers.Then, the Grid Search Algorithm and 5-fold cross-validation are used to determine the combination of classifiers with the highest F1 score.Among the five features, for the Course Management System dataset, F1 scores reached their maximum at 30, 63, 103, 79, and 63 rounds, respectively.For the Composition Evaluation dataset, F1 scores reached their maximum at 33, 54, 63, 56, and 79 rounds, respectively.For the CMS System dataset, F1 scores reached their maximum at 91, 93, 62, 107, and 32 rounds, respectively.To demonstrate the advancement of the P-Stacking model, the two classifiers with the largest predicted mean values among the standalone classifiers are selected for comparative analysis with the ensemble learning model.As shown in Figures 15-17, after the introduction of the part-of-speech features and the P-Stacking model, the F1 values on the three datasets reached 92.72, 87.72 and 82.32, respectively.The experimental results of P-Stacking are shown in Table 20, and the highest Recall values for the requirement dependency extraction performed on the three datasets reached 93.65, 87.46, and 81.09, respectively.The highest Precision values for the requirement dependency extraction performed on the three datasets reached 92.49, 89.75, and 83.64, respectively.
In the P-Stacking model, a logistic regression model is selected for the meta-cl When determining the combination of base classifiers, the Low Correlation Algo used to remove two classifiers from the nine candidate classifiers.Then, the Grid Algorithm and 5-fold cross-validation are used to determine the combination of cl with the highest F1 score.Among the five features, for the Course Management dataset, F1 scores reached their maximum at 30, 63, 103, 79, and 63 rounds, resp For the Composition Evaluation dataset, F1 scores reached their maximum at 33 56, and 79 rounds, respectively.For the CMS System dataset, F1 scores reached th imum at 91, 93, 62, 107, and 32 rounds, respectively.To demonstrate the advance the P-Stacking model, the two classifiers with the largest predicted mean values the standalone classifiers are selected for comparative analysis with the ensemble model.As shown in Figures 15-17, after the introduction of the part-of-speech and the P-Stacking model, the F1 values on the three datasets reached 92.72, 87 82.32, respectively.The experimental results of P-Stacking are shown in Table 20, highest Recall values for the requirement dependency extraction performed on t datasets reached 93.65, 87.46, and 81.09, respectively.The highest Precision value requirement dependency extraction performed on the three datasets reached 92.4 and 83.64, respectively.

Figure 15 .
Figure 15.The P-Stacking model vs. the standalone classifiers for dataset 1.

Figure 15 .
Figure 15.The P-Stacking model vs. the standalone classifiers for dataset 1.

Figure 16 .
Figure 16.The P-Stacking model vs. the standalone classifiers for dataset 2.

Figure 17 .
Figure 17.The P-Stacking model vs. the standalone classifiers for dataset 3.

2 . 4 ,
different weights are assigned to the features.The features of multi-weighted TF-IDFs (Mul-TFIDFs) are compared improved TF-IDF (P-TFIDF).The part-of-speech weights for each group are show bles 21-23.The F1 scores are shown in Tables 24-26.The Precision values and th values are shown in Tables

Figure 17 .
Figure 17.The P-Stacking model vs. the standalone classifiers for dataset 3.

2 . 4 ,
different weights are assigned to the features.The features of multi-weighted TF-IDFs (Mul-TFIDFs) are compared improved TF-IDF (P-TFIDF).The part-of-speech weights for each group are show bles 21-23.The F1 scores are shown in Tables 24-26.The Precision values and th values are shown in Tables

Figure 17 .
Figure 17.The P-Stacking model vs. the standalone classifiers for dataset 3.

5. 4 . 3 .
Experimental Evaluation of Standalone Classifiers for Three Datasets with Multiple Part-of-Speech Weights Based on the method in Section 3.2.4,different weights are assigned to the TF-IDF features.The features of multi-weighted TF-IDFs (Mul-TFIDFs) are compared with the improved TF-IDF (P-TFIDF).The part-of-speech weights for each group are shown in Tables 21-23.The F1 scores are shown in Tables 24-26.The Precision values and the Recall values are shown in Tables

5. 4 . 4 .
Experimental Evaluation of P-Stacking Models for Three Datasets at Multiple Part-of-Speech Weights To prove the feasibility of using multi-word weight features in P-Stacking models, the multi-weight TF-IDF (Mul-TFIDF) features are compared with the single-weight improved TF-IDF (P-TFIDF) features, and the two standalone classifiers with the highest predicted means in the classifiers are selected for comparison and analysis with the P-Stacking model, and the results of the experiments are shown in Figures 18-20.The F1 values on the three datasets reached 92.87, 88.12 and 84.74, respectively.The experimental results of P-Stacking are shown in Table 30, and the highest Recall values for the requirement dependency extraction performed on the three datasets reached 93.01, 89.03, and 80.59, respectively.The highest Precision values for the requirement dependency extraction performed on the three datasets reached 93.47, 89.93, and 89.53, respectively.
To prove the feasibility of using multi-word weight features in P-Stacking models the multi-weight TF-IDF (Mul-TFIDF) features are compared with the single-weight im proved TF-IDF (P-TFIDF) features, and the two standalone classifiers with the highes predicted means in the classifiers are selected for comparison and analysis with the P Stacking model, and the results of the experiments are shown in Figures18-20.The F1 values on the three datasets reached 92.87, 88.12 and 84.74, respectively.The experimenta results of P-Stacking are shown in Table 30, and the highest Recall values for the require ment dependency extraction performed on the three datasets reached 93.01, 89.03, and 80.59, respectively.The highest Precision values for the requirement dependency extrac tion performed on the three datasets reached 93.47, 89.93, and 89.53, respectively.

Figure 18 .
Figure 18.The P-Stacking model vs. the standalone classifiers for the Course Management System dataset.

Figure 19 .
Figure 19.The P-Stacking model vs. the standalone classifiers for the Composition Evaluation da taset.

Figure 18 .
Figure 18.The P-Stacking model vs. the standalone classifiers for the Course Management System dataset.
To prove the feasibility of using multi-word weight features in P-Stacking models the multi-weight TF-IDF (Mul-TFIDF) features are compared with the single-weight im proved TF-IDF (P-TFIDF) features, and the two standalone classifiers with the highes predicted means in the classifiers are selected for comparison and analysis with the P Stacking model, and the results of the experiments are shown in Figures18-20.The F values on the three datasets reached 92.87, 88.12 and 84.74, respectively.The experimenta results of P-Stacking are shown in Table 30, and the highest Recall values for the require ment dependency extraction performed on the three datasets reached 93.01, 89.03, and 80.59, respectively.The highest Precision values for the requirement dependency extrac tion performed on the three datasets reached 93.47, 89.93, and 89.53, respectively.

Figure 18 .
Figure 18.The P-Stacking model vs. the standalone classifiers for the Course Management System dataset.

Figure 19 .
Figure 19.The P-Stacking model vs. the standalone classifiers for the Composition Evaluation da taset.

Figure 19 .
Figure 19.The P-Stacking model vs. the standalone classifiers for the Composition Evaluation dataset.

Figure 20 .
Figure 20.The P-Stacking model vs. the standalone classifiers for the CMS System dataset.

Figure 20 .
Figure 20.The P-Stacking model vs. the standalone classifiers for the CMS System dataset.
Ry cannot be implemented at the same time, then Rx and Ry hav a conflict relationship.(5) Aggregation.If Ry is a part of Rx, then Rx and Ry have an aggregation relationship.(6) Similar.If Rx and Ry have the same requirements, they have a similarity relationship 1) Notification.If Ry is implemented after Rx has been implemented, then there is a notification relationship between Rx and Ry.(2) Arouse.If Ry needs to be implemented after Rx, then there is an arouse relationship between Rx and Ry.(3) Call.If Rx needs to realize Ry first in the process of its realization, i.e., Rx is realized before Ry, but Ry completes the realization before Rx, then Rx and Ry have a calling relationship.(4) Conflict.If Rx and

If Rx and Ry cannot be implemented at the same time, then Rx and Ry have a conflict relationship. (5) Aggregation. If Ry is a part of Rx, then Rx and Ry have an aggregation relationship. (6) Similar. If Rx and Ry have the same requirements, they have a similarity relationship.
1) Notification.If Ry is implemented after Rx has been implemented, then there is a notification relationship between Rx and Ry.(2) Arouse.If Ry needs to be implemented after Rx, then there is an arouse relationship between Rx and Ry.(3) Call.If Rx needs to realize Ry first in the process of its realization, i.e., Rx is realized before Ry, but Ry completes the realization before Rx, then Rx and Ry have a

hematics 2024, 12
, x FOR PEER REVIEW Mathematics 2024,12,x FOR PEER REVIEW 12 of 37 global optimal solutions.Step (3): The position and velocity of each particle are updated according to Formulas (2) and (3).The fitness value of each particle is calculated based on the F1 value of the Random Forest classifier.Step (4): If the fitness value of the current particle is greater than pbest, pbest is updated as the position of the current particle, and if the current particle's fitness value is greater than gbest, then gbest is updated as the position of the current particle.Step (5): When the maximum number of iterations is reached, the search program is terminated, the current global optimal solution gbest is the final solution.According to the position information of the current particles, the optimal weight of each part of speech in the requirement pair is obtained, and the TF-IDF value will also be improved with this weight.Step (6): If the maximum number of iterations is not reached, step (3) is returned to.

Table 1 .
The feature vectors for different features.

Table 1 .
The feature vectors for different features.

Table 2 .
Part-of-speech weights in datasets with different dependency types.

Table 4 .
The feature vectors of different features when the part-of-speech weight is W1.

Table 5 .
The feature vectors of different features when the part-of-speech weight is W2.

Table 6 .
The feature vectors of different features when the part-of-speech weight is W3.

Table 7 .
The feature vectors of different features when the part-of-speech weight is W4.

Table 8 .
The feature vectors of different features when the part-of-speech weight is W5.

Table 9 .
The feature vectors of different features when the part-of-speech weight is W6.

Table 13 .
Part-of-speech weights for the three datasets.

Table 14 .
F1 scores (%) for each classifier on dataset 1 at different features.

Table 15 .
F1 scores (%) for each classifier on dataset 2 at different features.

Table 16 .
F1 scores (%) for each classifier on dataset 3 at different features.

Table 17 .
Precision and Recall values (%) for each classifier on dataset 1 at different features.

Table 18 .
Precision and Recall values (%) for each classifier on dataset 2 at different features.

Table 19 .
Precision and Recall values (%) for each classifier on dataset 3 at different features.

Table 20 .
Precision and Recall values (%) for P-Stacking on three datasets at different featur

Table 20 .
Precision and Recall values (%) for P-Stacking on three datasets at different featur

Table 20 .
Precision and Recall values (%) for P-Stacking on three datasets at different features.

Table 25 .
F1 scores (%) for each classifier on the Composition Evaluation dataset at different features.

Table 26 .
F1 scores (%) for each classifier on the CMS System dataset at different features.

Table 27 .
Precision and Recall values (%) for each classifier on dataset 1 at different features.

Table 28 .
Precision and Recall values (%) for each classifier on dataset 2 at different features.

Table 29 .
Precision and Recall values (%) for each classifier on dataset 3 at different features.

Table 30 .
Precision and Recall values (%) for P-Stacking on three datasets using different features.

Table 31 .
The F1, Recall, and Precision (%) values of different requirement sentence backbone extraction methods on each classifier.

Table 30 .
Precision and Recall values (%) for P-Stacking on three datasets using different features.

Table 31 .
The F1, Recall, and Precision (%) values of different requirement sentence backbone extraction methods on each classifier.

Table 32 .
Program run time (s) for three datasets.

Table 33 .
The nine models of program run time (s) in dataset 1.

Table 34 .
The nine models of program run time (s) in dataset 2.

Table 35 .
The nine models of program run time (s) in dataset 3.

Table 36 .
Three deep learning models of program run time (s) and F1 (%) in the three datasets.