Approach to Formalizing Software Projects for Solving Design Automation and Project Management Tasks

Abstract: GitHub and GitLab host many project repositories. Each repository contains many design artifacts and specific project management features. Developers can automate design and project management processes with the approach proposed in this paper. We describe a knowledge base model and a diagnostic analytics method for solving design automation and project management tasks. This paper also presents example use cases for applying the proposed approach.


Introduction
The development of a modern software system is impossible without the construction and practical use of its architectural description (AD). The AD is in demand at all stages of a software system life cycle. The AD is the first (earliest) representation of a software system. The AD can be verified (tested) as a complete system. Moreover, the most significant requirements and restrictions are stated in the AD, ensuring that everyone considers and understands the concerns of stakeholders in the project.
Developers must comply with the requirements in the AD in all subsequent versions of the software system. Compliance ensures the integrity of the project. Developers can change the AD during software system development, but only for very solid reasons.
Thus, the AD captures the high-level requirements and their corresponding decisions, which may not be changed at lower levels of the project, because changes to the AD are too costly.
The responsibility for the quality of the developed software system lies with the architects and the project managers. The specifics of the creation of modern software systems include the intensive use of software engineering in a highly complex computerized operating environment. The architects and developers consistently apply a heterogeneous experience when interacting with that environment. Development practice shows that these conditions contribute to the negative manifestations of the human factor, among which costly faults and design errors are especially undesirable.
According to research by the Standish Group, published regularly since 1994, the project success rate has slightly more than doubled since 1994, when it stood at 16% [1].
Developers are involved in three key processes when developing software [2,3]:
1. Understanding the context of some problem area (domain).
2. Designing a domain model and design space.
3. The formation of some understanding of the context as design artifacts.
In the first process, designers need to identify the entities and business processes of the domain. Those entities and business processes are important for solving development tasks and determine the significant properties of these objects. Through their understanding of the domain, designers form the operational space (OS) of design activity.
In the second process, the designers form the conceptual space (CS) of design activity. The CS emerges during design from the designers' mental activity, based on their experience and understanding of the OS.
In the third process, the developers materialize the contents of the CS as a set of design artifacts.
Currently, there is a large amount of research in software engineering. However, most of the activities of software developers in designing and constructing software are based on the experience gained from working on previous projects. Various functional and non-functional requirements for the software project often affect the development results. Thus, the formation of a coherent theoretical base to support the design and construction of any software is a complicated task [2,3]. At the moment, developers are using a basic theory of software engineering and various theories focused on the development of software in various classes.
Moreover, developers have to deal with resource limitations when developing any software [4]. The project manager needs to evaluate the status of the project to make timely management decisions. The quality of management decisions directly affects the quality of the design artifacts and the quality of the software system.
In most cases, planning problems arise because of the following circumstances [4]: • The designers did not fully form the CS at the initial stages of the project; • The designers did not discuss the functional requirements with the customer; • There is not enough time to conduct usability testing.
Thus, we can define the following objectives of this study: • It is necessary to develop a model and methods for building a knowledge base to collect the experience of previous projects to support the processes of software design and construction; • It is necessary to develop a method of diagnostic analytics for the evaluation of the project development processes to improve the quality and efficiency of management decisions.
The main problems of the modern development of software systems are as follows [5–8]:
1. A high level of uncertainty when a project is developed for a new domain or when using new architectural approaches or technologies.
2. The influence of the external environment on the development process, including an unexpected reduction in resources.
3. A lack of necessary competencies among team members.
4. The need for the rapid assessment of numerous factors affecting the success of the project and the quality of project management decisions.
In this paper, we consider the experience of previous projects as a set of design artifacts. We treat a set of quantitative indicators from the task tracker as the key features of the project development processes, for example, the number of error notifications, the average time to close an error notification, and the team size.
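As an illustration of such indicators, the following minimal sketch computes the three examples named above from a list of task-tracker records. The `Issue` record layout and the approximation of team size by the number of distinct issue authors are assumptions made for this example and are not taken from the proposed approach.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Issue:
    # Hypothetical, simplified view of one task-tracker record.
    author: str
    is_bug: bool
    opened: datetime
    closed: Optional[datetime] = None


def project_indicators(issues: list[Issue]) -> dict:
    """Compute a few example indicators from task-tracker records."""
    bugs = [i for i in issues if i.is_bug]
    closed_bugs = [i for i in bugs if i.closed is not None]
    total = sum((i.closed - i.opened for i in closed_bugs), timedelta())
    avg_close = total / len(closed_bugs) if closed_bugs else None
    return {
        "bug_reports": len(bugs),
        "avg_time_to_close": avg_close,
        # Assumption: team size approximated by distinct issue authors.
        "team_size": len({i.author for i in issues}),
    }


issues = [
    Issue("ann", True, datetime(2023, 1, 1), datetime(2023, 1, 3)),
    Issue("bob", True, datetime(2023, 1, 2), datetime(2023, 1, 6)),
    Issue("ann", False, datetime(2023, 1, 5)),
]
indicators = project_indicators(issues)
```

Here the two closed error notifications took two and four days, so the average time to close is three days.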
The paper is organized as follows. Section 2 reviews related work to give a better understanding of the problems and objectives. Section 3 describes the proposed knowledge base model, which captures the experience of previous projects, and the diagnostic analytics method that supports the software development process. Section 4 presents examples of use cases for applying the proposed approach. Section 5 contains the discussion. The paper ends with the conclusions.

State Of The Art
Different researchers have studied software design automation in various works. In most cases, they propose models and methods for representing experience and knowledge to organize corporate knowledge bases by formalizing design artifacts of various types [3,9–14].
The paper [3] considered the question-answer protocol for the case-based support of the design process. The author focused on the description of methods for solving various problems of designing and constructing software with a question-answering method (WIQA). The WIQA is a complex of methods and means that create and use QA-models for project tasks solved at the conceptual stage of software system designing. The primary applications of the WIQA are the iterative creation of QA-nets, the control distribution of the tasks of the tree in its current state among members of the team, and solving the tasks using stepwise refinement based on the question-answer analysis.
The authors of the paper [9] considered the software project as a set of different contexts that compose the description of the software architecture under the ISO/IEC/IEEE 42010:2022 standard [15]. The authors also presented in this paper the following metamodels: • A metamodel for the scope model kind; • A metamodel for the user model kind; • A metamodel for the environment model kind.
The authors of the paper [10] considered a software project as a fragment of an ontology specific to the domain of this project. That ontology contains a set of key concepts related to the software system domain and a set of key concepts extracted from the source code, which makes it possible to establish semantic relationships between different concepts. In addition, the proposed ontology contains common knowledge from the General Software Engineering domain as a basis.
In papers [11,12], the authors described an algorithm that forms an ontology of a software project based on the analysis of a set of UML diagrams to identify various design patterns. The authors use the design patterns extracted during the analysis to search for projects with a similar structure, considering their linkage to a specific domain.
In paper [13], the authors proposed a mathematical apparatus for representing the CS using fuzzy logic methods.
The proposed method allows for predicting the quality of a project in the future (health indicators prediction).
In papers [20,21], the authors empirically evaluated the impact of different community organization and project management styles on project quality.
The authors of the paper [20] investigated the relation between community patterns and smells, with the purpose of understanding whether the structural organization of a community might lead to some sort of social debt.
The authors of the paper [21] proposed YOSHI (Yielding Open-Source Health Information), a tool able to map open-source communities onto community patterns, sets of known organizational and social structure types, and characteristics with measurable core attributes.
In our opinion, it is necessary to form a knowledge base when solving the problem of using the experience of previous projects for design automation and project management. The knowledge base should consider various aspects of the software project and the features of the development process. Most importantly, all this information must be collected and analyzed, considering the dynamics of the development of the project during its life cycle. The following works influenced our study:  [19].
Thus, it is necessary to take a comprehensive approach to data collection. We need to consider the following: • Design artifacts; • The project's compliance with the requirements and constraints of some domain; • The influence of various indicators and management decisions on the project development process; • Their cumulative influence on each other.

Materials and Methods
This section discusses the proposed models, methods, and algorithms for automating the design and management of software projects.
Modern software development practices are mainly based on iterative (flexible) development methodologies that allow the following [3,4]:
1. The ability to quickly respond to changing customer requirements;
2. An operative demonstration of the new software functionality to customers for evaluation, clarifications, and adjustments;
3. An increase in the efficiency of managerial decisions.
Moreover, developers use the Design Thinking methodology (DT) in the software development process [2–4]. The key feature of DT is the solution of engineering, business, and other problems, based on a creative, rather than analytical, approach. When using DT, developers do not solve problems based on critical analysis, but consider them as a creative process, which allows them to find unexpected and non-obvious solutions.

3. The formation of ideas.
In this article, we consider each iteration of the flexible development process as the following steps (Figure 1):
1. Planning.
2. Design.
3. Construction.
As you can see from Figure 1, the quality of the planning and design stages affects the quality of the software design stage:
1. The result of the planning stage depends on the quality of the analysis of functional and non-functional requirements [4], as well as on the quality of management decisions. We can represent management decisions as a set of tasks for developers and as a set of team management decisions. The project manager at the planning stage must consider the limitations of the resources, the limitations of the real world (domain), and the quality requirements.
2. The result of the design stage depends on the planning stage and the qualifications of designers. Moreover, the design stage is a creative process in terms of the DT methodology, which requires the development of automated CS generation tools [2,3].
As the review of related publications shows, methods of intellectual analysis and knowledge engineering make it possible to automate the design stage by formalizing the experience of previous projects.

Knowledge Base Model for Formalizing the Experience of Previous Projects
The following things influence the development process of a software system [15]:
Thus, the development of the software system must be considered within the life cycle, the requirements, and the set of adopted design decisions. We consider the architecture of the software system as a set of representations of this architecture: a business representation, a physical representation, and a technical representation. The AD comprises design artifacts. The design artifact is the most primitive construction of an AD.
The AD is formed in the process of software system architecting. The AD can also be obtained by reworking the architecture description of previous projects [15].
The AD can be used within the life cycle of a software system in the following ways: • As a basis for the design and construction processes of a software system; • As a basis for the analysis and evaluation of alternative implementations of an AD; • As documentation in the development and maintenance processes of a software system; • To document significant aspects of a software system; • As input to automated tools for modeling, system simulation and analysis; • To define a group of software systems that have common properties (for example, architectural styles, reference architectures, and product line architectures); • For communication between the teams involved in software system development; • To provide communication between customers and developers; • To document the characteristics, properties, and features of a software system; • As a basis for planning the transition from a legacy architecture to a new one; • As a guide to operational and infrastructure support and the configuration management of a software system; • To support system planning and activities related to timelines and budgets; • As a basis for audits, analysis, and evaluation of a software system; • As a basis for the analysis and evaluation of alternative architectures; • For reusing the architectural knowledge through points of view, patterns, and styles; • To educate stakeholders on best practices for architecting and development.
The authors of [22,23] argue that ontologies can be used in architecting instead of traditional software system modeling languages (such as UML), since ontologies allow us to control the logical integrity and consistency of the resulting model. However, the existing methods of forming ontologies to support and automate the design of software systems require the involvement of domain experts and specialists in knowledge engineering. The manual creation of ontologies is time-consuming.
The main difficulty in creating knowledge bases to support the software systems development lies in the need to unify design artifacts. The formats and methods for storing design artifacts are different, which makes it difficult to analyze and use them in new software systems' development.
The specifics of the design knowledge in an AD call for a knowledge base with a special structure. The knowledge base must include a set of representations for describing the following [15]: • The concepts of a domain; • The features of design artifacts formalized as knowledge base fragments; • The features of the development process as the main stages of a software system life cycle; • Sets of semantic relations between knowledge base entities; • Interpretation functions.
Ontologies are based on different description logics (DLs). DLs can guarantee the logical integrity and consistency of the ontology. DLs have decidability and a relatively low computational complexity. These features of DL provide a compromise between expressiveness and decidability. The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies.
The main component of an OWL 2 ontology is a set of axioms: statements that say what is true in the domain [24,25]. OWL 2 provides an extensive set of axioms, all of which extend the Axiom class in the structural specification. Axioms in OWL 2 can be declarations, axioms about classes, axioms about object or data properties, datatype definitions, keys, assertions (sometimes also called facts), and axioms about annotations.
We use the following DL axioms to describe the terminology of the proposed knowledge base [24]: • ⊤ is a special class with every individual as an instance (top); • A ⊓ B is the intersection or conjunction of axioms (classes or roles); • ∀R.A is the universal restriction axiom; • ∃R.A is the existential restriction axiom; • ∃R.Self ⊑ ⊥ is the irreflexive roles axiom.
We represent the knowledge base model for formalizing the experience of previous projects using the following definition:
where B i is the i-th indexed software project, which we define as follows:
where the components have the following meaning.
L B is the representation of the development process. It captures the specifics of the project life cycle. In addition, this view can help a project manager evaluate the impact of management decisions on the software project dynamics, for example, how an increase in the number of developers affected development activity or project quality.
P B is the representation of the software project structure (directories and files). It provides information about the structure of the files and directories of a software project and classifies files into the following types: source code, documentation, tools to build/compile the code, tests, an additional data directory, external dependencies (libraries), a directory with binaries, etc. This information makes it possible to apply the appropriate analysis methods to files of different types and to take the project structure into account when extracting design patterns.
T B is the representation of the software project environment (a set of technology components). It provides information about the software system environment: dependencies (libraries), external components (services), runtime environments, etc. In addition, this representation covers the various architectural styles and design patterns that developers used in a project. Information about the environment is very important because an incorrectly configured environment can cause errors in the software system. The environment information also makes it possible to study only those completed projects that meet the requirements of the current project.
D B is the representation of domain features. It defines the problem area and the main use cases of a software project.
W B is the representation of the linguistic environment (concepts and terms). The linguistic environment makes it possible to equate objects that have different names but the same semantics, for example, employee and staff, development and construction, etc.
R B is the set of relations between knowledge base representations. We will discuss these relationships next.
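Read together with the component list above, the two definitions can be sketched as follows. This is only one plausible reading: the set symbol KB, the number of indexed projects n, the subscript placement, and the order of the tuple components are assumptions made for this illustration.

```latex
% Hedged sketch, not the authors' displayed formula:
% KB, n, and the ordering of the tuple components are assumptions.
KB = \{ B_1, B_2, \ldots, B_n \}, \qquad
B_i = \langle L_B, P_B, T_B, D_B, W_B, R_B \rangle
```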
Let us consider in more detail the components of the project representation in the knowledge base context (Equation (1)).
The common terminology of the knowledge base is: where hasName is a functional role common to all knowledge base classes. The hasName role allows us to specify the name of a class individual (object); Project is a class for describing a software project.
The terminology for the representation of the project development process L B can be described as the following axioms (Figure 2):
• A set of classes for describing the following: File: File (this is a part of the representation of the software project structure P B).
• The Project class has the following:
- The hasMilestone, hasRequest, hasIssue, and hasBranch roles to specify ties between a project and a set of its milestones, merge/pull requests, issues, and branches;
- The hasCommits and hasContributors transitive roles to define ties between a project and a set of its commits and contributors;
- The hasDescription functional role to specify a tie between a project and its description.
• The Milestone class has the following:
- The hasRequest, hasIssue, and hasComment roles to specify ties between a milestone and a set of its merge/pull requests, issues, and comments;
- The hasDescription functional role to specify a tie between a milestone and its description;
- The fromProject inverse functional role to define a tie between a milestone and its project: Inv(hasMilestone) ⊑ fromProject.

• The Issue class has the following:
- The hasContributor and hasComment roles to specify ties between an issue and a set of its contributors and comments;
- The hasDescription functional role to specify a tie between an issue and its description;
- The fromMilestone, fromRequest, fromBranch, and fromProject inverse functional roles to define ties between an issue and its milestone, merge/pull request, branch, and project:
Issue ⊑ ∃hasContributor.Contributor ⊓ ∀hasContributor.Contributor ⊓ ∀hasComment.String ⊓ ∃hasDescription.String ⊓ ∀hasDescription.String
Inv(hasIssue) ⊑ fromMilestone ⊓ fromRequest ⊓ fromBranch ⊓ fromProject.

• The Request class has the following:
- The hasIssue and hasComment roles to specify ties between a merge/pull request and a set of its issues and comments;
- The hasDescription functional role to specify a tie between a merge/pull request and its description;
- The fromMilestone, fromBranch, and fromProject inverse functional roles to define ties between a merge/pull request and its milestone, branch, and project;
- The hasCommits and hasContributors transitive roles to define ties between a merge/pull request and a set of its commits and contributors:
Request ⊑ ∀hasIssue.Issue ⊓ ∀hasComment.String ⊓ ∃hasDescription.String ⊓ ∀hasDescription.String
• The Branch class has the following:
- The hasCommit, hasRequest, and hasIssue roles to specify ties between a branch and a set of its commits, merge/pull requests, and issues;
- The fromProject inverse functional role to define a tie between a branch and its project: Inv(hasBranch) ⊑ fromProject.

• The Commit class has the following:
- The hasContributor functional role to specify a tie between a commit and its contributor;
- The hasComment and modifyFile roles to specify ties between a commit and a set of its comments and modified files;
- The fromBranch and fromProject inverse functional roles to define ties between a commit and its branch and project;
- The hasMessage and hasDate functional roles to specify a tie between a commit and its description and date: Inv(hasCommit) ⊑ fromBranch ⊓ fromProject.

• The Contributor class has the following:
- The hasIssue, hasCommit, and hasRequests roles to specify ties between a contributor and a set of its issues, commits, and requests;
- The fromProject inverse functional role to define a tie between a contributor and its project: Inv(hasContributor) ⊑ fromProject.
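To make the role structure more tangible, the following minimal sketch mirrors a small part of the terminology above as plain data classes. It is an illustration only, not the ontology itself: the attribute names, the helper methods, and the decision to keep the inverse roles in sync manually are assumptions of this example. The sample names are taken from the ng-tracker example used later in the paper.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contributor:
    name: str


@dataclass
class Issue:
    description: str                                              # hasDescription
    contributors: list[Contributor] = field(default_factory=list)  # hasContributor
    comments: list[str] = field(default_factory=list)              # hasComment
    # Inverse role (fromMilestone); excluded from repr/eq to avoid cycles.
    milestone: Optional["Milestone"] = field(
        default=None, repr=False, compare=False)


@dataclass
class Milestone:
    name: str
    issues: list[Issue] = field(default_factory=list)             # hasIssue
    # Inverse role (fromProject); excluded from repr/eq to avoid cycles.
    project: Optional["Project"] = field(
        default=None, repr=False, compare=False)

    def add_issue(self, issue: Issue) -> None:
        self.issues.append(issue)
        issue.milestone = self      # keep the inverse role consistent


@dataclass
class Project:
    name: str
    milestones: list[Milestone] = field(default_factory=list)     # hasMilestone

    def add_milestone(self, milestone: Milestone) -> None:
        self.milestones.append(milestone)
        milestone.project = self    # keep the inverse role consistent


project = Project("ng-tracker")
milestone = Milestone("Conferences")
issue = Issue("Creating classes for the Conference module",
              contributors=[Contributor("dev")])  # "dev" is a placeholder name
project.add_milestone(milestone)
milestone.add_issue(issue)
```

In an ontology, the reasoner derives the inverse ties from the axioms; in this sketch the `add_*` helpers play that role.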
As you can see in Figure 2, dashed lines are used to illustrate some entities and relationships. Such entities and relationships may not be contained in an indexed repository and therefore may not be represented in the knowledge base.
We describe the P B representation of a software project structure by the following axioms:
• A set of classes for describing the following:
• The Commit class has the hasSourceDirectory and hasBuildFile functional roles to define ties between a commit and its source directory and build file: Commit ⊑ ∃hasSourceDirectory.Directory ⊓ ∀hasSourceDirectory.Directory ⊓ ≤ 1hasSourceDirectory.Directory ⊓ ∃hasBuildFile.File ⊓ ∀hasBuildFile.File ⊓ ≤ 1hasBuildFile.File.

• The Directory class has the following:
- The include irreflexive role to specify ties between a directory and a set of its subdirectories;
- The includeFile role to define ties between a directory and a set of its files.
We describe the representation of a software project environment T B by the following axioms:
• The File class has the hasArch and hasDependency roles to specify ties between a file and a set of its architecture styles or design patterns and third-party dependencies: File ⊑ ∀hasArch.Arch ⊓ ∀hasDependency.Dependency.

• The Dependency class has the hasGroup, hasName, and hasVersion functional roles to define the dependency properties (group, name, and version).
We describe the representation of the domain features D B by the following axioms:
• A set of classes for describing the following:
- Domain entities: Entity;
- Domain business processes: Process;
- File: File (this is a part of the representation of the software project structure P B).

• The File class has the hasEntity and hasBusinessMethod roles to specify ties between a file and a set of its entities and business processes: File ⊑ ∀hasEntity.Entity ⊓ ∀hasBusinessMethod.Process.

• The Entity class has the hasProcess role to define a tie between an entity and a set of its business processes: Entity ⊑ ∀hasProcess.Process.
We describe the representation of the linguistic environment W B by the following axioms:
• The Entity and Process classes have the hasConcept functional role to specify a tie between an entity or process and its concept;
• The Concept class has the hasTerm role to define ties between a concept and its terms: Concept ⊑ ∀hasTerm.Term.
It is necessary to develop a function that maps a software system project to the model of the proposed knowledge base. That function can be represented as the following definition:
where URL is the uniform resource locator of a software project repository on the Internet; B i is the representation of a software project as a fragment of the proposed knowledge base.
Section 3.2 describes the function F B. At the moment, we implement the function F B algorithmically. Currently, the function F B supports only projects that use the Java language or the Spring framework. In the future, we plan to form a metamodel to unify the behavior of the indexer that implements the function F B. The metamodel will make it possible to implement universal (in most cases) algorithms for formalizing projects for most structural and object-oriented programming languages.
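The role of the mapping function can be pictured with the following minimal sketch, which takes a repository URL and returns a fragment holding the six representations. The `ProjectFragment` container, the `map_project` function, the stub extractors, and the example URL are all hypothetical names introduced for illustration; they do not describe the actual indexer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Keys for the six representations: L (process), P (structure),
# T (environment), D (domain), W (linguistic), R (relations).
REPRESENTATIONS = ("L", "P", "T", "D", "W", "R")

Extractor = Callable[[str], dict]


@dataclass
class ProjectFragment:
    # Hypothetical container for one indexed project.
    url: str
    parts: Dict[str, dict] = field(default_factory=dict)


def map_project(url: str, extractors: Dict[str, Extractor]) -> ProjectFragment:
    """Apply one extractor per representation to a repository URL."""
    missing = [r for r in REPRESENTATIONS if r not in extractors]
    if missing:
        raise ValueError(f"no extractor for representations: {missing}")
    return ProjectFragment(url, {r: extractors[r](url) for r in REPRESENTATIONS})


def stub(name: str) -> Extractor:
    # Placeholder extractor; a real one would analyze the repository.
    return lambda url: {"representation": name, "source": url}


url = "https://example.org/group/project.git"  # hypothetical repository URL
fragment = map_project(url, {r: stub(r) for r in REPRESENTATIONS})
```

Keeping one extractor per representation means support for another language or framework can be added by swapping extractors without changing the mapping itself.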

Formalizing the Experience of Previous Projects
It is necessary to implement the mapping function for formalizing the experience of previous projects as fragments of the proposed knowledge base. The mapping function is based on an algorithm that comprises the following steps.
We consider the ng-tracker task tracker [26] as an example of data to be indexed for the knowledge base population. This project is written in Java with the Spring Boot framework. For compactness, we consider only the 'ru.ulstu.conference' package and the issue #57 ('Creating classes for the Conference module') associated with that package, from the milestone #681923 ('Conferences').
Step 1. Extraction of the representation of the project development process L B . Representation of the project development process L B is formed from two sources [27]: • The project-hosting API (GitLab, GitHub, etc.); • The project Git repository.
The Git repository of a project is the preferred source, because it is not affected by changes to the hosting API and is always available. However, it is impossible to extract information about the stages of the development process (milestones, issues, merge/pull requests) from a Git repository alone.
We developed a software module to work with the GitLab REST API [28]. The following HTTP requests are used for interactions between the module and the GitLab API: 1.
After that, the JGit library [29] is used to extract the following information from each commit:
If there are no milestones and/or issues in the project, then the module extracts only information about commits using the JGit library. Figure 3 demonstrates the fragment of the L B representation extracted from the issue #57 of the milestone #681923 of the ng-tracker project.
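As an illustration of the API side of Step 1, the following sketch builds request URLs for several resources of the GitLab REST API (version 4). The selection of endpoints is an assumption of this example, chosen to match the entities of the L B representation; the host name and project path are hypothetical, and the sketch only builds URLs without sending requests.

```python
from urllib.parse import quote


def gitlab_endpoints(base_url: str, project_path: str) -> dict:
    """Build GitLab REST API v4 URLs for the process-related resources."""
    # GitLab accepts a URL-encoded "namespace/project" path as the project ID.
    project_id = quote(project_path, safe="")
    root = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}"
    return {
        "milestones": f"{root}/milestones",
        "issues": f"{root}/issues",
        "merge_requests": f"{root}/merge_requests",
        "branches": f"{root}/repository/branches",
        "commits": f"{root}/repository/commits",
    }


def milestone_issues_url(base_url: str, project_path: str, milestone_id: int) -> str:
    """URL listing the issues assigned to one milestone."""
    return f"{gitlab_endpoints(base_url, project_path)['milestones']}/{milestone_id}/issues"


# Hypothetical host and project path, used for illustration only.
endpoints = gitlab_endpoints("https://gitlab.example.com/", "group/ng-tracker")
issues_of_milestone = milestone_issues_url(
    "https://gitlab.example.com", "group/ng-tracker", 681923)
```

Any HTTP client can then issue GET requests to these URLs; for private projects, GitLab expects an access token, for example in the `PRIVATE-TOKEN` request header.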
All examples of representations of the knowledge base are illustrative. In fact, a primary key is generated for each entity, and all relationships between entities are formed based on these keys.
Step 2. Extraction of the representation of a software project structure P B. The process of the structural analysis of a software project directory is used to create the representation of a software project structure P B [30]. The indexer module contains typical paths for finding the source code root path and various environment files for different programming languages and build systems. Figure 4 demonstrates the fragment of the P B representation extracted from the ng-tracker project.
As you can see from Figure 4, the following entities of the P B representation that are also presented in the L B representation are marked in gray: the entities with type 'File' and the entity '0428bad0' with type 'Commit'.
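The idea of classifying files by typical paths in Step 2 can be sketched as follows. The rules below are hypothetical and cover only a conventional Gradle/Maven-style Java layout; the file names in the usage example are invented for illustration, and the indexer's actual rules are not reproduced here.

```python
from pathlib import PurePosixPath

# Hypothetical rule sets for a conventional Java project layout.
BUILD_FILES = {"build.gradle", "settings.gradle", "pom.xml"}
DOC_SUFFIXES = {".md", ".txt", ".adoc"}
BINARY_SUFFIXES = {".jar", ".class"}


def classify(path: str) -> str:
    """Assign a coarse file type based on the file name and its location."""
    p = PurePosixPath(path)
    if p.name in BUILD_FILES:
        return "build"
    if "test" in p.parts:
        return "test"
    if p.suffix == ".java":
        return "source"
    if p.suffix in DOC_SUFFIXES or "docs" in p.parts:
        return "documentation"
    if p.suffix in BINARY_SUFFIXES:
        return "binary"
    return "other"


# Invented example paths inside the 'ru.ulstu.conference' package.
kinds = {
    path: classify(path)
    for path in [
        "build.gradle",
        "README.md",
        "src/main/java/ru/ulstu/conference/model/Conference.java",
        "src/test/java/ru/ulstu/conference/ConferenceTest.java",
    ]
}
```

The order of the checks matters: test sources are also `.java` files, so the location rule is applied before the suffix rule.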
Step 3. Extraction of the representation of the software project environment T B .
To extract the representation of the software project environment T B , an expert must manually configure the Arch T and Dep T components for each programming language, framework, application type, and other features of the software project environment. We will consider this task and the prospects for its automation in more detail in one of the following papers.
For example, to determine the usage of the MVC pattern for Spring projects, it is necessary to find the @Controller annotation on a class and the @GetMapping, @PostMapping, @RequestMapping, etc. annotations for its methods (Listing 1). To determine structural design patterns, it is necessary to consider the project structure information from the P B representation.
The component for extracting the third-party dependencies Dep T works with the configuration files of build automation tools and extracts the name and version of the dependency libraries from them. For example, for Java projects the following files are scanned: build.gradle, pom.xml, etc. If a build.gradle file is found, then Gradle is specified as the project build automation tool. Then, the names and versions of the dependency libraries are extracted from the dependencies section of this file (Listing 2).
Figure 5a demonstrates the fragment of the T B representation with a set of architectural styles extracted from the ng-tracker project, and Figure 5b shows a set of the third-party dependencies.
In Figure 5, we mark entities that were used in other representations (Figures 3 and 4) in gray.
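Both checks described in Step 3 can be approximated with simple text matching, as in the following minimal sketch. It uses regular expressions rather than a Java or Groovy parser, handles only the string notation of Gradle dependencies, and the sample sources and version numbers are invented for illustration; none of this is claimed to be how the Arch T and Dep T components are implemented.

```python
import re

CONTROLLER = re.compile(r"@Controller\b")
MAPPING = re.compile(r"@(Get|Post|Request)Mapping\b")
# Matches lines such as: implementation 'group:name:version'
GRADLE_DEP = re.compile(
    r"""^\s*\w+\s*\(?\s*['"]([^:'"]+):([^:'"]+)(?::([^'"]+))?['"]""",
    re.MULTILINE,
)


def uses_mvc(java_source: str) -> bool:
    """Heuristic: a controller class with at least one mapping annotation."""
    return bool(CONTROLLER.search(java_source) and MAPPING.search(java_source))


def gradle_dependencies(build_gradle: str) -> list[tuple[str, str, str]]:
    """Extract (group, name, version) triples from string-notation entries."""
    return GRADLE_DEP.findall(build_gradle)


# Invented samples for illustration.
controller_source = """
@Controller
public class ConferenceController {
    @GetMapping("/conferences")
    public String list() { return "conferences"; }
}
"""
build_file = """
dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-web:2.1.3.RELEASE'
    testImplementation "junit:junit:4.12"
}
"""
is_mvc = uses_mvc(controller_source)
deps = gradle_dependencies(build_file)
```

A text-based heuristic like this can be fooled by annotations inside comments or strings, so a production indexer would more likely rely on a proper parser.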
Step 4. Extraction of the representation of domain features D B. The representation of domain features D B is formed by searching in the source tree of the project for classes that describe the data models and business logic [27]. Each programming language and framework requires an indexer configuration to find the corresponding language/framework operators and constructions. For example, for Spring Boot projects, classes that describe data models are marked with the @Entity annotation (Listing 3), and business logic classes are marked with the @Service annotation (Listing 4).
Step 5. Formation of the linguistic environment W B.
We form the linguistic environment by analyzing the various text descriptions that are contained in the project repository and expressed in natural language, using statistical [31] and linguistic [32] analysis methods. Figure 7 demonstrates an example fragment of the $W^B$ representation of the linguistic environment.
As Figure 7 shows, one entity or business process of the $D^B$ representation can correspond to several concepts of the terminological environment. Thus, the resulting knowledge base is a source of design experience, based on which it is possible to form methods for automating the building of the CS to support the design stage.
We associate objects of various representations of the proposed knowledge base with commits. Commits contain information about creating, changing, or deleting objects.
The key position of the 'Commit' object allows us to consider the development process of the software project dynamically. Such a dynamic view of the project enables the use of various methods of diagnostic and predictive analytics to improve the quality and efficiency of project managers' decision making.

Diagnostic Analytics Method for Decision Support in Project Management
Using knowledge engineering methods in modeling and analyzing time series makes it possible to consider the limitations and features of a particular domain. Moreover, knowledge engineering methods allow us to choose the type of model and its parameters to improve the quality of time series analytics [33].
The data source of the proposed diagnostic analytics method is the knowledge base. The model of the knowledge base is presented in the previous section (Equation (1)). In our study, the key entity of the knowledge base is the 'Commit' object. The objects of the $L^B$ representation of the development process (Figure 2) and the objects of the other representations (Figure 8) have ties with the 'Commit' object. As Figure 8 shows, the key position of the 'Commit' object allows us to extract a set of time series of various indicators from the knowledge base based on a set of project commits with the required frequency and discreteness using the aggregation function.
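The extraction of indicator time series from commits can be sketched as follows. The sketch assumes that each commit is available as a (date, author) pair and that the aggregation function groups commits by calendar month; the record layout, the function name, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def extract_time_series(commits):
    """Aggregate (date, author) commit records into monthly indicator series."""
    commit_counts = defaultdict(int)
    authors = defaultdict(set)
    for day, author in commits:
        month = (day.year, day.month)  # aggregation key: one point per month
        commit_counts[month] += 1
        authors[month].add(author)
    months = sorted(commit_counts)
    return {
        "commits": [(m, commit_counts[m]) for m in months],
        "contributors": [(m, len(authors[m])) for m in months],
    }


# Hypothetical commit log used only for illustration.
COMMITS = [
    (date(2022, 1, 5), "alice"),
    (date(2022, 1, 20), "bob"),
    (date(2022, 1, 21), "alice"),
    (date(2022, 2, 3), "alice"),
]

series = extract_time_series(COMMITS)
```

Months without commits produce no point in this sketch; a production aggregation would probably fill such gaps with zeros to keep the series evenly spaced.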
We describe a function for extracting a set of time series from the knowledge base, and then apply the function $F^{St}$ of time series knowledge extraction to the extracted series. The algorithm of time series knowledge extraction contains the following steps:
1. The evaluation of the indicator value using a set of expert 'if-then' rules. Each rule defines a range of values. The indicator is assigned a linguistic value when its value belongs to the corresponding interval.
2. The modification of state values based on the mutual influence of indicators on each other. For example, if the number of contributors increases, the state for the number of commits indicator should be changed to a lesser value.
The result, $\{\ldots, state^{i}_{m}\}$, is a set of states (trends) for the indicators that are represented by the analyzed time series. Linguistic values represent the set of states, for example, few, medium, or many.
The project manager can form recommendations based on a set of states in the planning phase when starting a new iteration of the development process.
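The two steps of the knowledge extraction algorithm can be sketched as follows. The value ranges in `RULES` are invented for illustration and are not the ranges used in the study; the mutual influence rule is the single example given in the text (a growing number of contributors lowers the state of the number of commits by one level).

```python
LEVELS = ["few", "medium", "many"]

# Hypothetical expert rules: (exclusive upper bound, linguistic value).
RULES = {
    "contributors": [(3, "few"), (10, "medium"), (float("inf"), "many")],
    "commits": [(20, "few"), (100, "medium"), (float("inf"), "many")],
}


def linguistic_value(indicator, value):
    """Step 1: map a numeric value to the label of the interval it falls in."""
    for upper, label in RULES[indicator]:
        if value < upper:
            return label


def evaluate_states(series):
    """Apply step 1 to every point, then step 2 (mutual influence)."""
    states = {
        name: [linguistic_value(name, v) for v in values]
        for name, values in series.items()
    }
    contributors = series["contributors"]
    for t in range(1, len(contributors)):
        # Step 2: more contributors than in the previous period lowers
        # the state of the commits indicator by one level.
        if contributors[t] > contributors[t - 1]:
            idx = LEVELS.index(states["commits"][t])
            states["commits"][t] = LEVELS[max(idx - 1, 0)]
    return states


states = evaluate_states({"contributors": [2, 5], "commits": [30, 120]})
```

In this example the second commits value would be 'many' on its own, but it is lowered to 'medium' because the team grew from two to five contributors.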

Results
This section presents the currently implemented functions for the design automation and project management of software systems. Work on the project is in progress, and we are constantly adding new functionality to the software platform.

Information Retrieval of Software Projects
The popular web services for hosting software projects use information retrieval based on text processing methods that do not consider the specifics of design artifacts [34].
We have developed an information retrieval subsystem that searches for software projects while considering the specifics of each project as captured in the knowledge base (Equation (1)). We use the Neo4j graph database management system to organize the knowledge base. Neo4j offers fast query execution [35].
We represent the search query $Q$ as a set of parameters, where $L^Q$ is a set of parameters for information retrieval by the following indicators of the development process from the representation of the project development process $L^B$:
• The number of contributors;
• The number of commits.
The query can also include parameters for domain features, such as the number of business processes.
The information retrieval function returns $\tilde{B}$, the subset of projects that match the query parameters, from $B$, the set of indexed projects of the knowledge base. Currently, the user sets query parameters using a special form component that has a separate input element for each parameter. We form a Cypher query based on the form component data and use the 'UNION' operator to join all the atoms of the condition in the resulting Cypher query.
The relevance of project $B_i$ to query $Q$ is calculated from $|B_i \cap Q|$, the number of matching parameters in project $B_i$ and query $Q$, and $|Q|$, the number of parameters in query $Q$.
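A natural reading of the relevance definition is the share of query parameters that a project satisfies. The sketch below assumes exactly that ratio; the representation of a query as named predicates and the sample values are hypothetical.

```python
def relevance(project_params, query):
    """Share of query parameters matched by the project (assumed ratio)."""
    if not query:
        return 0.0
    matches = sum(
        1
        for name, predicate in query.items()
        if name in project_params and predicate(project_params[name])
    )
    return matches / len(query)


# Hypothetical indexed project and query.
project = {"contributors": 4, "commits": 150}
query = {
    "contributors": lambda v: 1 <= v <= 5,  # matches
    "commits": lambda v: v >= 500,          # does not match
}

score = relevance(project, query)
```

With one of two parameters matched, the score is 0.5; a project that satisfies every parameter scores 1.0.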
An index graph is formed and saved in Neo4j for each project during the indexing process. Figure 9 demonstrates the graph for the ng-tracker project. As Figure 9 shows, the graph contains the following nodes:
• The 'Project' node.
• The 'Entity' node. These nodes are formed based on the set of project entities from the $D^B$ representation (Figure 6).
• The 'Process' node. This type of node is formed based on the $D^B$ representation processes associated with a specific entity.
• The 'Metric' node: 'Contributors', 'Commits', 'Entities', and 'Processes'. The values of the 'Contributors' and 'Commits' metric nodes are formed by aggregating the number of changes and contributors from the $L^B$ representation (Figure 3). The values of the 'Entities' and 'Processes' metric nodes are formed based on the number of 'Entity' and 'Process' nodes.
All graph nodes have 'id' and 'name' properties. Metric nodes also have a 'value' property of type double that stores the value of the metric.
The following types of relations are used in the graph:
• A 'hasEntity' relation that connects a 'Project' node and an 'Entity' node;
• A 'hasProcess' relation that connects an 'Entity' node and a 'Process' node;
• A 'hasMetric' relation that connects a 'Project' node and a 'Metric' node.
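Given these node and relation types, the automatic assembly of a search query might look like the sketch below. It is an independent illustration and does not reproduce the query used in the platform: the node labels, the use of query parameters, and the `CALL { ... }` subquery that counts matched atoms after a `UNION ALL` are assumptions, and the subquery form requires a Neo4j version that supports it.

```python
def build_search_query(metric_ranges, entity_names=()):
    """Build a parameterized Cypher query with one atom per search condition."""
    atoms, params = [], {}
    for i, (metric, (low, high)) in enumerate(sorted(metric_ranges.items())):
        params[f"metric{i}"] = metric
        params[f"low{i}"] = float(low)
        params[f"high{i}"] = float(high)
        atoms.append(
            f"MATCH (p:Project)-[:hasMetric]->(m:Metric {{name: $metric{i}}}) "
            f"WHERE m.value >= $low{i} AND m.value <= $high{i} "
            "RETURN DISTINCT p"
        )
    for j, name in enumerate(entity_names):
        params[f"entity{j}"] = name
        atoms.append(
            f"MATCH (p:Project)-[:hasEntity]->(:Entity {{name: $entity{j}}}) "
            "RETURN DISTINCT p"
        )
    if not atoms:
        raise ValueError("at least one search parameter is required")
    # Each atom contributes one row per matching project, so the row count
    # divided by the number of atoms gives the share of matched parameters.
    params["total"] = float(len(atoms))
    query = (
        "CALL {\n  " + "\n  UNION ALL\n  ".join(atoms) + "\n}\n"
        "RETURN p.name AS project, count(*) / $total AS relevance\n"
        "ORDER BY relevance DESC"
    )
    return query, params


query, params = build_search_query({"Contributors": (1, 5)}, ["Task"])
```

Passing values as parameters instead of concatenating them into the query text avoids quoting problems when entity names come from user input.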
Listing 5 demonstrates an example of the Cypher query to find projects with relevance calculation and sorting in a descending order of relevance. Such a Cypher query is generated automatically based on the user search parameters.
Listing 5. Example of the Cypher query of the information retrieval subsystem.
The linguistic environment $W^B$ is used when generating a search query. When the user specifies the name of an entity or process in a search query, it is necessary to find a correspondence between each query term and the terms of the linguistic environment $W^B$. If a term from the query matches a term of the $W^B$ representation, then the following steps are performed (Figure 7):
1. Transition from the term to the concept by the 'hasTerm' relation.
2. Transition from the concept to the entity or business process by the 'hasConcept' relation.
If an entity or business process is obtained, then the corresponding term in the search query is replaced with the name of that entity or business process.
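The term replacement procedure can be sketched with two lookup tables that stand in for the 'hasTerm' and 'hasConcept' relations. The dictionaries, the lowercase matching, and the sample vocabulary are assumptions made for illustration; in the platform these relations are stored in the graph.

```python
def resolve_query_terms(terms, term_to_concept, concept_to_object):
    """Replace each query term with the entity or process it refers to."""
    resolved = []
    for term in terms:
        # Step 1: term -> concept (the 'hasTerm' relation, traversed backwards).
        concept = term_to_concept.get(term.lower())
        # Step 2: concept -> entity or business process ('hasConcept').
        target = concept_to_object.get(concept) if concept is not None else None
        # Terms without a match are kept unchanged.
        resolved.append(target if target is not None else term)
    return resolved


# Hypothetical fragment of a linguistic environment.
TERM_TO_CONCEPT = {"ticket": "issue", "bug": "issue"}
CONCEPT_TO_OBJECT = {"issue": "Task"}

resolved = resolve_query_terms(
    ["Ticket", "sprint"], TERM_TO_CONCEPT, CONCEPT_TO_OBJECT
)
```

Here 'Ticket' is replaced with the entity name 'Task', while 'sprint' has no counterpart in the vocabulary and stays as typed.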
We plan to add support for additional search parameters to the information retrieval subsystem. We also plan to use fuzzy logic methods to represent quantitative data as linguistic values, for example, 'a small project' or 'an average-sized development team'.
Thus, the proposed information retrieval subsystem allows for the automation of the research phase at the design stage by reducing the time costs and search space.

Generating Use Case Diagrams in UML Notation
The platform currently implements the function for generating use case diagrams in UML notation to automate the building of the CS.
We represent a use case diagram in UML notation by four components: $A^{UCD}$ is a set of actors that perform certain roles in a given system; $S^{UCD}$ is a set of system boundaries that define the limits of the system; $P^{UCD}$ is a set of use cases that represent business functionality; and $R^{UCD} = \{R^{UCD}_I, R^{UCD}_E, R^{UCD}_G, R^{UCD}_A\}$ is a set of relations:
• $R^{UCD}_I$ is an include relationship, in which a use case includes the functionality described in another use case as a part of its business process flow;
• $R^{UCD}_E$ is an extend relationship, in which the child use case adds to the existing functionality and characteristics of the parent use case;
• $R^{UCD}_G$ is a generalization relationship, a parent-child relationship between use cases;
• $R^{UCD}_A$ is an association relationship, a relationship between actors and use cases.
At the moment, we generate diagrams based only on the hierarchy of the entities and business processes of the representation of domain features $D^B$ (Equation (1)). The proposed algorithm creates a use case diagram as a set of commands for the PlantUML system [36]. Currently, only actors, use cases, and association and include relationships are formed in the resulting use case diagram.
The entities and business processes of the $D^B$ representation (Figure 6) of the ng-tracker project are used to generate a use case diagram.
The use case diagram generation algorithm contains the following steps:
1. Create an actor with the name 'User' ($A^{UCD}_1$): :User:
2. Create a root use case $P^{UCD}_1$. Specify the name of the project as the name of the root use case.
5. Obtain a list of business processes ($P^{E_i}$) for each entity. Create a use case for each business process from $P^{E_i}$ and connect it by an inclusion relation with the parent use case (entity $E_i$). Specify the name of the business process as the name of the use case.
Figure 10 demonstrates an example of the use case diagram for the ng-tracker project generated with the PlantUML system.
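The generation of PlantUML commands from the entity and process hierarchy can be sketched as follows. The sketch covers the steps stated in the text (the 'User' actor, the root use case named after the project, and one included use case per business process). It additionally assumes that the actor is associated with the root use case and that each entity becomes a use case included by the root; the function name, the aliases, and the sample domain are hypothetical.

```python
def generate_use_case_diagram(project_name, domain):
    """Return PlantUML text for a dict of entity name -> list of process names."""
    lines = ["@startuml", "actor User"]
    # Root use case named after the project (assumed to be linked to the actor).
    lines.append(f'usecase "{project_name}" as UC0')
    lines.append("User --> UC0")
    counter = 0
    for entity, processes in domain.items():
        counter += 1
        entity_id = f"UC{counter}"
        # Assumption: each entity is a use case included by the root use case.
        lines.append(f'usecase "{entity}" as {entity_id}')
        lines.append(f"UC0 ..> {entity_id} : <<include>>")
        for process in processes:
            counter += 1
            process_id = f"UC{counter}"
            # Each business process is included by its parent entity use case.
            lines.append(f'usecase "{process}" as {process_id}')
            lines.append(f"{entity_id} ..> {process_id} : <<include>>")
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical domain features used only for illustration.
diagram = generate_use_case_diagram(
    "ng-tracker",
    {"Task": ["create task", "close task"], "Project": ["create project"]},
)
```

The returned text can be saved to a file and rendered with PlantUML to obtain the diagram image.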
We plan to add the following improvements to the subsystem for generating use case diagrams:
• Use additional information from the knowledge base to improve the quality of the generated diagrams;
• Add support for extend and generalization relations;
• Use natural language processing methods and the linguistic environment $W^B$ to generate more correct (in terms of UML notation) names for use cases.
Thus, the subsystem for use case diagram generation can automate the formation of the CS at the planning stages for customer requirements definition or at the design stage to consider the experience of previous projects.

Diagnostic Analytics of Software Projects
For example, the tabbychat project [37] contains the experience of previous projects, and the ng-tracker project is currently being developed. We generated recommendations for the ng-tracker project based on data from the tabbychat project. We chose these repositories because the projects are written in Java and have comparable indicators.
We use the data of the following representations as initial data for the software project diagnostics:
• The representation of the development process $L^B$ (Figure 3);
• The representation of domain features $D^B$ (Figure 6).
Table 1 presents the time series extracted from these projects. We extracted the set of states from the analyzed repositories by applying the method for decision support in project management (Section 3.3). The set of states is presented in Table 2. We took the value ranges for the team size indicator from the development guidelines [38]. We approximated the value ranges for the number of commits indicator to a time interval of 1 month based on the paper [39]. We selected the ranges for the number of entities and business processes indicators empirically.
The invited expert proposed the following recommendation for the ng-tracker project based on the analysis of indicator states: An increase in the number of entities with a decrease in the number of implemented business processes indicates a lack of progress in the development of new project functionality. Developers should create more business methods.
The expert reached this conclusion because the 'number of entities' indicator was stable in the tabbychat project, while this indicator increased in the ng-tracker project. Moreover, in the ng-tracker project, there is a decrease in the number of implemented business methods.
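The expert's reasoning can be written down as an 'if-then' rule over indicator trends. The sketch below is only an illustration of that single rule: the trend is estimated by comparing the last value of a series with the first, and the numbers are invented and are not the values from Table 1.

```python
def trend(values):
    """Classify a series by comparing its last value with its first."""
    if values[-1] > values[0]:
        return "increase"
    if values[-1] < values[0]:
        return "decrease"
    return "stable"


def recommend(series):
    """Return recommendations triggered by the expert rule from the text."""
    recommendations = []
    if (
        trend(series["entities"]) == "increase"
        and trend(series["processes"]) == "decrease"
    ):
        recommendations.append(
            "Lack of progress in new functionality: "
            "developers should create more business methods."
        )
    return recommendations


# Hypothetical monthly values for a reference and a current project.
reference = {"entities": [6, 6, 6], "processes": [4, 5, 5]}
current = {"entities": [4, 6, 9], "processes": [5, 4, 3]}

reference_advice = recommend(reference)
current_advice = recommend(current)
```

The rule fires only for the current project, where the number of entities grows while the number of implemented processes falls.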
In the future, we plan to add a decision support module [40] to the diagnostic analytics subsystem, which will allow for automating the generation of recommendations based on expert knowledge.

Discussion
The proposed approach to design automation and project management makes it possible to formalize various features of existing software projects. The knowledge base formed by analyzing existing projects makes it possible to search, extract, and analyze design and management solutions that designers can use to form the CS in the design process and that project managers can use in the initial stages of the project.
In Section 2, we analyzed various works that described software design automation. In most cases, the authors of these studies proposed models and methods for representing experience and knowledge to organize corporate knowledge bases by formalizing design artifacts of various types [3,9–14].
We also considered papers about the analysis of open-source software repositories to evaluate the quality of the repository, depending on various design, construction, and project management practices [16–21].
The main difference between the proposed approach and the existing ones is that it considers the various features of software projects together. Considering the dynamics of the project allows us to evaluate the impact of management decisions on the quality of design artifacts and the development process. Moreover, information about the dynamics of the project development can be used in predictive analytics methods to predict the occurrence of specific events.
Such a multimodal representation of the project allows us to find hidden patterns and dependencies between the various features of the project. We can use formalized features of various projects as a data set for data mining and machine learning methods.
In the 'Results' section of this article, we presented some use cases for the proposed approach.
The information retrieval of projects allows us to find projects for research more accurately. The current implementation allows us to search for projects while considering their domain features. For example, such projects can be used to research data models and/or business processes for a new (unknown) domain. In addition, the proposed information retrieval method allows us to consider the size of the team and the size of the project. These search options allow us to search only for training or demonstration projects or only for large projects. In the future, we plan to add a search for projects by dependencies and architectural solutions.
The method for generating use case diagrams allows us to assess the functionality of the project when making management decisions or choosing a project for research. In the future, we plan to add support for generating structural UML diagrams.
The method for the diagnostic analytics of a project allows us to extract and compare the development trends of the current project with other successful or unsuccessful projects. In the future, we plan to automate the process of project analysis, extracting the project development trend, and generating recommendations to support decision-making.
We do not yet fully use all representations of the knowledge base. For example, the representation of the software project structure $P^B$ (Figure 4) can be used to find projects with a similar structural organization. Some researchers suggest that the way developers organize a project affects its success and code maintainability [41].
The disadvantages of the proposed approach are as follows:
• The need for expertise to adapt the approach to different programming languages and technologies;
• The need for expertise to consider the features of the development process;
• The need to use the project-hosting API (GitHub, GitLab) to extract information about the development process: stages, tasks, merge requests, etc.
We plan to add the following features:
• Support for fuzzy logic;
• Generating new types of design artifacts;
• The automatic generation of project management recommendations;
• Data mining methods.

Conclusions
This article discusses an approach to the design automation and project management of software projects. Design automation improves the quality of software projects by considering successful and unsuccessful design decisions based on the experience of previous projects.
We proposed the knowledge base model to solve the problem of design automation. The knowledge base allows for the formalization of various design artifacts and various indicators of the development process. The generated knowledge base can also be used as a source of a set of time series.
We proposed the diagnostic analytics method to support the project development process. The proposed method is based on the analysis of multiple time series to form recommendations for improving the quality and efficiency of project management decisions.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: AD