An Approach to Analyze Vulnerability of Information Flow in Software Architecture

Current research on software vulnerability analysis mostly focuses on source code or executable programs. These methods can only be applied after the software has been fully developed and source code is available, which can make revision costly and difficult. On the other hand, as an important product of the software design phase, architecture depicts not only the static structure of software but also the information flow that results from the interaction of components. Architecture is crucial in determining the quality of software. As a result, by locating the architecture-level information flow that violates security policies, vulnerabilities can be found and fixed early in the software development cycle, when revision is easier and cheaper. In this paper, an approach for analyzing information flow vulnerability in software architecture is proposed. First, the concept of information flow vulnerability in software architecture is elaborated, and corresponding security policies are proposed. Then, a graph-theory-based method for constructing service invocation diagrams, which depict information flow in software architecture, is proposed. Moreover, an algorithm for vulnerability determination is designed to locate architecture-level vulnerabilities. Finally, a case study is provided to verify the effectiveness and feasibility of the proposed methods.


Introduction
Due to the increasing complexity of software systems, designers tend to focus on the design and implementation of functional requirements. Non-functional requirements, such as security, are often neglected [1]. As a result, vulnerabilities are inevitably introduced in the design phase. In essence, software vulnerabilities are a special kind of defect. They are manifestations of errors that violate security policies, and they can exist in every phase of the software life cycle, including requirement analysis, design, coding, testing, and operation. Such vulnerabilities are the root causes of software security problems [2]. If they are not discovered in time and remain in the software until later phases of the development cycle, the normal operation of software systems faces great threats. Software vulnerability analysis is the general term for techniques that locate vulnerabilities and analyze their production mechanisms and action modes [3][4][5]. It is an important research area in cyber security.
However, although current approaches to software vulnerability analysis can identify software vulnerabilities, they are mostly targeted at source code and executable programs. These methods can only be applied after the coding phase, when source code and executable programs are available. According to statistical data, about 50%-75% of defects are introduced during the design phase [6], and the repair cost increases the later these defects are discovered. As a result, current vulnerability analysis methods targeted at source code and executable programs have the following shortcomings: (1) vulnerabilities are discovered late; (2) the vulnerabilities are difficult to fix; (3) revision is costly.
As the major product of the design phase, software architecture is composed of components and interactions between components [7]. Software architecture can be used to depict not only the static structure of the software, but also the information flow and propagation resulting from component interactions. It is a high-level abstraction of a software system [8]. The architecture defines functional, allocated, and product baselines of a system [9]. It can also be used to track current and future descriptions of a system composed of components and their interconnections, the actions or activities those components perform, and the rules or constraints for those activities [10]. Software architecture plays a crucial role in determining software quality [11].
Analyzing and evaluating software architecture during the design phase has been proved to be an effective way to find potential problems in the early stages of software life cycle, reduce costs and assure software quality [12]. Guided by this idea, some researchers have conducted research on architecture-level security analysis and design [13][14][15][16][17][18][19]. These methods can be used for verifying whether the design of software architecture has met the security requirements, or assuring the security of software architecture by designing security policies that constrain the components, connectors and configurations. However, none of these methods can be used for directly locating vulnerabilities in software architecture.
Therefore, conducting vulnerability analysis on software architectures that focus only on functional requirements and neglect non-functional requirements, and locating the information flow and propagation that violate security policies at the design phase, makes it possible to fix these vulnerabilities as early as possible and prevents them from remaining until later phases of the software development cycle. Some researchers have proposed that it is necessary to analyze software vulnerability at the architecture level [20,21], but they only argue for the necessity of architecture-level vulnerability analysis. They describe neither the meaning of software architecture vulnerability nor feasible or practical approaches.
As a result, this paper provides a graph-theory-based approach to analyze information flow vulnerabilities in software architecture. The key contribution of this paper is to provide an architecture-level vulnerability analysis method by examining the information flow between components. The proposed method can be used for directly locating architecture-level vulnerabilities.
In detail, first, the concept of information flow vulnerability in software architecture is proposed. This concept complements and improves the current theory of software vulnerability in the design phase. Second, based on the Bell-LaPadula model and the Biba model, policies for determining information flow vulnerability in software architecture are proposed. The proposed policies extend the Bell-LaPadula model and the Biba model to the architecture level and constrain the direction of information flow. Third, a graph-theory-based modeling method is proposed. The constructed service invocation diagram model depicts the information flow and propagation in the architecture, taking into account the security level of each node and the operation types between nodes. Fourth, an algorithm for locating information flow vulnerabilities in software architecture is proposed.
The rest of the paper is organized as follows: In Section 2, related works are analyzed. The proposed method is elaborated in Section 3. Section 4 presents a case study. Conclusions and future work are given in Section 5.

Software Vulnerability
This paper investigates the definition of information flow vulnerability in software architecture, so we first review existing definitions of software vulnerability.
Studying these definitions helps researchers better understand the essence of software vulnerability. Accordingly, researchers have proposed various definitions and elaborations of software vulnerability from different perspectives.
Current definitions of software vulnerability can be classified into four categories: definitions based on access control, definitions based on state space, fuzzy definitions, and definitions based on security policies, as shown in Table 1.

Table 1. Different categories of definitions of software vulnerability.
Category of definition: Description
Access control [22]: An access control vulnerability is whatever causes the operating system to perform operations that are in conflict with the security policy as defined by the access control matrix.
State space [23]: A computer system is viewed as the composition of configuration states of entities. System operation is achieved through transitions between states. State transitions are classified as authorized and unauthorized. A vulnerable state refers to an authorized state that can transition into an unauthorized state through authorized transitions.
Fuzzy [24]: Software vulnerabilities are weak points that can be exploited by threats and cause potential losses.
Security policies [2]: A software vulnerability is an instance of an error in the specification, development, or configuration of software such that its execution can violate the security policy.
Among the four kinds of definitions of software vulnerability, the fuzzy definition is widely used in books, dictionaries, standards, and cyber security literature. But this definition is too general and abstract. It does not clearly elaborate the essence of software vulnerability.
Compared with the fuzzy definition, the definitions based on access control and state space are clearer. But the access control definition can only describe vulnerabilities that violate access control models. The state space definition, on the other hand, is targeted only at UNIX and Windows NT and is not suitable for other software.
By comparison, the definition of software vulnerability based on security policies clearly states that software vulnerabilities exist in every phase of the software development cycle, and describes the relationship between software vulnerabilities and the security policies that specify what is illegal, what can happen, and what is allowed in a software system. It emphasizes that, in essence, software vulnerabilities are errors that violate security policies. However, this definition is still not tied to the specific features of the different phases of the software development lifecycle. Thus, in this paper, building on the security-policy-based definition of software vulnerability and focusing on information flow and propagation at the architecture level, we investigate the definition of information flow vulnerability in software architecture.

Vulnerability Analysis on Architecture Level
Research on architecture-level vulnerability analysis is still at an early stage [20]. After searching as comprehensively as we could, we found three related papers.
In [25], the authors proposed a method and a corresponding tool called SecArch to evaluate security-critical architectures. SecArch is an incremental evaluation tool for secure architectures. It employs multiple architecture diagrams and scenarios to locate unanticipated interaction patterns in the architecture, to evaluate its security. However, since they did not clarify the meaning of software architecture vulnerability, they did not clearly state in the conclusion that it is in the modules containing abnormal interaction patterns that vulnerabilities exist. In comparison with their work, we focus on the information flow and propagation resulting from interactions between components. The models we employ are not existing architecture diagrams, but a graph-theory-based model that can explicitly depict the information flow in software architecture. Moreover, we will clarify the definition of information flow vulnerability in software architecture, and propose policies for determining information flow vulnerabilities in software architecture.
Another study [26] utilized the features of software architecture to find backdoors in software. The authors analyzed the relationship between architecture, dynamic behavior, and security vulnerabilities to identify potential backdoors in the architecture. Their main idea was to compare the actual architecture, obtained after the system was completely developed, with the architecture of the original version, and to discover vulnerabilities by injecting malicious code. Hence, their method cannot be applied at the design phase to locate architectural vulnerabilities. By contrast, the method proposed in this paper is targeted at locating architectural vulnerabilities at the design phase, and it does not require a comparison between multiple versions of a software architecture.
In [27], the authors proposed the concept of software vulnerability degree. In their definition, the less stable the operation is, the more vulnerable the software is. They built mathematical models of software structural vulnerability, constructed a transition equation set to describe the operation process of software elements, and evaluated the vulnerability of software structure by use of the metric of defect density in source code. Also, they give a control method for software vulnerability degree. However, they did not propose the method for locating structural vulnerabilities. By comparison, our definition for software vulnerability is different from theirs. The object we employ in case study is not source code, but an architecture. Most importantly, we propose a method for locating architectural vulnerabilities.

Static Vulnerability Analysis of Source Code and Executable Programs
Because architecture-level vulnerability analysis belongs to the research domain of software vulnerability analysis, and the literature on architecture-level vulnerability analysis is scarce, we extend the literature review to software vulnerability analysis in general. As mentioned in the Introduction section, current research on software vulnerability analysis is mostly targeted at source code and executable programs, and the methods used are static analysis and dynamic testing. This paper focuses on architecture-level vulnerability, and a software architecture is difficult to execute directly [28], so dynamic testing is not applicable. As a result, in this subsection, we review the research on static vulnerability analysis of source code and executable programs, hoping to borrow some ideas for our research.
Static analysis refers to analyzing source code or binary code without running the program [5]. Static analysis approaches include dataflow analysis, symbolic execution, theorem proving, and so on. Dataflow analysis is the most widely used static analysis method. It examines the definitions and values of variables in a program to analyze potential vulnerabilities [29,30]. When conducting dataflow analysis, source code is first abstracted into a syntax tree or control flow diagram. Then algebraic methods are used to calculate the use of variables, with which the runtime behavior of a program can be depicted. Finally, corresponding rules are used to find vulnerabilities in source code [31]. Symbolic execution [32][33][34] models source code as control flow diagrams or function invocation graphs. Symbols are then used to represent the values of variables in order to simulate program execution. Finally, vulnerabilities in source code are discovered by applying vulnerability examination rules. Although symbolic execution is an effective approach to finding vulnerabilities in source code, state explosion may occur in practice. Besides, the analysis results rely heavily on the problem-solving capability of automatic tools [4]. Theorem proving [35,36] transforms source code into logical formulas and then uses theorems or rules to check whether the program is a valid theorem, in order to find vulnerabilities in the source code. Compared with other static analysis methods, this method is more accurate but needs more human intervention. As a result, it is not suitable for complex software.

By analyzing related research on static vulnerability analysis and the work in [5], we can summarize the basic steps of static vulnerability analysis as follows:
Step 1: model construction. In this step, source code or an executable program is abstracted into a model that can be used for further analysis.
Step 2: pattern extraction. In this step, typical vulnerability patterns are extracted based on historical vulnerabilities.
Step 3: pattern matching. In this step, the model from step 1 and vulnerability patterns from step 2 are matched. Corresponding analysis methods are selected to obtain vulnerability information from the source code or executable programs.
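The three steps can be illustrated with a minimal code-level sketch. This is not a method from the cited literature: the pattern set `RISKY_CALLS`, the helper names, and the sample program are all hypothetical, and the "model" is simply the syntax tree produced by Python's standard `ast` module.

```python
from __future__ import annotations

import ast

# Step 2 (assumed): a hand-written pattern set standing in for patterns
# that would normally be extracted from historical vulnerabilities.
RISKY_CALLS = {"eval", "exec"}


def build_model(source: str) -> ast.AST:
    """Step 1: abstract the source code into a syntax tree."""
    return ast.parse(source)


def match_patterns(tree: ast.AST, patterns: set[str]) -> list[tuple[str, int]]:
    """Step 3: report (function name, line number) for each matching call."""
    findings = []
    for node in ast.walk(tree):
        # Only direct calls by name are considered in this sketch.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in patterns:
                findings.append((node.func.id, node.lineno))
    return sorted(findings, key=lambda item: item[1])


# Hypothetical program under analysis.
SAMPLE = "x = input()\ny = eval(x)\nprint(y)\n"
findings = match_patterns(build_model(SAMPLE), RISKY_CALLS)
```

Here `findings` holds one entry for the `eval` call on line 2 of the sample. The same division into model, patterns, and matching carries over when the modeling target is an architecture rather than source code.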
The method proposed in this paper follows the above procedure of static vulnerability analysis, that is, model construction, pattern extraction, and pattern matching. However, the modeling target of this paper is not source code but software architecture. As for pattern extraction, no architecture-level vulnerability database exists from which patterns could be extracted, so we borrow ideas from information flow control theory. In the following subsections, we review studies on architecture analysis methods based on graph theory and on information flow control theory.

Architecture Analysis Based on Graph Theory
Graphs are a fundamental construct in complex systems research, and the use of graph-theoretic algorithms and metrics to extract useful information from a problem is a primary method of analysis for complex systems [37]. A graph-based representation is used to provide a particular perspective on a problem. Generally, graph-theoretic representations of a problem emphasize aspects that are either structural, such as the connection of components or processes in the problem, or that depend on structural properties, such as the robustness of a network of components. Graph representations are particularly useful in understanding systems involving multiple interacting entities. Such complex systems lend themselves to a graph-theoretic representation and to the rich body of network flow methods that can represent and analyze the component structures and patterns of interactions at both local and global levels [38,39].
Architecture analysis methods based on graph theory have been widely adopted by researchers from the systems engineering and software engineering areas. In systems engineering, graph theory provides a structural perspective on systems, with a focus on structural properties such as robustness, resilience, and security [40][41][42]. In software engineering, researchers usually use graph-based methods to model the interactions between web services or classes [43][44][45][46][47]. Graph-based modeling can be classified into two categories. The first category focuses on topological structure, modeling the static interdependent structure of components such as classes and services. The second category focuses on dynamic interaction, modeling the data flow, the interactions between services, and the invocation relationships between classes in a software architecture. To improve the modeling capability of these graph-based models, researchers usually extend the nodes and edges of the graph to satisfy the needs of their respective research. This paper builds on the second category of graph-based models.
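As an illustration of the second category, the following minimal sketch shows a directed graph whose nodes and edges are extended with extra attributes. It is not the service invocation diagram defined later in this paper. The class names, the integer security levels, the service names, and the convention that a write moves data from caller to callee while a read moves data from callee to caller are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Invocation:
    """A directed edge: `caller` invokes `callee` with a read or write operation."""
    caller: str
    callee: str
    operation: str


@dataclass
class ServiceGraph:
    """Directed graph whose nodes carry a security level (hypothetical model)."""
    levels: dict = field(default_factory=dict)  # service name -> security level
    edges: list = field(default_factory=list)   # list of Invocation objects

    def add_service(self, name: str, level: int) -> None:
        self.levels[name] = level

    def add_invocation(self, caller: str, callee: str, operation: str) -> None:
        if caller not in self.levels or callee not in self.levels:
            raise KeyError("both services must be added before they interact")
        if operation not in ("read", "write"):
            raise ValueError("operation must be 'read' or 'write'")
        self.edges.append(Invocation(caller, callee, operation))

    def data_flows(self) -> list:
        """Assumed convention: a write moves data caller -> callee,
        a read moves data callee -> caller."""
        flows = []
        for edge in self.edges:
            if edge.operation == "write":
                flows.append((edge.caller, edge.callee))
            else:
                flows.append((edge.callee, edge.caller))
        return flows


# Hypothetical three-service architecture.
graph = ServiceGraph()
graph.add_service("gateway", 0)
graph.add_service("orders", 1)
graph.add_service("audit_log", 2)
graph.add_invocation("gateway", "orders", "write")
graph.add_invocation("orders", "audit_log", "write")
graph.add_invocation("gateway", "audit_log", "read")
flows = graph.data_flows()
```

Separating the invocation edges from the derived data flows keeps the model close to what a designer draws (who calls whom) while still exposing the direction in which information actually moves.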

Information Flow Control
Information flow is the propagation and flow of information inside a system and across systems [48]. Information flow control is a significant means of ensuring end-to-end information security in the cyber security area. It employs security policies or security models to control the flow of information [49]. In this paper, we apply the ideas of information flow control to analyze information flow vulnerabilities in software architecture.
First of all, we will clarify the concepts and terms used in information flow security, including security goals, security policies, security models, and lattice model of information flow.
Security goals of information flow include confidentiality goals and integrity goals. Confidentiality is the property that information is not made available or disclosed to unauthorized individuals, entities, or processes. Integrity is the property that information cannot be modified in an unauthorized or undetected manner, and the accuracy and completeness of information is maintained and assured over its entire lifecycle.
Appl. Sci. 2020, 10, 393
Security policies of information flow are a set of policies designed for a group of security goals, which define what is allowed and what is prohibited. Similar to security goals, security policies include confidentiality policies and integrity policies. Confidentiality policies prevent information from flowing to entities that are not authorized to access the information. Integrity policies prevent information from flowing to entities with higher integrity [50].
Security policies are usually represented by security models, which express security policies in formal or mathematical terms. John McLean gave a detailed summary of security models in [51]. In this paper, we make use of two important information flow security models, the Bell-LaPadula model and the Biba model. The Bell-LaPadula model is suitable for military security policies. It ensures system confidentiality by preventing the diffusion of unauthorized information [52]. The basic principle of the Bell-LaPadula model is that a subject with a higher security level is not allowed to write an object with a lower security level, and a subject with a lower security level is not allowed to read an object with a higher security level, which can be characterized as "no read up, no write down". The Biba model [53] is used to protect integrity. Its basic principle is that a subject with a lower integrity level is not allowed to write an object with a higher integrity level, and a subject with a higher integrity level is not allowed to read an object with a lower integrity level, which can be characterized as "no write up, no read down".
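The two rules can be written down as a minimal sketch. It assumes that security and integrity levels are totally ordered integers (larger means higher), which simplifies both models, and the function names are hypothetical.

```python
def blp_allows(subject_level: int, object_level: int, operation: str) -> bool:
    """Bell-LaPadula (confidentiality): no read up, no write down."""
    if operation == "read":
        return subject_level >= object_level   # may only read at or below own level
    if operation == "write":
        return subject_level <= object_level   # may only write at or above own level
    raise ValueError("operation must be 'read' or 'write'")


def biba_allows(subject_level: int, object_level: int, operation: str) -> bool:
    """Biba (integrity): no read down, no write up."""
    if operation == "read":
        return subject_level <= object_level   # may only read at or above own level
    if operation == "write":
        return subject_level >= object_level   # may only write at or below own level
    raise ValueError("operation must be 'read' or 'write'")


# Assumed levels for illustration: 1 = low, 2 = high.
LOW, HIGH = 1, 2
```

For example, `blp_allows(LOW, HIGH, "read")` is false because a low subject would read up, whereas `biba_allows(LOW, HIGH, "read")` is true because reading higher-integrity data cannot contaminate the subject. The two models are duals of each other with respect to the direction of permitted flow.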
Among the various formal or mathematical descriptions of information flow security models, the most widely used is the lattice model of information flow proposed by Denning. The information lattice model is defined as follows [48]: if (L, ≤) is a partially ordered set and every pair of elements a, b in L has a greatest lower bound and a least upper bound, then (L, ≤) is a lattice.
This model is used for classifying information into different security groups. Each group of information forms a security class. Information is only allowed to flow between specified security classes. According to confidentiality policies, information is only allowed to flow inside a security class (class A) or into a security class (class B) with a higher security level. According to integrity policies, information is only allowed to flow inside a security class or into a security class with a lower security level.
In later sections, we will use the lattice model to depict the information flow security in software architecture.
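To make the lattice condition and the two flow rules concrete, the following minimal sketch checks whether a finite set of security classes with a given ordering forms a lattice, and whether a flow between two classes is permitted. The example classes (sets of category labels ordered by inclusion) and all function names are assumptions chosen for illustration only.

```python
from itertools import product


def least_upper_bound(a, b, elements, leq):
    """Return the least element above both a and b, or None if none exists."""
    uppers = [u for u in elements if leq(a, u) and leq(b, u)]
    least = [u for u in uppers if all(leq(u, v) for v in uppers)]
    return least[0] if least else None


def greatest_lower_bound(a, b, elements, leq):
    """Return the greatest element below both a and b, or None if none exists."""
    lowers = [x for x in elements if leq(x, a) and leq(x, b)]
    greatest = [x for x in lowers if all(leq(y, x) for y in lowers)]
    return greatest[0] if greatest else None


def is_lattice(elements, leq) -> bool:
    """Every pair of elements must have both bounds."""
    return all(
        least_upper_bound(a, b, elements, leq) is not None
        and greatest_lower_bound(a, b, elements, leq) is not None
        for a, b in product(elements, repeat=2)
    )


def flow_allowed(src, dst, leq, policy: str) -> bool:
    """Confidentiality: flow only to an equal or higher class.
    Integrity: flow only to an equal or lower class."""
    if policy == "confidentiality":
        return leq(src, dst)
    if policy == "integrity":
        return leq(dst, src)
    raise ValueError("policy must be 'confidentiality' or 'integrity'")


def subset(a, b):
    """Ordering used in the example: a <= b when a is a subset of b."""
    return a <= b


# Hypothetical security classes: sets of category labels ordered by inclusion.
PUBLIC = frozenset()
FINANCE = frozenset({"finance"})
MEDICAL = frozenset({"medical"})
BOTH = frozenset({"finance", "medical"})
CLASSES = [PUBLIC, FINANCE, MEDICAL, BOTH]
```

With these classes, `is_lattice(CLASSES, subset)` holds, and the least upper bound of `FINANCE` and `MEDICAL` is `BOTH`. Dropping `PUBLIC` and `BOTH` leaves two incomparable classes with no common bound, so the remaining set is not a lattice.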

Assumptions and Framework of the Proposed Method
Assumption 1. The understanding of software architecture in this paper is based on [7]. Software architecture is composed of components and interactions between components. In modelling software architecture, we borrow the concepts from service-oriented architecture (SOA), which is widely used in both software engineering and systems engineering. In SOA, each service is an independent function unit composed of a single component or several components [54]. So, the software architecture in this paper is viewed as a composition of services and interactions between services.

Assumption 2. As mentioned in the Introduction section, designers tend to focus on the design and implementation of functional requirements, while non-functional requirements tend to be neglected [1]. Hence, the second assumption is that the architectures that this paper focuses on are designed without considering non-functional requirements, especially security requirements. So, these architectures only reflect functional-level information flow and propagation.

Assumption 3. The types of service invocation operation are limited to reading and writing. In software engineering, there are three types of operations: reading, writing, and execution. The execution operation is not considered in this paper because it occurs at the source code level or executable program level [55] and is therefore not suitable for the architecture level.
In this paper, we first propose a definition and formal description of information flow vulnerability in software architecture. Then, security policies of information flow in architecture are given based on Bell-LaPadula model and Biba model. After that, the construction method for service invocation diagram of software architecture is proposed based on graph theory. Finally, we designed and implemented a corresponding algorithm for locating vulnerabilities. The framework of the research method is shown in Figure 1, where green blocks represent background and domain knowledge needed in this research, rounded rectangles represent research that needed to be conducted, blue rectangles represent the results and outputs of each part of the research, gray rectangles represent the analysis results.

Vulnerability of Information Flow in Software Architecture
In this subsection, we propose the concept of vulnerability of information flow in software architecture and a corresponding formal description. As mentioned above, software architecture can depict the information flow and propagation resulting from interactions between components. Therefore, before defining information flow vulnerability in software architecture, we first clarify two concepts: software architecture vulnerability and information flow in architecture.
Definition 1. Software architecture vulnerability. Software architecture vulnerability refers to errors resulting from violations of the corresponding security policies by the components, and the interactions between components, that form the software architecture.
Based on this definition, we can further explain the essence of software architecture vulnerability from four aspects. First, a vulnerability is in essence a kind of design error. Second, the core of software architecture vulnerability is the violation of security policies by the architecture design. Third, this definition can help architecture designers and analyzers identify which part of the architecture needs to be protected by an architecture-level security mechanism [2]. Fourth, this definition is a generalization of all kinds of software architecture vulnerabilities. It can be further divided into subcategories, which are not covered in this paper.
To better differentiate our work from other current work on software vulnerability, we compare software architecture vulnerability with traditional code-level vulnerability, as shown in Table 2. The most significant differences between architecture-level and code-level vulnerability are as follows. First, they are of different abstraction granularities: architecture-level vulnerabilities are at a higher level of abstraction. Second, the objects to be analyzed are different: architecture-level analysis examines the software architecture, whereas code-level analysis examines source code or executable programs. Finally, they have different focuses. Architecture vulnerability analysis focuses on the structural level, such as components, interactions between components, and the topology resulting from these interactions, while code-level vulnerability analysis focuses on lexical, syntactic, and semantic aspects.

Definition 2. Information flow in architecture. This is the flow and propagation of information through the components, the interactions between components, and the topology formed by those interactions. In this definition, information is synonymous with data.
Based on this understanding of information flow in architecture, vulnerability of information flow in software architecture is defined as follows:
Definition 3. Vulnerability of information flow in software architecture. Vulnerability of information flow in software architecture refers to errors resulting from the violation of the corresponding security policies during the flow and propagation of information through components, interactions between components, and the topology formed by those interactions.
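Normally, the flow and propagation of information should conform to confidentiality and integrity policies [49]. Hence, information flow vulnerabilities in software architecture can be categorized into vulnerabilities under confidentiality policies and vulnerabilities under integrity policies.
Definition 4. Vulnerability of information flow in software architecture under confidentiality policies. Information carried by the architecture is only allowed to flow inside services of the same security level or from services of low security level to those of high security level. All services and interactions between services violating this principle are vulnerabilities of information flow in the architecture under confidentiality policies. This definition can be described based on lattice as follows: for software architecture SA with confidentiality policy requirements, given (L, ≤) and A, B ∈ L, if data pass from A to B, then A ≤ B holds. If there exist a ∈ A, b ∈ B, and a → b, then a → b is secure; if there exist a ∈ B, b ∈ A, and a → b, then {a → b} ∈ Vsa.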
Definition 5. Vulnerability of information flow in software architecture under integrity policies. Information carried by the architecture is only allowed to flow inside services of the same security level or from services of high security level to those of low security level. All services and interactions between services violating this principle are vulnerabilities of information flow in the architecture under integrity policies. This definition can be described based on lattice as follows: for software architecture SA with integrity policy requirements, given (L, ≥) and A, B ∈ L, if data pass from A to B, then A ≥ B holds. If there exist a ∈ A, b ∈ B, and a → b, then a → b is secure; if there exist a ∈ B, b ∈ A, and a → b, then {a → b} ∈ Vsa.
In the descriptions above, SA represents the software architecture set, L represents the set of secure classes, ≤ and ≥ denote the ordering between low and high secure classes, A and B are elements of L, a and b represent services, a ∈ A means that a is of security level A, → denotes the direction of information flow between services, and Vsa represents the set of software architecture vulnerabilities.
To explain Definitions 4 and 5, Figures 2 and 3 are presented. In Figure 2, the architecture is composed of services 1 to 5. The rectangular boxes represent services, and the color of a box represents the security level of the service: blue represents a low security level and yellow represents a high security level, so boxes with the same color represent services with the same security level. The symbol "→" represents the interaction between services and the direction of data flow. In Figure 2, the part inside the red box is the vulnerability of information flow in software architecture under confidentiality policies.
Similarly, in Figure 3, the architecture is composed of services 1 to 5 and the same notation is used, except that pink represents a high security level while green represents a low security level. In Figure 3, the part inside the red box is the vulnerability of information flow in software architecture under integrity policies.

Security Policies of Information Flow in Architecture
In this subsection, we propose security policies of information flow in software architecture based on the Bell-LaPadula model and the Biba model. These policies specify the direction of information flow for the three kinds of operation that occur in service interactions: the reading operation, the writing operation, and the combined reading and writing operation.

Security Policies of Information Flow under the Goals of Confidentiality
As mentioned above, the confidentiality policy of information flow is intended to prevent information from flowing to subjects that are not authorized to access it. That is, information is only allowed to flow inside a security class or from a class with a low security level to a class with a higher security level. We propose security policies as follows:

Policy 1. Policy for reading operation in service invocation under the goal of confidentiality.
If there is an invoking service A and an invoked service B, A can perform the reading operation on B if and only if $C_a \geq C_b$. The information flow is B → A. $C_x$ stands for the confidentiality level of service x.

Policy 2. Policy for writing operation in service invocation under the goal of confidentiality.
If there is an invoking service A and an invoked service B, A can perform the writing operation on B if and only if $C_a \leq C_b$. The information flow is A → B.

Policy 3. Policy for reading and writing operation in service invocation under the goal of confidentiality.
If there is an invoking service A and an invoked service B, A can perform both reading and writing operations on B if and only if $C_a = C_b$. The information flow is A ↔ B.

Security Policies of Information Flow under the Goals of Integrity
An integrity policy of information flow is intended to prevent information from flowing to subjects or data with a higher integrity level; it ensures that information only flows inside a security class or to a class with a lower security level. Integrity means that information cannot be destroyed, lost, or modified without authorization when it is transmitted, exchanged, stored, or processed. There are two kinds of modification, direct and indirect. Direct modification means that service A performs the writing operation on service B, and indirect modification means that service A performs the reading operation on service B. For the reading, writing, and combined reading and writing operations that occur during service invocation, we propose security policies as follows.
If there is an invoking service A and an invoked service B, A can perform both reading and writing operations on B if and only if $I_a = I_b$. The information flow is A ↔ B. $I_x$ stands for the integrity level of service x.
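The policies of this subsection can be collected into a small lookup table. The following Python sketch is illustrative only: the names `RULES` and `is_allowed` are hypothetical, and the separate reading and writing rules for integrity are an assumption derived from the Biba model and from the stated principle that information may only flow to an equal or lower integrity level.

```python
import operator

# Comparison that the invoking service's level (A) must satisfy against the
# invoked service's level (B), keyed by (policy goal, operation).
RULES = {
    ("confidentiality", "r"): operator.ge,   # Policy 1: C_a >= C_b, flow B -> A
    ("confidentiality", "w"): operator.le,   # Policy 2: C_a <= C_b, flow A -> B
    ("confidentiality", "rw"): operator.eq,  # reading and writing: C_a == C_b
    ("integrity", "r"): operator.le,         # ASSUMED (Biba-style): I_a <= I_b
    ("integrity", "w"): operator.ge,         # ASSUMED (Biba-style): I_a >= I_b
    ("integrity", "rw"): operator.eq,        # reading and writing: I_a == I_b
}


def is_allowed(goal: str, op: str, level_a: int, level_b: int) -> bool:
    """Return True if invoking service A may perform `op` on invoked service B."""
    return RULES[(goal, op)](level_a, level_b)


# A high-confidentiality service may read from a low one, but not write to it.
print(is_allowed("confidentiality", "r", 3, 1))
print(is_allowed("confidentiality", "w", 3, 1))
```

Keeping the rules in one table makes the symmetry between the two goals visible: each integrity rule is the confidentiality rule with the inequality reversed.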

Construction Approach of Service Invocation Diagram Based on Software Architecture
From Section 3.3, we can see that two key points of security policies of information flow in architecture are the direction of information flow during service invocation and the security class of each service. As a result, the model constructed by the proposed approach should have corresponding descriptive ability. That is, the model should be able to reflect the direction and path of information flow in the architecture as well as the confidentiality and integrity level of the information carried by services. Software architecture design specification contains some diagrams that can reflect information interaction, such as data flow diagrams, flow charts, activity diagrams, and state diagrams, but these diagrams cannot be used for analysis directly. This is because the uncertainty of the future operation environment of the system, the complexity of the software, the cognitive limitation of the designer and service reuse in current service-oriented architectures make it impossible for all potential information flow and propagation to be included in these diagrams. To further explain this problem, we give an example, shown in Figure 4.

In the architecture design specification of a piece of software, there is a statement that, when performing a certain function, calculation service X is supposed to pass the calculated value to another calculation service Y. According to this description, a designer designs the information flow path as shown in Figure 4a. However, in the actual operation of the system, before passing the value to calculation service Y, calculation service X invokes storage service Z. Then Z invokes calculation service Y and passes the value to Y. From this process, we can clearly see that an implicit information flow X → Z → Y, which does not exist in the design phase, emerges, as shown in Figure 4b.
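The X, Y, Z example can be reproduced with a simple reachability check. This is a minimal sketch rather than the method proposed in the paper: the edge lists and the helper `reachable` are hypothetical and only encode the flows named in the example.

```python
from collections import deque


def reachable(edges, source):
    """Return every service that data starting at `source` can reach."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)
    return seen


designed = [("X", "Y")]             # flow drawn by the designer (Figure 4a)
actual = [("X", "Z"), ("Z", "Y")]   # flow at run time (Figure 4b)

# Services that receive X's data at run time but not in the design.
implicit = reachable(actual, "X") - reachable(designed, "X")
print(implicit)
```

Here the storage service Z is reported, because it receives the value computed by X even though the designed diagram never shows that flow.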
So, it is not enough to use the architecture design specification and architecture diagrams to analyze the vulnerability of information flow in software architecture. It is necessary to construct a model that is capable of depicting all potential information flow and propagation based on the architecture. In the 18th century, Euler, a renowned mathematician, solved the famous problem of the seven bridges of Königsberg by abstracting paths into a graph, and thereby founded graph theory. As a data structure, a graph can represent complex relationships between data elements that cannot be determined intuitively. In this paper, we model the information flow and propagation in architecture by use of graph theory.

Determination and Representation of Service Invocation Elements
There are many relationships between and among services, such as invocation, containment, association, aggregation, cascade, and replacement. Among these relationships, invocation best reflects the direction and propagation of the carried information. Before determining the service invocation relationship, we first propose a definition of service invocation and a series of related concepts as the basis for describing the invocation relationship between services.

Definition 6. Service invocation. Service invocation refers to the process in which information carried in one service is passed on to another service.

Definition 7.
Set of service invocations. The set of service invocations is a set consisting of service invocation elements. It can be denoted by a quadruple I = {S, R, A, D}, in which S stands for the pair of services involved in the invocation, R represents the invocation relationship, A is the operation sequence of service invocation, and D is the direction of data flow.
R, A, and D are defined in Definitions 8-10. Before defining the relationship of service invocation R, the forms of service invocation should be clarified, including the types of service invocation and the forms of service interaction. In service-oriented software architecture, there are four types of service invocation: sequence structure, selection structure, parallel structure, and iteration structure [56], as shown in Figure 4. In data flow analysis, it is common to ignore the conditions of the paths, i.e., to assume that all paths are feasible [57]. As a result, based on the four types of service invocation, we can abstract three types of service interaction: one to one, one to many, and many to many. This is also consistent with the interaction relationship between common data elements [58]. Table 3 shows these three types of service interaction.

Table 3. Types of service interaction.

Data interaction relationship | Service interaction type
One to one | diagram of services $S_1$ and $S_2$
One to many | diagram
Many to many | diagram

Definition 8. The relationship of service invocation. The relationship of service invocation refers to the interaction form when service invocation occurs, which can be denoted by a tuple, $R = \{S_s, S_o\}$. $S_s$ stands for the invoking service, which is the service that invokes another service, and $S_o$ stands for the invoked service, which is the service that is invoked. This tuple can be denoted as $S_{so}$. For example, if service i invokes service j, then the relationship of this invocation can be denoted as $S_{ij}$. Figure 5 and Table 3 provide examples to show the relationships of the four types of service invocation. For sequence structure, the corresponding type of service interaction is one to one, and the resulting relationships are $S_1S_2$, $S_2S_3$, …, $S_{n-1}S_n$. For parallel structure, the corresponding types of service interaction are one to many and many to one, and the resulting relationships are $S_1S_2$, $S_2S_5$ AND $S_1S_3$, $S_3S_5$ AND $S_1S_4$, $S_4S_5$. For selection structure, the corresponding types of service interaction are one to many and many to one, and the resulting relationships are $S_1S_2$, $S_2S_5$ OR $S_1S_3$, $S_3S_5$ OR $S_1S_4$, $S_4S_5$. For iteration structure, the corresponding type of service interaction is one to one, and the resulting relationships are $S_1S_2$ and $S_2S_1$.

The service invocation process includes not only interaction types, but also the operation sequence of service invocation A and the direction of data flow D. Next, the operation sequence of service invocation A and the direction of data flow between services resulting from the service invocation process will be defined.

Definition 9.
Operation sequence of service invocation. Operation sequence of service invocation refers to the set of operations during service invocation. It can be denoted by a tuple, A = {r, w}, in which r represents reading operation and w represents writing operation.

Definition 10.
Direction of data flow between services. Direction of data flow between services refers to the direction of data flow when reading and writing operation occur during service invocation, which can be denoted by the symbol "→".
For example, if there exist data $D_i \in S_s$, $D_j \in S_o$, and a set of service invocation $I = \{S_s, S_o, r\}$, then the invoking service $S_s$ performs the reading operation on the invoked service $S_o$. The direction of data flow can be represented by $S_o \to S_s$, which means that when an invoking service performs the reading operation on an invoked service, the data flows from the invoked service to the invoking service. Similarly, if there exist data $D_i \in S_s$, $D_j \in S_o$, and a set of service invocation $I = \{S_s, S_o, w\}$, then the invoking service $S_s$ performs the writing operation on the invoked service $S_o$. The direction of data flow can be represented by $S_s \to S_o$, which means that when an invoking service performs the writing operation on an invoked service, the data flows from the invoking service into the invoked service. In addition to performing the reading and writing operations separately, an operation sequence of service invocation can perform both operations concurrently. In this case, if there exist data $D_i \in S_s$, $D_j \in S_o$, and a set of service invocation $I = \{S_s, S_o, r \text{ and } w\}$, then the invoking service $S_s$ performs both the reading and writing operations on the invoked service $S_o$. The direction of data flow can be represented by $S_o \to S_s$ and $S_s \to S_o$, which means that data flows from the invoked service to the invoking service and from the invoking service into the invoked service at the same time.
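The mapping from an invocation to its data flow directions can be written out directly. The sketch below is illustrative: the `Invocation` type, its field names, and the operation codes "r", "w", and "rw" (for "r and w") are assumptions introduced here, not notation from the paper.

```python
from typing import List, NamedTuple, Tuple


class Invocation(NamedTuple):
    invoker: str  # S_s, the invoking service
    invoked: str  # S_o, the invoked service
    op: str       # "r", "w", or "rw" (reading and writing)


def data_flows(inv: Invocation) -> List[Tuple[str, str]]:
    """Return the (source, destination) data flows caused by one invocation."""
    if inv.op not in {"r", "w", "rw"}:
        raise ValueError(f"unknown operation: {inv.op!r}")
    flows = []
    if "r" in inv.op:  # reading: data flows from S_o to S_s
        flows.append((inv.invoked, inv.invoker))
    if "w" in inv.op:  # writing: data flows from S_s to S_o
        flows.append((inv.invoker, inv.invoked))
    return flows


print(data_flows(Invocation("Ss", "So", "rw")))
```

A combined reading and writing invocation therefore yields two flows in opposite directions, which is what later produces loops in the service invocation diagram.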

Definition and Construction Approach for Service Invocation Diagram
From the research above, the information about service invocation can be obtained. Then by using graph theory, a service invocation diagram can be constructed, which can be used as basis for later vulnerability analysis of information flow in architecture.
To construct a service invocation diagram, we propose mapping rules between the service invocation information obtained above and the elements in the diagram. First, services are mapped into vertices in the diagram. Second, the invocation relationship is mapped into the edges. Third, the operation type of service invocation is mapped into weights on the edges: the weight of an edge produced by the reading operation is 1, and the weight of an edge produced by the writing operation is 2. Moreover, the direction of an edge follows the direction of data flow between services. As a result, the service invocation diagram constructed in this paper is a weighted and directed graph.
Definition 11. Service invocation diagram. A service invocation diagram is a data structure used to describe invocation information between services. It consists of a nonempty set of vertices V (composed of n vertices, n > 0) and a set of edges E (relations between vertices), which can be denoted as $G = (V, E)$.
After obtaining the service invocation information and defining the service invocation diagram, we can describe the service invocation diagram by use of an adjacency matrix.

Definition 12.
Adjacency matrix of service invocation diagram. For a service invocation diagram G, a one-dimensional array with n elements, L[0, n−1], can be given to describe the information of the vertices. Based on this, a two-dimensional array A[0, n−1][0, n−1] can be given, which is the adjacency matrix of this service invocation diagram. In matrix A, the element $A_{ij}$ stores the relation between vertices i and j. The matrix can be denoted as
$$A_{ij} = \begin{cases} w_{ij}, & \text{if there is an edge from vertex } i \text{ to vertex } j \\ 0, & \text{otherwise} \end{cases}$$
When there exists a relationship between services i and j, there exists an edge between vertices i and j, and the weight on the edge is $w_{ij}$. As mentioned above, $w_{ij} = 1$ or 2. When there is no relationship between service i and service j, the value of $w_{ij}$ is 0. Finally, by traversing all invocation relations between services, the adjacency matrix and the service invocation diagram can be obtained.
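The mapping rules and the adjacency matrix can be illustrated with a short Python sketch. The function `build_adjacency`, the service names, and the operation codes are hypothetical; the sketch also assumes that a combined reading and writing invocation is stored as two opposite edges, and it does not handle the case where a reading flow and a writing flow fall on the same directed edge (the later one simply overwrites the earlier one).

```python
READ, WRITE = 1, 2  # edge weights for reading and writing operations


def build_adjacency(services, invocations):
    """Build the adjacency matrix of a service invocation diagram.

    `invocations` holds (invoker, invoked, op) triples with op in
    {"r", "w", "rw"}; matrix[i][j] is the weight of the edge i -> j.
    """
    index = {name: i for i, name in enumerate(services)}
    n = len(services)
    matrix = [[0] * n for _ in range(n)]
    for invoker, invoked, op in invocations:
        s, o = index[invoker], index[invoked]
        if "r" in op:
            matrix[o][s] = READ   # data flows from invoked to invoker
        if "w" in op:
            matrix[s][o] = WRITE  # data flows from invoker to invoked
    return matrix


services = ["S1", "S2", "S3"]
invocations = [("S1", "S2", "w"), ("S3", "S2", "r")]
matrix = build_adjacency(services, invocations)
for row in matrix:
    print(row)
```

In this example S1 writes to S2, giving an edge S1 → S2 of weight 2, and S3 reads from S2, giving an edge S2 → S3 of weight 1.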

Approach for Determining Confidentiality and Integrity Levels of Services
After constructing the service invocation diagram, we need to assign values for the confidentiality and integrity levels of services. The core of determining the security level of a service is to determine the security level of the data carried on that service, that is, to assign values for the confidentiality level and integrity level of that data. When assigning values, we need to consider both the confidentiality and integrity levels of the data and the security level of the service. In this paper, we adopt a method for valuing assets and determining their levels that is widely accepted in the security area [59][60][61] and considered practical. The equations for determining the confidentiality level and integrity level of services are as follows.
Equation (1) determines the confidentiality level, in which $SER_{cl}$ represents the confidentiality level of a service, $D_{cl}$ represents the confidentiality level of the data carried on that service, which is determined by the criticality level of data confidentiality, $Sec_l$ represents the security level of that service, and Round1 means rounding up to one decimal place. It is worth noting that the + 1 in the equation means that the confidentiality of the service is higher than that of the data. For simplicity, this increment is taken to be 1.
Equation (2) determines the integrity level of services, in which $SER_{il}$ represents the integrity level of a service, $D_{il}$ represents the integrity level of the data carried on that service, which is determined by the criticality level of data integrity, $Sec_l$ represents the security level of the service, and Round1 means rounding up to one decimal place.
The values of $D_{cl}$ and $D_{il}$ in Equations (1) and (2) are determined by experts according to Tables 4 and 5.

Table 4. Criticality level of data confidentiality.

Value | Level | Description
4 | Critical | Compromising of confidentiality will cause critical losses for system or software
3 | Medium | Compromising of confidentiality will cause medium losses for system or software
2 | Minor | Not very critical; compromising of confidentiality will cause minor losses for system or software
1 | Low | Not crucial; compromising of confidentiality will cause only minimum losses for system or software

Table 5. Criticality level of data integrity.

Value | Level | Description
4 | Critical | Compromising of integrity will cause critical losses for system or software
3 | Medium | Compromising of integrity will cause medium losses for system or software
2 | Minor | Not very critical; compromising of integrity will cause minor losses for system or software
1 | Low | Not crucial; compromising of integrity will cause only minimum losses for system or software

As for the security level of services in Equations (1) and (2), when the security of a service is compromised, the software system, the user, and the organization to which the service belongs will be impacted, and different entities will suffer impacts of different severity. In this paper, we define three severity levels: medium, critical, and extreme. As a result, the security levels determined jointly by the impacted object and the severity level can be classified into five levels, {1, 2, 3, 4, 5}. Level 1 means there is almost no impact; level 2 means there are minor impacts; level 3 means there are some medium impacts; level 4 means there are some significant impacts; level 5 means there are critical impacts. Referring to [62], we propose a determination matrix for service security levels, as shown in Table 6. According to this matrix, combined with the characteristics of the software system, and under the guidance of expert experience, the security level of each service can be obtained.
Finally, through Equations (1) and (2), the confidentiality and integrity levels of services can be obtained and thus a service invocation diagram based on architecture can be constructed. In the rest of the paper, we will use this model to locate vulnerabilities of information flow in software architecture.

Table 6. Determination matrix for service security levels (surviving row).
Organization to which software system belongs | Level 3 | Level 4 | Level 5

Algorithm for Determining Vulnerability of Information Flow in Architecture
There are two major tasks in locating vulnerabilities based on the service invocation diagram. First, find all paths of information flow and propagation. Second, based on the confidentiality or integrity level of each vertex on the paths as well as the direction of information flow between adjacent vertices, locate vulnerabilities by looking for vertices that violate the security policies of information flow proposed above. In this paper, two algorithms are designed to identify and locate vulnerabilities in the service invocation diagram based on confidentiality and integrity policies. Key points of the algorithms are as follows:
• Use all vertices with an in-degree of 0 as initial vertices, and sort them.
• Find the paths that begin with these vertices one by one, stopping when no further vertices interact with the vertices on the path. Then output these paths. Although the reading and writing operation (denoted by "r and w") will cause loops in the diagram, there is no need to iterate these loops when finding paths, because traversing a loop once is enough to locate the service pairs that violate security policies. So, when finding paths, we only consider each loop once.
• Use these paths and the determination rules under the goal of confidentiality or integrity as input. Compare the confidentiality or integrity levels of each pair of adjacent vertices. For example, if the path is $N_1, N_3, N_5, N_6, N_9$, then compare the pairs $N_1N_3$, $N_3N_5$, $N_5N_6$, and $N_6N_9$ in turn, in which N stands for a node.
A fragment of the path-finding algorithm listing reads:
    if in-degree of n = 0 then
        while adjacent node j of node n exists do
            i ← FindPath(i, n, n, allPaths)
            i …
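The key points above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithms of the paper: the names `find_paths` and `violations`, the example matrix, and the example levels are hypothetical, an edge u → v is taken to be secure when level(u) ≤ level(v) under confidentiality and when level(u) ≥ level(v) under integrity, and vertices that can only be reached from inside a loop with no in-degree-0 entry point are not visited.

```python
def find_paths(matrix):
    """Enumerate information-flow paths starting at in-degree-0 vertices."""
    n = len(matrix)
    starts = [v for v in range(n)
              if all(matrix[u][v] == 0 for u in range(n))]
    paths = []

    def extend(path):
        last = path[-1]
        successors = [v for v in range(n) if matrix[last][v]]
        if not successors:
            paths.append(path)
        for v in successors:
            if v in path:
                paths.append(path + [v])  # close the loop once, then stop
            else:
                extend(path + [v])

    for start in sorted(starts):
        extend([start])
    return paths


def violations(paths, levels, goal):
    """Return the adjacent pairs (u, v) whose flow u -> v breaks the policy."""
    found = set()
    for path in paths:
        for u, v in zip(path, path[1:]):
            if goal == "confidentiality":
                secure = levels[u] <= levels[v]  # flow only to equal or higher
            else:
                secure = levels[u] >= levels[v]  # flow only to equal or lower
            if not secure:
                found.add((u, v))
    return sorted(found)


# Edges: 0 -> 1, 1 -> 2, 2 -> 1 (a loop), 1 -> 3.
matrix = [
    [0, 2, 0, 0],
    [0, 0, 2, 2],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
levels = [1, 2, 3, 1]  # hypothetical security level of each vertex

paths = find_paths(matrix)
print(paths)
print(violations(paths, levels, "confidentiality"))
```

Closing each loop exactly once keeps the enumeration finite while still checking the back edge, which is the property the second key point relies on.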

Purpose
In this section, a real-world software architecture was selected as an example. When designing this system, non-functional requirements were neglected. The methods proposed in this paper were used to analyze its architecture. Results were observed to see whether vulnerabilities in the architecture could be located in order to verify the feasibility and effectiveness of the proposed method.

Introduction of the Object System
The object system used in the case study was a flight reservation system of an airline company. The architecture of the system was service-oriented. Its main functions included user information input and statistics, user information storage, transmission and receipt of ticket information, notice of ticket pick-up, generation and printing of bills, statistics and checking of ticket sales, processing of ticket information based on user needs, generation of flight information, inquiry and feedback of flight information, inquiry and feedback of travel agencies, analysis and management of sales, and so on. To implement these functions, designers classified the system into four major modules: client-side information reception, client-side information output, network reception and dispatch, and server modules. Figure 6 shows the composition of this system. Software designers performed the preliminary design and detailed design and wrote a software architecture design document for this system.

Experimental Design and Environment
The client-side information reception module was selected as the experimental object to verify the proposed analysis approach for vulnerability of information flow under the goal of confidentiality. The network reception and dispatch module was selected as the experimental object to verify the analysis approach for vulnerability of information flow under the goal of integrity. The experiment was conducted on a computer running Ubuntu 19.04 with the PyCharm IDE and Graphviz, which was used as an auxiliary modelling tool.

Experimental Process
Step 1. We used the architecture design document to determine and describe the services included in the client-side information reception module and the network reception and dispatch module. The service composition of these two modules, the functions of these services, and the process that realizes each module are shown in Appendices A and B.
Step 2. According to Appendices A and B and following the methods proposed in Section 3.4, we analyzed the client-side information reception module and the network reception and dispatch module to obtain their service invocation elements and the confidentiality or integrity level of each service. The service invocation elements and confidentiality levels of each service in the client-side information reception module are shown in Tables 7 and 8. The service invocation elements and integrity levels of each service in the network reception and dispatch module are shown in Tables 9 and 10.

Table 7. Service invocation elements in client-side information reception module.

| Service Invocation Pair | Operations Performed | Direction of Data Flow |
| --- | --- | --- |
| S1 invokes S0 | S1 performs reading operation on S0 | S0 → S1 |
| S1 invokes S2 | S1 performs writing operation on S2 | S1 → S2 |
| S2 invokes S3 | S2 performs writing operation on S3 | S2 → S3 |
| S3 invokes S3 | S3 performs writing operation on S3 | S3 → S3 |
| S4 invokes S0 | S4 performs reading operation on S0 | S0 → S4 |
| S4 invokes S5 | S4 performs writing operation on S5 | S4 → S5 |
| S4 invokes S7 | S4 performs writing operation on S7 | S4 → S7 |
| S5 invokes S6 | S5 performs writing operation on S6 | S5 → S6 |
| S6 invokes S6 | S6 performs writing operation on S6 | S6 → S6 |
| S1 invokes S7 | S1 performs writing operation on S7 | S1 → S7 |
| S7 invokes S0 | S0 performs reading operation on S7 | S7 → S0 |
| S8 invokes S2 | S8 performs reading operation on S2 | S2 → S8 |
| S8 invokes S5 | S8 performs reading operation on S5 | S5 → S8 |
| S8 invokes S9 | S8 performs writing operation on S9 | S8 → S9 |
| S8 invokes S10 | S8 performs writing operation on S10 | S8 → S10 |
| S9 invokes S11 | S9 performs writing operation on S11 | S9 → S11 |
| S10 invokes S11 | S10 performs writing operation on S11 | S10 → S11 |

Table 8. Confidentiality values of each service invocation element in client-side information reception module.

| Service | Confidentiality Value |
| --- | --- |
| Service S0 | 2.4 |
| Service S1 | 4 |
| Service S2 | 4 |
| Service S3 | 4 |
| Service S4 | 4 |
| Service S5 | 4 |
| Service S6 | 4 |
| Service S7 | 4 |
| Service S8 | 5 |
| Service S9 | 5 |
| Service S10 | 5 |
| Service S11 | 5 |

Table 9. Service invocation elements in network reception and dispatch module.

| Service Invocation Pair | Operations Performed | Direction of Data Flow |
| --- | --- | --- |
| S0 invokes S1 | S0 performs writing operation on S1 | S0 → S1 |
| S1 invokes S2 | S1 performs writing operation on S2 | S1 → S2 |
| S1 invokes S11 | S1 performs writing operation on S11 | S1 → S11 |
| S2 invokes S3 | S2 performs writing operation on S3 | S2 → S3 |
| S3 invokes S4 | S3 performs writing operation on S4 | S3 → S4 |
| S4 invokes S4 | S4 performs writing operation on S4 | S4 → S4 |
| S4 invokes S5 | S4 performs writing operation on S5 | S4 → S5 |
| S5 invokes S5 | S5 performs writing operation on S5 | S5 → S5 |
| S10 invokes S3 | S10 performs reading operation on S3 | S3 → S10 |
| S10 invokes S3 | S10 performs writing operation on S3 | S10 → S3 |
| S1 invokes S6 | S1 performs writing operation on S6 | S1 → S6 |
| S6 invokes S7 | S6 performs writing operation on S7 | S6 → S7 |
| S7 invokes S7 | S7 performs writing operation on S7 | S7 → S7 |
| S7 invokes S8 | S7 performs writing operation on S8 | S7 → S8 |
| S8 invokes S8 | S8 performs writing operation on S8 | S8 → S8 |
| S8 invokes S9 | S8 performs writing operation on S9 | S8 → S9 |
| S10 invokes S9 | S10 performs reading operation on S9 | S9 → S10 |
| S10 invokes S9 | S10 performs writing operation on S9 | S10 → S9 |
| S10 invokes S11 | S10 performs writing operation on S11 | S10 → S11 |
| S12 invokes S5 | S12 performs reading operation on S5 | S5 → S12 |

Step 3. Based on the service invocation information obtained above, the service invocation diagrams were constructed. We then used Graphviz to visualize the service invocation diagrams for the client-side information reception module and the network reception and dispatch module, as shown in Figures 7 and 8, respectively. In these figures, the value 1 on an edge stands for a reading operation, while 2 stands for a writing operation.
Step 4. We used the two algorithms proposed in this paper for determining the vulnerability of information flow in architecture to locate vulnerabilities in the client-side information reception module and the network reception and dispatch module. We developed a program that implements these two algorithms and used it to obtain the analysis results.
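As an illustration of Steps 3 and 4 for the client-side information reception module, the following Python sketch transcribes the data flows of Table 7 and the confidentiality values of Table 8, emits Graphviz DOT text for the diagram, and lists the flows that break a confidentiality rule. It is not the authors' program. The names `to_dot` and `confidentiality_violations` are hypothetical, the edge from S1 to S7 follows the invocation pair "S1 invokes S7", and the rule that information must not flow from a higher confidentiality value to a lower one is an assumption made for this sketch.

```python
# Data flows from Table 7 as (source, destination, operation); 1 = read, 2 = write.
TABLE7_FLOWS = [
    ("S0", "S1", 1), ("S1", "S2", 2), ("S2", "S3", 2), ("S3", "S3", 2),
    ("S0", "S4", 1), ("S4", "S5", 2), ("S4", "S7", 2), ("S5", "S6", 2),
    ("S6", "S6", 2), ("S1", "S7", 2), ("S7", "S0", 1), ("S2", "S8", 1),
    ("S5", "S8", 1), ("S8", "S9", 2), ("S8", "S10", 2), ("S9", "S11", 2),
    ("S10", "S11", 2),
]

# Confidentiality values from Table 8.
TABLE8_LEVELS = {
    "S0": 2.4, "S1": 4, "S2": 4, "S3": 4, "S4": 4, "S5": 4,
    "S6": 4, "S7": 4, "S8": 5, "S9": 5, "S10": 5, "S11": 5,
}


def to_dot(flows, levels):
    """Build DOT text: one node per service, one labelled edge per data flow."""
    lines = ["digraph service_invocation {"]
    for name, level in levels.items():
        lines.append(f'    {name} [label="{name} ({level})"];')
    for src, dst, op in flows:
        lines.append(f'    {src} -> {dst} [label="{op}"];')
    lines.append("}")
    return "\n".join(lines)


def confidentiality_violations(flows, levels):
    """Assumed rule: a flow from a higher to a lower confidentiality value is a violation."""
    return [(src, dst) for src, dst, _op in flows if levels[src] > levels[dst]]


dot_text = to_dot(TABLE7_FLOWS, TABLE8_LEVELS)
violations = confidentiality_violations(TABLE7_FLOWS, TABLE8_LEVELS)
print(violations)  # [('S7', 'S0')]
```

The DOT text can be saved to a file and rendered with the Graphviz `dot` command, for example `dot -Tpng diagram.dot -o diagram.png`. Under the assumed rule, the only flagged flow is S7 → S0, because S7 has a confidentiality value of 4 and S0 has a value of 2.4.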

Results
The results were obtained as follows.
For the client-side information reception module, we applied the algorithm for determining the vulnerability of information flow in architecture under the goal of confidentiality. The results are shown in Figure 9. Paths containing vulnerabilities were obtained: P4: S0 → S1 → S7 → S0; P6: S0 → S4 → S7 → S0. The corresponding vulnerability information is S7 → S0. The reason why this is vulnerable is discussed in detail in Section 4.5.2.
For the network reception and dispatch module, we applied the algorithm for determining the vulnerability of information flow in architecture under the goal of integrity. The results are shown in Figure 10. The corresponding vulnerability information is S10 → S3 and S10 → S9. The reason why these are vulnerable is discussed in detail in Section 4.5.2.

Results Analysis and Discussion
From the results of the vulnerability analysis of information flow in architecture under the goal of confidentiality conducted in the case study, it can be seen that the vulnerability was S7 → S0 (detailed descriptions of S7 and S0 are provided in Appendix A), which violated the first security policy of information flow under this goal. By analyzing the architecture design specifications with the help of domain experts, we learned that when the verification service returned an error value to the login service, user and ticket information could be leaked, exposing users' private information such as name and itinerary. In addition, malicious attackers could exploit this vulnerability by attempting to log in to the system, for example by deliberately inputting erroneous information, and use the values returned by the system to guess users' passwords. If the login succeeded, they could impersonate users to modify ticket information or conduct other operations in the system. Based on this analysis, we reported the architecture vulnerability to the developers and suggested that they modify the architecture. Specifically, we suggested that the developers bind the verification service to the login service so that users' identities are verified when they attempt to log in, rather than after their information is saved, as in this system.
From the results of the vulnerability analysis of information flow in architecture under the goal of integrity conducted in the case study, it can be seen that the vulnerabilities were S10 → S3 and S10 → S9 (detailed descriptions of S3, S9, and S10 are provided in Appendix B), which violated the fourth security policy of information flow under the goal of integrity. By analyzing the architecture design specifications, we learned that S10 was a monitoring service that could monitor and intervene in S3 and S9 by collecting their runtime information. However, the design specifications did not state clearly that S10 needed a specific transmission protocol such as a virtual private network (VPN) to transfer data, nor did they mention an encryption mechanism for receiving and sending data. What is worse, there was no permission or access control mechanism for the monitoring service. As a result, in operation, malicious attackers could modify, intercept, or delete data in S3 and S9 by attacking the monitoring service. This result is consistent with practice: most software systems nowadays, especially complex ones, have integrated monitoring services, and because of the complex functional, performance, and accuracy requirements of the monitoring function, many designers tend to neglect security requirements when implementing it, which could pose significant threats to the operation of software systems [63]. Based on this analysis, we reported these vulnerabilities to the developers and suggested that they modify the architecture. For example, we suggested employing a VPN and data encryption for data transfer between the monitoring service and the main system, and removing the monitoring service's operation privileges on the system.
To further demonstrate the effectiveness of the proposed method, we compared the vulnerability analysis results with common security vulnerabilities from well-known vulnerability databases, including Common Vulnerabilities and Exposures (CVE), the Open Web Application Security Project (OWASP), and the National Vulnerability Database (NVD). Common categories of vulnerabilities in these databases are listed in Table 11.

Table 11. Common categories of vulnerabilities in Common Vulnerabilities and Exposures (CVE), Open Web Application Security Project (OWASP), and National Vulnerability Database (NVD).

| Number | Category of Vulnerabilities |
| --- | --- |
| 1 | Buffer overflows |
| 2 | SQL injection |
| 3 | Cross-site scripting |
| 4 | Path traversal |
| 5 | Permission, privileges, and access controls |
| 6 | Cryptographic issues |
| 7 | Information transfer with plaintext |
| 8 | Unvalidated input |
| 9 | Improper error handling |
| 10 | Link following |
| 11 | Format string |
| 12 | Authentication issue, improper authentication |

In Table 11, four categories of vulnerabilities are marked in italics. The vulnerabilities found in the case study belong to these four categories.
In addition, we learned from recent news that there are security vulnerabilities in the Amadeus flight booking system [64]. An attacker only needs to know the victim's passenger name record (PNR) number to exploit the vulnerability. By using the booking information (i.e., the booking ID and last name of the customer), it is possible to access the user's account and perform other operations. What is worse, user information is transmitted without encryption in the Amadeus system, which may invite man-in-the-middle attacks. The vulnerabilities in the Amadeus system are similar to the analysis results in our case study, which further supports the effectiveness of our proposed methods.

Comparison with Other Methods
In previous subsections, we demonstrated the feasibility and effectiveness of the proposed method. To further show the advantages of our method over existing methods, we compare our method with others.
As mentioned in the Introduction, the major contribution of this paper is that we propose a method for locating vulnerabilities in software architecture in the design phase, so the comparison with other methods is also based on this perspective. The comparison results are shown in Table 12. From Table 12, it can be seen that compared with the method in [25], our method can be used to locate architecture vulnerabilities, while that one cannot. The method in [26] can be used to locate vulnerabilities in architecture. However, the architecture in that method must be obtained after the software development is finished, because in that vulnerability locating method, the architecture of the developed software must be compared with the original version of the software architecture. Then, by injecting malicious source code, vulnerabilities can finally be discovered. As a result, that method cannot be applied in the design phase. The concept of architecture vulnerability is mentioned in [27], but the object of the case study was the source code and vulnerabilities were not located at all.

Threats to Validity
There are some risks and threats to validity, including the following:

1.
When assigning values for service confidentiality or integrity, although the assigning method proposed in this paper is based on widely accepted methods in cyber security, such as CC (ISO/IEC Standard 15408) [60] and ISO/IEC Standard 27000 [59], a specific assigning activity may be affected by the domain knowledge and experience of the researchers, as well as by how thoroughly the object system is understood. So, when assigning values in this paper, in addition to relying on researchers who possess domain knowledge, we also consulted experienced practitioners and invited them to discuss and confirm the value-assigning results. As a result, there might still be some subjectivity in the assigned values, but the correctness of the proposed method can be guaranteed.

2.
The outputs of the proposed vulnerability analysis method are the services and the interactions between services that violate security policies. Further analysis of these outputs requires domain background and experience on the part of the analysts. In this paper, we invited domain experts who possess strong background knowledge and domain experience, so the further analysis of the results in this paper is solid.

3.
A modeling approach and a corresponding vulnerability analysis method are proposed in this paper, and concrete algorithms are given. However, automatic tools for the proposed work are lacking; developing them will be considered in our future work.

Conclusions and Future Work
In this paper, we propose an approach for analyzing the vulnerability of information flow in software architecture. First, based on confidentiality and integrity policies of information flow and current definitions of software vulnerability, and combining features of software architecture, the concept and formal description of vulnerability of information flow in software architecture are given. We clarify that the vulnerability of information flow in software architecture refers to errors resulting from the violation of corresponding security policies during the flow and propagation of information in components, interactions between components, and the topology formed from those interactions. Moreover, targeting the reading and writing operations during service interaction, security policies of information flow under the goals of confidentiality and integrity are proposed. Then, we propose an approach for constructing a service invocation diagram based on software architecture to depict the interaction and propagation of information flow in architecture as well as the confidentiality or integrity level of each service. After that, an algorithm is designed to locate vulnerabilities of information flow in architecture by traversing all potential paths and comparing these paths with predefined security policies of information flow. Finally, a flight reservation system of an airline company was chosen as a case study. After conducting experiments, we found a vulnerability of information flow in the architecture by using confidentiality policies, which was the possibility of information leakage involving users and tickets when the verification service returns error values to the login service. By using integrity policies, we found two vulnerabilities of information flow in the architecture, concerning integrity hazards of the monitoring service. The monitoring service has unnecessarily high privilege when interacting with the system, and there is no encryption mechanism for data read from the monitored system. The case study shows that the proposed approach can be used to locate vulnerabilities of information flow in architecture, which verifies its effectiveness and applicability. The proposed method was also compared with other methods, and the results show the advantages of our method.
In the future, we will apply the proposed method to more software systems to further verify its effectiveness and usability and to improve it. In addition, we will conduct research on value-assigning methods for the confidentiality and integrity of services to remedy the shortcoming of the current method, which relies heavily on experts' experience, and to improve the accuracy of the assigning method. Moreover, we will extend and complement the proposed modelling method by integrating existing security constraints of the architecture into the model. Finally, corresponding tools will be developed to support the proposed modelling approach and determination algorithms. With the refinement of our method and the accumulation of experimental data, we will also consider incorporating machine learning methods from intelligent computing into our work, such as in [65,66]. In addition to the above work, we will also try to apply our research to the automotive field, such as autopilot systems. We found several studies on reliability issues in this area [67,68]. As an important quality attribute, security should also be considered in future research.

Acknowledgments:
The authors would like to thank Zhang Dajian, chief manager at the Testing Center of Finance, China, Chen Qinmin, senior security engineer at the Beijing Certificate Authority, and Zeng Yong, architecture designer expert at the Xiangcao Software Company for their suggestions on value determination in Section 3.4.3 and further analysis on our experiment results in Section 4.5.2.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Appendix A describes the services in the client-side information reception module. To implement the major functions of the client-side information reception module, designers performed detailed design and wrote a software design document. According to the software design document, the workflow of the client-side information reception module can be abstracted into the following 12 services. The name and description of each service are shown in Table A1. To ease the modelling process, we numbered these services from 0 to 11.

Table A1. Name and description of each service in client-side information reception module.

| Service | Name and Description |
| --- | --- |
| Service S0 | Entrance service. The starting point of the module. |
| Service S1 | PersInfoExam service. Performs a preliminary examination of the information input in the traveler's information panel. If errors exist, determines the type of error and passes the corresponding PErrorType or PErrorRank as parameters to the ErrorHandle service. If no errors exist, transfers to the PersInfoInput service. |
| Service S2 | PersInfoInput service. After examination without error, transmits data from the input datasheet to the PersInfoTempSave service. |
| Service S3 | PersInfoTempSave service. Receives the traveler's information from the PersInfoInput service and saves it to a temp file PersInfoTemp.txt. This temp file will be used to verify data transferred from the server and should be deleted after the whole process is complete. |
| Service S4 | CheckNoticeExam service. Performs a preliminary verification of the information input in the check and ticket collection panel. If errors exist, determines the type of error and passes the corresponding CErrorType or CErrorRank as parameters to the ErrorHandle service. If no errors exist, transfers to the CheckNoticeInput service. |
| Service S5 | CheckNoticeInput service. After examination without error, transmits data from the input datasheet to the CheckNoticeTempSave service. |
| Service S6 | CheckNoticeTempSave service. Receives the traveler's check and ticket collection information from the CheckNoticeInput service and saves it to a temp file CheckNoticeTemp.txt. This temp file will be used to verify data transferred from the server and should be deleted after the whole process is complete. |
| Service S7 | ErrorHandle service. If errors are found in the PersInfoInput or CheckNoticeInput service, control transfers to this service. Performs the corresponding handling process and outputs an error message. |
| Service S8 | OrderCollectDetermination service. Determines whether the operation is ordering or collecting tickets based on the input of the PersInfoInput service and sets flags. According to the flags, transfers to the OrderRequireTrans or CollectRequireTrans service. |
| Service S9 | OrderRequireTrans service. Before transferring order requirements, performs hardware and software preparation, for example, prepares ticket ordering information, including the traveler's information, the host's information, and so on. |
| Service S10 | CollectRequireTran service. Before transferring collecting requirements, performs hardware and software preparation, for example, prepares ticket collecting information, including the traveler's information, the host's information, and so on. |
| Service S11 | DataTrans service. Transmits information from the client side to the server. |

Appendix B
Appendix B describes services in network reception and dispatch module.
To implement the major functions of the network reception and dispatch module, designers performed detailed design and wrote a software design document. According to the software design document, the workflow of the network reception and dispatch module can be abstracted into the following 13 services. The name and description of each service are shown in Table A2. To ease the modelling process, we numbered these services from 0 to 12.

| Service | Name and Description |
| --- | --- |
| Service S9 | NetSend service. Sends packed data packets. When sending data packets, keeps a connection with the NetMonitor service. Accepts and returns the network data transmission status. According to the status information, continues sending data packets or adjusts correspondingly. |
| Service S10 | NetMonitor service. From the establishment of the connection from server to client to the completion of data transmission and disconnection, the NetMonitor service keeps monitoring the network status. Meanwhile, sends status information to the NetReceive and NetSend services, and receives status information on data receiving and sending from these two services. If errors occur in the status information, transfers to the NErrorHandle service. |
| Service S11 | NErrorHandle service. Handles errors in the network transmission module. |
| Service S12 | JudgeCSFlag service. Based on SetCSFlag in the client input or server module, determines which module, client output or server, should receive control after the network module is completed. |