This section describes the concepts of the architecture-level performance model PCM and how we transform DSL instances into PCM models.
5.1. Palladio Component Model
We chose PCM [6] as the model-based performance evaluation tool because it enables engineers to specify software systems independently of technology, include resource demands for software components, consider resource contention, and predict not only response time but also resource utilization. Furthermore, the tool support is mature, open source, and continuously maintained by a large community.
In particular, PCM is developed for component-based software systems and enables engineers to describe performance-relevant factors of software architectures, resource environments, and usage behavior [4]. It is implemented in Ecore from the Eclipse Modeling Framework (EMF) and consists of multiple models [6]. Software interfaces and components are specified in the Repository Model (Figure 4a). Components provide the implementation for the signatures of interfaces. Therefore, they contain a resource-demanding service effect specification (RDSEFF) in which activities such as parametric resource demands and external calls of signatures are modeled similarly to activity diagrams (Figure 4b). Components are additionally assembled in a System Model. In the Resource Environment Model, network and hardware resources are specified, such as processing resources (CPU, disk, and delay), their processing rates, and scheduling policies. The Allocation Model allows for deploying assembled components from the System Model on resources from the Resource Environment Model. The usage and workload of software components are specified in the Usage Model. Finally, PCM provides a simulator for its models, which is based on a process-oriented discrete event simulation.
5.2. Transformation to PCM
We describe the transformation for each DSL component. Table 1 shows the mapping of DSL concepts to PCM elements. An Execution Architecture is transformed into a Repository Model (Figure 4a). To traverse the Edges and Nodes of an Execution Architecture, we use a recursive depth-first search. Upon visiting a Node, we check whether it contains child Nodes and Edges; if so, we traverse this Node in the same manner and the procedure repeats recursively.
For each Node, we create one Interface with several signatures and a corresponding Basic Component that provides these signatures using an RDSEFF. If a Node contains child Nodes, we add a delegate signature to the corresponding Interface (e.g., IJob0). Additionally, the Basic Component requires the Interfaces of the child Nodes.
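The following Java sketch illustrates this traversal on a simplified metamodel. The Node class and the print statements are illustrative placeholders (Edges are omitted for brevity); the actual transformation operates on EMF/Ecore objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the DSL metamodel, not the actual Ecore classes.
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class RepositoryBuilder {
    // Recursive depth-first traversal: every Node yields an Interface and a
    // Basic Component; composite Nodes additionally get a delegate signature
    // and require the Interfaces of their children.
    void transform(Node node) {
        System.out.println("create Interface I" + node.name
                + " and Basic Component " + node.name);
        if (!node.children.isEmpty()) {
            System.out.println("add delegate signature to I" + node.name);
            for (Node child : node.children) {
                System.out.println(node.name + " requires I" + child.name);
                transform(child); // recurse into the child Node
            }
        }
    }

    public static void main(String[] args) {
        Node job0 = new Node("Job0");
        job0.children.add(new Node("Stage0"));
        job0.children.add(new Node("Stage1"));
        new RepositoryBuilder().transform(job0);
    }
}
```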
Parameters of the Configuration and parametric dependencies of the Execution Architecture are transformed into input parameters of each Signature. We consider parameters for the number of files, the data size of one file, the default partition size, the number of partitions, and the number of executors. To model and limit the maximum number of concurrent tasks, we separately specify an Infrastructure Component that represents a pool of available task slots. The component contains two SEFFs to acquire and to release a task slot. Before a task can execute, a slot must first be acquired; after task completion, the slot is released again. In the case of Apache Spark, the number of task slots equals the total number of cores.
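Conceptually, the Infrastructure Component behaves like a counting semaphore. The following minimal Java sketch captures the acquire/execute/release protocol; the class and method names are our own illustration, not PCM API.

```java
import java.util.concurrent.Semaphore;

// Conceptual analogue of the Infrastructure Component: a pool of task slots.
// In the PCM model this is expressed as two SEFFs (acquire and release);
// here a counting semaphore plays the same role.
class TaskSlotPool {
    private final Semaphore slots;

    TaskSlotPool(int totalCores) {
        // For Apache Spark, the number of slots equals the total number of cores.
        this.slots = new Semaphore(totalCores);
    }

    void runTask(Runnable task) throws InterruptedException {
        slots.acquire();     // a slot must be acquired before the task executes
        try {
            task.run();
        } finally {
            slots.release(); // the slot is released after task completion
        }
    }
}
```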
Edges are represented in the RDSEFF of a Basic Component. Each delegate RDSEFF models the control flow by using External Call Actions to invoke signatures of required Interfaces in the specified order (e.g., Job0 invokes the prepare signature of IStage0). In doing so, the input parameters are forwarded and altered at specific points to model the data transmission factor of an Edge.
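As an illustration, the sketch below shows how the forwarded parameters of such a call might be altered by an Edge’s transmission factor; the parameter record and the scaling rule are assumptions for illustration, since which parameters an Edge actually rescales depends on the model.

```java
// Illustrative only: the record mirrors the Signature input parameters
// listed above; the transmission factor scales the transmitted data size.
record CallParameters(long numberOfFiles, long dataSizePerFile,
                      long partitionSize, long numberOfPartitions,
                      long numberOfExecutors) {}

class EdgeForwarding {
    // Forward the caller's parameters to the required Interface, scaling the
    // data size by the Edge's data transmission factor (e.g., 0.5 if a step
    // passes on only half of its input data).
    static CallParameters forward(CallParameters in, double transmissionFactor) {
        return new CallParameters(
                in.numberOfFiles(),
                (long) (in.dataSizePerFile() * transmissionFactor),
                in.partitionSize(),
                in.numberOfPartitions(),
                in.numberOfExecutors());
    }
}
```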
If a Node contains a Resource Profile, we transform it by creating several model elements. To call a group of tasks in parallel, we add two signatures to the corresponding Interface of the Node (e.g., Stage0). The providing RDSEFF prepare creates a set of parallel tasks. It uses a Distributed Call Action to invoke the execute signature of the same Interface several times in parallel. The parallelism is defined either by the number of partitions of a data source or by the specified parallelism of the Node. The execute RDSEFF acquires and releases a task slot before and after triggering a task.
We create an additional Interface and Basic Component (e.g., TaskForStage) to model a task. Its behavior run is responsible for executing the parametric resource demands of a task (Figure 4b). Only the wait demand of a Resource Profile is executed in the prior prepare RDSEFF, as this demand occurs once at the beginning of each stage rather than for each task. We automatically assemble all Basic Components of the Repository Model to derive Palladio’s System Model.
Since the Resource Architecture follows the concepts of Palladio’s Resource Environment Model, the transformation is a direct one-to-one mapping. We transform each Resource Node into a Resource Container and convert the Cluster Specification and Resource Role accordingly. Additionally, we transform each Resource Unit into an equivalent Processing Resource Unit, including the specification of processing rates, the number of replicas (e.g., the number of cores), and scheduling policies. Finally, all Resource Containers are connected to networks via a Linking Resource.
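A minimal sketch of this mapping follows, with hypothetical record types standing in for the DSL elements; the real transformation produces Ecore objects of Palladio’s Resource Environment Model.

```java
import java.util.List;

// Hypothetical stand-ins for the DSL's Resource Architecture elements.
record ResourceUnit(String type, double processingRate,
                    int replicas, String schedulingPolicy) {}
record ResourceNode(String name, List<ResourceUnit> units) {}

class ResourceEnvironmentBuilder {
    // One-to-one mapping: Resource Node -> Resource Container,
    // Resource Unit -> Processing Resource Unit.
    void transform(List<ResourceNode> nodes) {
        for (ResourceNode node : nodes) {
            System.out.println("ResourceContainer " + node.name());
            for (ResourceUnit u : node.units()) {
                // e.g., type = "CPU", replicas = number of cores,
                // schedulingPolicy = "PROCESSOR_SHARING"
                System.out.printf("  ProcessingResourceUnit %s rate=%.0f replicas=%d policy=%s%n",
                        u.type(), u.processingRate(), u.replicas(), u.schedulingPolicy());
            }
            System.out.println("  connect to network via LinkingResource");
        }
    }
}
```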
To create the Allocation Model, we deploy all assembled Basic Components from the System Model on the master Resource Container from the Resource Environment Model. Our previous extensions [9] enable Palladio’s simulation framework SimuCom to distribute resource demands to Resource Containers that represent worker nodes using a round-robin policy.
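The round-robin selection itself is simple; the following sketch shows the underlying logic with illustrative types, not the actual SimuCom extension code.

```java
import java.util.List;

// Illustrative selection logic of a round-robin policy: each incoming
// resource demand is assigned to the next worker container in turn.
class RoundRobinDistributor {
    private final List<String> workerContainers;
    private int next = 0;

    RoundRobinDistributor(List<String> workerContainers) {
        this.workerContainers = workerContainers;
    }

    // Returns the worker container that receives the next resource demand.
    synchronized String nextWorker() {
        String worker = workerContainers.get(next);
        next = (next + 1) % workerContainers.size();
        return worker;
    }
}
```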
Finally, we transform the Data Workload Architecture to a Usage Model. We create one Entry Level System Call that invokes the delegate signature of the Application Interface. The required input parameters are transformed based on the Data Model and Data Source. We specify the number of files, the data size of one file, the default partition size, and the number of partitions. For the Single Data Source, we create a simple closed Workload with a population of one, which means the Entry Level System Call is triggered once.
All transformed models can be used by Palladio’s simulator to predict performance metrics.