Algorithm 3 accepts a source tree (
) and distortion parameters to create a target tree (
). The distortion parameters are the number of added nodes (
a) and deleted nodes (
d), and the percentage of matched nodes with a changed parent (
). Matched nodes are nodes that have the same label in both the source and the target tree. The algorithm works as follows. First, the nodes
and edges
of the source tree are copied into
and
. Matched nodes are stored in set
M, corresponding to the set
after the node deletion. The algorithm randomly selects
d nodes from set
by using function
getRandomNode. Each randomly selected node
is deleted from set
, together with the corresponding edge to the parent node from the set of edges
. The edge that is deleted is obtained by function
getEdgeToParent, given the child node. After the deletions, the algorithm creates new nodes and adds them to the node set
. For each new node
the algorithm randomly selects its parent node
and creates a corresponding edge
. If the parent node has children, the tree can be modified further by changing the parent of one randomly selected child
c and setting
as the new parent. This modification depends on the
spreadDecision function, which in this paper returns a positive decision with a probability of 50%. However, the function can be parametrized or defined differently for a specific use. The third step of the algorithm replaces the parent node of matched nodes from set
M. First, the number of parent-changing nodes
is calculated from the parameter
and the number of matched nodes
. Node
is selected from a set of nodes
N, which initially contains matched nodes
M. At each iteration performed
times, set
N is reduced by node
. The algorithm then proceeds as it does when adding nodes: it randomly selects a new parent node
from a set of nodes
, where a newly added node, as well as a matched node, can be the new parent node. As with node addition, node
can be added as a child of the new parent
in two ways: by replacing the existing child node
c or adding it as the new child node. The difference is that the edge
to the previous parent node
is deleted from set
. Note, however, that the final number of nodes with a changed parent may end up greater than
M. Deleting a node with children changes the parent of each of its children, and when a new or an existing node replaces a child, that child also changes its parent. However, in this paper, we consider such changes to be implicitly covered by the initial distortion parameters. Future work could include an evaluation of Algorithm 3 that considers such changes separately. The worst-case complexity of Algorithm 3 is
when
nodes are deleted, and
nodes are added, where
corresponds to
d, and
corresponds to
a. An example of a tree generated by Algorithm 3 from the tree shown in
Figure 6a, a source tree with 100 nodes, is shown in
Figure 6b. The distortion parameters used are 10 added nodes, 10 deleted nodes, and 10% of matched nodes with a changed parent (10, 10, 0.1). Note that the tree in
Figure 6a is generated by Algorithm 2 with the following distribution: 32.5%, 50%, 10%, 5%, and 2.5%. This distribution of nodes corresponds to the distribution of the tree representing the class hierarchy of the
NewPipe program, version 0.8.9, shown in
Figure 5. Trees in
Figure 6a,b are similar, since the tree in
Figure 6b is generated by controlled distortion of the source tree in
Figure 6a. On the other hand, the tree in
Figure 6c is generated by Algorithm 2 with the same distribution as the tree in
Figure 6a. An approach where both trees are generated by Algorithm 2 can also be used to generate trees for comparison. In this case, both trees have the same distribution of nodes, but they may differ in the number of nodes and in the randomly created connections between nodes.
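The three steps of Algorithm 3 described above (deletion, addition, and parent change) can be illustrated with a minimal Python sketch. It is not the implementation used in this paper, and it rests on several assumptions that the description leaves open: the tree is stored as a dictionary that maps each integer node label to its parent label (the root maps to None); the root is never deleted or moved; children of a deleted node are reattached to the parent of the deleted node; a node is never moved into its own subtree, so that the result remains a tree; and the number of parent-changing nodes is obtained by truncation. The names distort_tree, attach, and subtree are hypothetical, and spread_decision stands in for the spreadDecision function with its 50% default.

```python
import random


def subtree(parent, n):
    """Return n together with all of its descendants."""
    found = {n}
    changed = True
    while changed:
        changed = False
        for child, q in parent.items():
            if q in found and child not in found:
                found.add(child)
                changed = True
    return found


def attach(target, n, q, rng, spread_decision):
    """Attach n under q; optionally make n the parent of one existing child of q."""
    children = sorted(c for c, cp in target.items() if cp == q and c != n)
    target[n] = q
    if children and spread_decision():
        target[rng.choice(children)] = n


def distort_tree(parent, a, d, p, rng=None, spread_decision=None):
    """Sketch of controlled distortion: d deletions, a additions, share p of moved nodes."""
    rng = rng or random.Random()
    spread_decision = spread_decision or (lambda: rng.random() < 0.5)
    target = dict(parent)  # copy nodes and edges of the source tree
    root = next(n for n, q in target.items() if q is None)

    # Step 1: delete d random nodes (assumption: the root is kept).
    for _ in range(d):
        candidates = sorted(n for n in target if n != root)
        if not candidates:
            break
        n = rng.choice(candidates)
        q = target.pop(n)  # removes the node and its edge to the parent
        for c in target:
            if target[c] == n:
                target[c] = q  # assumption: orphans move to the grandparent
    matched = set(target)  # nodes that survive deletion

    # Step 2: add a new nodes with labels unused in the source tree.
    next_label = max(parent) + 1
    for _ in range(a):
        attach(target, next_label, rng.choice(sorted(target)), rng, spread_decision)
        next_label += 1

    # Step 3: change the parent of a share p of the matched nodes.
    pool = sorted(matched - {root})
    for _ in range(min(int(p * len(matched)), len(pool))):
        n = rng.choice(pool)
        pool.remove(n)
        # Exclude n's own subtree (to avoid cycles) and its current parent.
        banned = subtree(target, n) | {target[n]}
        options = sorted(x for x in target if x not in banned)
        if options:
            attach(target, n, rng.choice(options), rng, spread_decision)
    return target
```

With a 100-node source tree, a call such as distort_tree(source, a=10, d=10, p=0.1) mirrors the (10, 10, 0.1) parameters used for Figure 6b: the result again has 100 nodes, 90 of which are matched. As noted above, the number of nodes whose parent actually differs from the source can exceed the requested share, because deletions and child replacements also change parents.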