4.3.1. Analyzing the Created Clusters Using Cohesion and Coupling
Cohesion is a key internal quality metric in SMC. It measures how closely related the elements within a module are. High cohesion typically leads to better modularity, improving maintainability, reusability, and code clarity. To evaluate the performance of the BBOA in generating cohesive clusters, eight independent runs were performed on various benchmark software systems.
Figure 5 presents the cohesion values obtained across these runs. The Mtunis project showed moderate variability, with values ranging from 28 to 34. Most of the values clustered between 30 and 33, indicating that the BBOA consistently finds cohesive solutions for this system. The ispell project exhibited more variation, ranging from 31 to 44. A peak cohesion of 44 in one run suggests a well-formed cluster, while lower values may reflect sensitivity to parameters or project complexity. Similarly, the rcs project had a broad range of 44 to 72, with two runs achieving particularly high cohesion.
Bison and CIA showed moderate stability. Bison’s cohesion ranged from 51 to 60, while CIA’s varied between 50 and 73. These fluctuations indicate that the BBOA performs reasonably well but can sometimes converge to suboptimal solutions, possibly due to initial population differences or search dynamics. The DOT project reached high cohesion, between 57 and 77 across all runs, which suggests that the BBOA captures strong modular relationships in this case, although the 20-point spread shows that individual runs still differ. The most consistent results came from the PHP and GRAPPA projects. PHP showed a narrow range of 55 to 59, indicating stable clustering performance. GRAPPA had the highest cohesion values, ranging from 111 to 118 across the runs. These results highlight the BBOA’s robustness in handling systems with clear structural boundaries.
Finally, both stunnel and xtell showed low variability, with cohesion values between 30 and 34. This reflects stable performance in identifying module boundaries across runs. Overall, the BBOA proves to be a robust and competitive method for SMC. While some benchmarks showed variability due to complexity or stochastic behavior, others consistently benefited from the algorithm’s exploitation capability. These findings underscore the value of repeated runs in metaheuristic clustering to ensure reliable and stable performance.
Coupling is another key metric in SMC that reflects inter-cluster dependencies. An effective clustering method, such as the BBOA, aims to reduce coupling while maximizing cohesion. Lower coupling indicates that modules within each cluster primarily interact internally, which supports modularity, maintainability, and reusability. As illustrated in Figure 6, the coupling values varied across eight independent runs and across different benchmark systems. Mtunis, stunnel, and xtell consistently show low coupling values, typically ranging from 23 to 33. This indicates well-separated clusters with minimal external dependencies. The consistency across runs suggests that the BBOA performs reliably for smaller or well-structured systems.
In contrast, systems like DOT, bison, and GRAPPA exhibit higher coupling. For example, DOT recorded the highest coupling values, between 171 and 191, indicating significant inter-cluster communication. GRAPPA also shows high but stable coupling in the range of 134 to 141. These results may reflect the intrinsic complexity or tightly integrated architecture of these projects. Intermediate coupling behavior is observed in PHP and rcs, where the values are moderate and stable. This suggests that these systems have a partially modular architecture, and that the BBOA handles them reasonably well, although further improvement is possible.
An important observation is the stability of coupling in some systems across multiple runs. For instance, PHP, GRAPPA, and xtell show consistent values, indicating that the stochastic nature of the BBOA has a limited impact in these cases. However, systems with complex architectures may still experience some variability due to differing initial conditions or search paths. Overall, the BBOA demonstrates a strong capacity to minimize coupling in simpler systems while maintaining stability across runs. This behavior highlights its robustness and suitability for SMC. It also suggests the potential for hybrid or adaptive methods to further improve results on complex software.
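To make the two metrics concrete, the following minimal sketch counts them on a dependency graph. It assumes that cohesion is the number of dependencies whose endpoints share a cluster and that coupling is the number of dependencies that cross cluster boundaries; this reading is an assumption, although it is consistent with the two clusterings reported with Figure 9, where 36 + 21 and 24 + 33 both equal 57. The module names, edges, and the function name cohesion_and_coupling are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (source module, target module)


def cohesion_and_coupling(edges: Iterable[Edge],
                          cluster_of: Dict[str, int]) -> Tuple[int, int]:
    """Return (cohesion, coupling) as counts of intra- and inter-cluster edges.

    Assumption: both metrics are plain edge counts, not normalized ratios.
    """
    cohesion = 0
    coupling = 0
    for src, dst in edges:
        if cluster_of[src] == cluster_of[dst]:
            cohesion += 1   # dependency stays inside one cluster
        else:
            coupling += 1   # dependency crosses a cluster boundary
    return cohesion, coupling


# Hypothetical toy system: five modules, five dependencies, two clusters.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

cohesion, coupling = cohesion_and_coupling(edges, cluster_of)
print(cohesion, coupling)  # 4 1
```

Under this definition, every dependency is counted exactly once, so cohesion plus coupling always equals the number of dependencies, and any clustering that raises one lowers the other.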
Cohesion and coupling are central to evaluating clustering quality.
Table 4 presents the mean and standard deviation of cohesion across all benchmark systems. The GRAPPA project achieves the highest mean cohesion (116.4). PHP, bison, and rcs also show good cohesion performance. Simpler systems like Mtunis, ispell, stunnel, and xtell maintain lower but stable cohesion values, likely due to smaller sizes or simpler structures. The relatively low standard deviations across most systems indicate that the BBOA performs consistently. These results affirm the algorithm’s reliability in producing well-formed clusters and underscore the importance of multiple-run evaluations when applying metaheuristic algorithms to SMC.
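The per-system summary statistics can be reproduced along the following lines. This is a sketch only: the eight values are hypothetical placeholders rather than the measurements behind Table 4, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics
from typing import Sequence, Tuple


def summarize_runs(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of one metric over several runs."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical cohesion values from eight independent runs (not real data).
cohesion_runs = [30, 31, 33, 32, 30, 34, 28, 32]

mean_cohesion, sd_cohesion = summarize_runs(cohesion_runs)
print(f"mean={mean_cohesion:.2f}, sd={sd_cohesion:.2f}")
```

If the population standard deviation is intended instead, statistics.pstdev can be substituted for statistics.stdev.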
Table 5 shows the amount of coupling among the clusters created. DOT has the highest coupling (about 180.8), with relatively consistent high values, indicating that the BBOA struggles to separate interdependent components effectively in this large and possibly tightly coupled system. Similarly, GRAPPA and bison show higher coupling, albeit with manageable variance. In contrast, systems like Mtunis, stunnel, and xtell show consistently low coupling, highlighting the BBOA’s effectiveness in these cases. Taken together, the results fall into three groups. Low coupling and high cohesion are achieved in systems like Mtunis and xtell. Moderate coupling and cohesion are observed in medium-complexity projects like PHP and bison. High coupling with varying cohesion occurs in complex systems like DOT and GRAPPA. The standard deviation values suggest that the results are generally stable across runs, with slightly more variability in systems like rcs, cia, and bison.
Figure 7 compares the cohesion and coupling values, averaged over eight independent runs, for each of the ten benchmark projects. This comparison underscores the effectiveness of the proposed BBOA-based clustering method in achieving a critical software engineering objective: maximizing cohesion while minimizing coupling. As shown in Figure 7, projects like GRAPPA and DOT exhibit exceptionally high cohesion scores (above 110 and 70, respectively), indicating strong intra-cluster similarity. However, they also demonstrate relatively high coupling values, which implies that some modules maintain dependencies outside their clusters. This suggests that while the BBOA effectively grouped strongly related modules together, the structure of these particular software systems may inherently involve more cross-cluster interactions. In contrast, Mtunis, stunnel, and xtell show a more balanced profile, with moderate cohesion values accompanied by low coupling scores. These systems exhibit a favorable trade-off between cohesion and coupling: the BBOA was able to form clusters that were both internally coherent and externally independent, which is a desirable property in software clustering.
PHP and ispell also stand out for maintaining low-to-moderate coupling while achieving consistent cohesion. On the other hand, rcs, bison, and cia show varied patterns, reflecting system-specific complexities that may affect clustering performance. Overall, the results shown in Figure 7 support the core merit of the BBOA approach and its ability to manage the trade-off between cohesion and coupling. By evaluating both metrics side by side, it becomes evident that the BBOA does not simply optimize one metric at the expense of the other; instead, it seeks a balanced modular structure, where tightly bound components are clustered together while inter-module dependencies are minimized. This balance is critical for enhancing the maintainability, scalability, and understandability of software systems.
4.3.2. Analyzing the Created Clusters Using Clustering Quality
Figure 8 presents the boxplot analysis of MQ for ten software benchmark systems across multiple runs. MQ is a comprehensive metric that captures the overall quality of clustering by combining both cohesion and coupling properties. Figure 8 reveals that GRAPPA consistently achieves the highest MQ scores, with median values around 4.9 and narrow interquartile ranges; these results indicate stable, high-quality clustering behavior. Similarly, PHP demonstrates robust MQ performance, with values ranging from approximately 3.1 to 3.76. In contrast, systems like ispell and DOT exhibit lower MQ scores, with wider variability in some cases. These fluctuations suggest that the internal structure of these systems may pose challenges to achieving high modularization quality. Despite this, their MQ scores remain within acceptable bounds, which supports the general adaptability of the BBOA method. Projects such as bison, rcs, and cia show moderate MQ values with relatively tight spreads, indicating reliable performance across runs. The stunnel, xtell, and Mtunis systems, while exhibiting slightly lower maximum values, demonstrate consistent MQ scores. The results presented in Figure 8, when interpreted alongside the cohesion and coupling analyses, emphasize the effectiveness of the proposed BBOA-based approach in generating balanced, high-quality clusters. It not only captures high intra-cluster cohesion and low inter-cluster coupling but also translates these into strong overall modularization quality.
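As an illustration of how a single MQ value combines the two properties, the sketch below implements the TurboMQ formulation that is common in the SMC literature: each cluster contributes a cluster factor 2μ / (2μ + ε), where μ is its number of internal dependencies and ε is the number of dependencies entering or leaving it, and MQ is the sum of these factors. Whether this is exactly the MQ variant used in this study is an assumption; the function name turbo_mq and the toy graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (source module, target module)


def turbo_mq(edges: Iterable[Edge], cluster_of: Dict[str, int]) -> float:
    """Sum of cluster factors 2*mu / (2*mu + eps) over all clusters (TurboMQ)."""
    intra = defaultdict(int)  # mu: edges with both endpoints in the cluster
    inter = defaultdict(int)  # eps: edges crossing the cluster boundary
    for src, dst in edges:
        c_src, c_dst = cluster_of[src], cluster_of[dst]
        if c_src == c_dst:
            intra[c_src] += 1
        else:
            # A crossing edge counts against both clusters it touches.
            inter[c_src] += 1
            inter[c_dst] += 1

    mq = 0.0
    for cluster in set(cluster_of.values()):
        mu = intra[cluster]
        if mu > 0:  # clusters without internal edges contribute 0
            mq += 2 * mu / (2 * mu + inter[cluster])
    return mq


# Hypothetical toy system: five modules, two clusters.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

print(round(turbo_mq(edges, cluster_of), 4))  # 1.5238
```

Because MQ is a sum over clusters, its upper bound grows with the number of clusters, which is why MQ values are best compared between clusterings of the same system rather than across systems.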
Figure 9 shows two clustering models produced by the proposed BBOA, which can be compared in terms of MQ, cohesion, and coupling. The first clustering model (Clustering 1) has an MQ value of 1.85882, cohesion of 36, and coupling of 21, whereas the second model (Clustering 2) shows a higher MQ of 2.13025, but it has a reduced cohesion of 24 and an increased coupling of 33. These figures highlight a trade-off between achieving higher overall modularization quality and maintaining desirable internal cluster characteristics. In Clustering 1, the higher cohesion value indicates that the modules grouped together within each cluster share stronger internal relationships. This can be observed visually, where core modules such as User, Computer, Memory, and State are placed within the same cluster (Cluster 1). Additionally, coupling is relatively low in this configuration, meaning that inter-cluster dependencies are minimized. This structure is beneficial from a software maintenance perspective, as high cohesion and low coupling typically lead to more understandable, maintainable, and reusable modules.
On the other hand, Clustering 2 exhibits a higher MQ value, which generally reflects an improved overall balance between cohesion and coupling across all clusters. However, this gain in MQ comes at the expense of both cohesion and coupling individually. The increased number of clusters (from 3 to 6) in Clustering 2 results in the fragmentation of logically related modules across different clusters. For instance, Memory is isolated in its own cluster (Cluster 6), while User, Computer, and State are scattered into separate clusters. This fragmentation likely weakens the internal relationships within clusters and increases the number of cross-cluster interactions. From a modularization perspective, Clustering 2 may be preferable when the primary objective is to maximize MQ. Clustering 2 shows that the BBOA can improve MQ by redistributing modules; further refinements would be necessary to reduce coupling and restore cohesion without compromising overall modular quality. Therefore, the selection of the preferred clustering solution should align with the specific design goals and quality priorities.
The calculated similarity rates for the clusters provide valuable insight into the internal cohesion and external interaction of the modules grouped by the BBOA clustering result. These similarity scores, calculated by Equation (3), quantify the proportion of intra-cluster connectivity relative to inter-cluster communication.
Figure 10 shows the similarity rates of the modules located in the same clusters. A higher value suggests that modules within a cluster are more tightly connected to one another and less dependent on modules in other clusters. Among the six clusters, Cluster 5 achieved the highest similarity score of 0.6667, indicating a strong internal structure and minimal external dependency. This cluster includes only three modules (main, Control, and Family), which appears to promote a compact and cohesive module with limited coupling. This configuration reflects a highly modular and maintainable design model. Cluster 3 and Cluster 2 also exhibited relatively high similarity values of 0.6207 and 0.6154, respectively. These values suggest a good balance between internal links and external connections. Their structure implies that while there are some dependencies on other clusters, the modules are generally well grouped and maintain coherent internal relationships.
Cluster 1 demonstrated a moderate similarity score of 0.5926, which, although lower than that of Clusters 2 and 3, still reflects acceptable modular quality. This result may suggest that, while the internal cohesion is reasonable, there is a slightly higher reliance on other clusters, which could be improved by restructuring the interdependencies. Cluster 4 stands out with a similarity score of 0.5405, the lowest among the multi-module clusters. This lower score results from a relatively high number of inter-cluster links (17), despite having the largest number of internal links (10). The high external interaction suggests that modules in Cluster 4 are heavily coupled with other parts of the system, potentially reducing its modular integrity and maintainability. Such a structure may benefit from a reevaluation of module placement to reduce external dependencies.
Finally, Cluster 6, which includes only the Memory module, has a similarity score of 0.0000. Since there are no intra-cluster links (only one module) and it maintains four links to other clusters, this result emphasizes the module’s complete reliance on external entities. While a single-module cluster is sometimes acceptable, its high degree of coupling suggests that it might be better integrated into a larger, contextually relevant cluster. In conclusion, the similarity analysis supports the overall clustering quality provided by the BBOA, with Clusters 2, 3, and especially 5 reflecting good modular characteristics. However, Clusters 4 and 6 highlight areas for potential improvement in reducing inter-cluster dependency and enhancing cohesion. These metrics can guide future refinement of the clustering strategy to achieve a more modular and maintainable software system architecture.
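The reported similarity rates can be checked with a short calculation. Equation (3) is not reproduced in this section, so the sketch below assumes the form 2·intra / (2·intra + inter), which has the same shape as the cluster factor in TurboMQ. This assumed form reproduces the two cases for which the text gives both link counts: Cluster 4 (10 internal links, 17 inter-cluster links, score 0.5405) and Cluster 6 (0 internal links, 4 inter-cluster links, score 0.0000). The function name similarity_rate is hypothetical.

```python
def similarity_rate(intra_links: int, inter_links: int) -> float:
    """Assumed form of Equation (3): 2*intra / (2*intra + inter).

    Returns 0.0 for a cluster with no links at all to avoid division by zero.
    """
    denominator = 2 * intra_links + inter_links
    if denominator == 0:
        return 0.0
    return 2 * intra_links / denominator


# Link counts taken from the text for Clusters 4 and 6.
print(round(similarity_rate(10, 17), 4))  # 0.5405
print(round(similarity_rate(0, 4), 4))    # 0.0
```

Under this form, a single-module cluster always scores zero, because it has no internal links regardless of how few external links it has.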
The suggested SMC method utilizes module clustering to identify structural and behavioral similarities among software modules. By grouping related modules, the approach highlights symmetrical features in the source code, which often indicate well-formed architectural patterns, such as modules that collaborate consistently to achieve a defined functionality. This clustering process improves modularity by exposing natural boundaries within the system. Conversely, the method also uncovers asymmetries. For example, some modules may have excessive dependencies on parts of the system that are not functionally related. Such irregularities often reflect deeper architectural issues, which include poor modularity and understandability. Detecting these asymmetries provides valuable diagnostic insight and enables software architects to refactor problematic modules and strengthen overall architectural quality. In this way, the suggested SMC method functions not only as a tool for system organization but also as a mechanism for identifying and addressing potential architectural weaknesses in large, complex source codes.
4.3.3. Comparison with Previous Works
The comparison of MQ across various clustering methods (implemented under identical software and hardware environments) demonstrates the effectiveness of the proposed BBOA in producing high-quality software modularizations. MQ is a widely accepted metric for evaluating clustering quality, as it reflects the degree of internal cohesion within clusters and the extent of coupling between them; higher MQ values are indicative of more coherent and maintainable module groupings.
Figure 11 shows the MQ values obtained by the different methods on the benchmark software projects. Among the six evaluated methods (GA, ACO, PSO-GA, SFLA, SCSO, and the proposed BBOA), the BBOA achieved the highest MQ values on the majority of the ten benchmark software systems. In large-scale systems such as GRAPPA, PHP, and bison, the BBOA demonstrated a clear advantage, whereas in smaller systems, like Mtunis and ispell, it remained highly competitive.
Conversely, the GA and ACO reported comparatively lower MQ scores, particularly in complex software systems. Overall, the consistently high MQ values obtained by the BBOA underscore its capability to identify well-defined module boundaries while minimizing inter-module interactions. These results validate the BBOA as a reliable, efficient, and scalable solution for software module clustering and architectural optimization.
Table 6 shows the performance of the proposed SMC method on the GRAPPA system using different parameter settings. The parameters include population size, êg (reduction percentage), Lm (attraction intensity), and Lf (distance scale). The results are measured with MQ, coupling, and cohesion. The best result is obtained with a population size of 35, êg = 0.185, Lm = 6 mm, and Lf = 5 mm. This setting gives the highest MQ (4.90198), with balanced coupling and cohesion (126, 126). A smaller population, such as 20 with êg = 0.155, gives the lowest MQ (4.56707): although cohesion is higher in that setting, the overall clustering quality is weaker. Medium settings (population 25–30, with êg around 0.165–0.185) give stable MQ values with acceptable coupling and cohesion. In summary, larger population sizes and properly tuned êg values (0.180–0.185) improve MQ and keep coupling and cohesion balanced. Parameter selection plays a key role in clustering quality.
The suggested SMC method was further evaluated on another series of real-world software projects of different sizes and complexities.
Table 7 presents the results in terms of MQ, cohesion, and coupling. For acqCIGNA (114 nodes, 179 dependencies), the method achieved an MQ of 6.07509, with cohesion of 106 and low coupling (73). In smaller projects such as nos and telnet2, the method also maintained a balance between cohesion and coupling, with acceptable MQ values. In larger projects, the method showed scalability. For Archstudio (583 nodes, 866 dependencies), MQ remained high (6.01205), while cohesion (522) and coupling (340) reflected stable modularization. In the bash project (373 nodes, 901 dependencies), MQ was 3.18053, with high cohesion (767) but also increased coupling (1741) due to system complexity. Overall, the results confirm that the suggested SMC improves modular quality across projects of different scales. It consistently balances cohesion and coupling and ensures more maintainable software structures.
As shown in Table 8, the paired t-test analysis demonstrates that the proposed BBOA method performs significantly differently from most of the compared algorithms across the evaluated programs. The BBOA shows statistically significant differences when compared to the GA, ACO, PSO-GA, SFLA, GWO, and WOA, with p-values well below the 0.05 threshold. Together with the higher MQ values reported above, this indicates that the BBOA provides improved performance over these methods. In contrast, the difference between the BBOA and SCSO is not statistically significant (p = 0.1524), suggesting that their performances are comparable. Overall, these results highlight the effectiveness and competitiveness of the BBOA, particularly in scenarios where other methods may struggle to achieve optimal clustered models.
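For readers unfamiliar with the test, the following sketch computes the paired t statistic from two matched samples, for example the MQ values of two algorithms on the same benchmark programs. The input values are hypothetical and are not taken from Table 8; the function name paired_t_statistic is also an assumption. Only the statistic and its degrees of freedom are computed here; the p-value is then read from Student's t distribution with n - 1 degrees of freedom.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees_of_freedom) for a paired t-test on matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("samples must be paired and contain at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical MQ values of two methods on four programs (not real data).
method_a = [2.0, 3.0, 4.0, 5.0]
method_b = [1.0, 2.5, 3.0, 4.5]

t_value, dof = paired_t_statistic(method_a, method_b)
print(f"t = {t_value:.4f}, df = {dof}")  # t = 5.1962, df = 3
```

In practice, a library routine such as scipy.stats.ttest_rel returns both the statistic and the two-sided p-value directly.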
4.3.4. Discussion
The proposed BBOA demonstrates significant effectiveness in the domain of software module clustering by consistently yielding high MQ scores and achieving a well-balanced trade-off between cohesion and coupling. MQ serves as an integrated metric that encapsulates both the internal cohesion of modules and the coupling between them. The BBOA’s optimization strategy effectively explores the search space to discover modular boundaries that maximize cohesion (by grouping strongly related modules together) while simultaneously minimizing coupling by reducing unnecessary inter-cluster dependencies. This dual objective is crucial for producing software architectures that are not only logically sound but also easier to maintain, extend, and refactor. The results across multiple benchmark software systems reveal that the BBOA does not merely optimize MQ values in isolation but also ensures that clusters remain semantically meaningful and structurally decoupled. Furthermore, the BBOA maintains a strong global search ability while preserving good local solutions; it is suitable for different software sizes and complexities.
Clustering software modules based on source code and their dependencies (using the proposed method) can reveal the original design and business purpose of a system within the 4 + 1 architectural model. Module dependencies clarify the logical view and show how functionality is distributed. Clusters also reflect the development view by organizing code into coherent subsystems. In the process view, interactions between modules indicate runtime behavior. The physical view can be inferred by mapping clusters to deployment nodes. Finally, use-case scenarios are traced through clustered modules. Dependency-based clustering thus provides a practical way to reconstruct architectural insights and understand the business-driven design of software systems.
The scalability of the proposed SMC method is evidenced by its evaluation across a diverse set of real-world benchmark programs; the benchmark programs vary substantially in size and structural complexity. The results on benchmarks like dot (42 modules, 255 connections) and cia (38 modules, 216 connections) demonstrate that the method can effectively handle highly interconnected graphs where the average module connectivity exceeds five connections per module. Moreover, the successful clustering of a partial subset of PHP with 191 connections further underlines the method’s adaptability to complex and practical codebases. Overall, the benchmarks cover a scaling spectrum from lightweight software systems to larger and more complex systems. This quantitative variation (20–86 modules and 57–295 connections) confirms that the proposed SMC is scalable, generalizable, and applicable in practical software engineering environments.
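The connectivity claim can be verified directly from the module and connection counts quoted above. The short sketch below divides connections by modules for the two named benchmarks; the dictionary layout and the function name average_connectivity are illustrative choices.

```python
from typing import Dict, Tuple


def average_connectivity(modules: int, connections: int) -> float:
    """Average number of connections per module."""
    return connections / modules


# (modules, connections) as quoted in the text.
benchmarks: Dict[str, Tuple[int, int]] = {"dot": (42, 255), "cia": (38, 216)}

for name, (modules, connections) in benchmarks.items():
    print(name, round(average_connectivity(modules, connections), 2))
# dot 6.07
# cia 5.68
```

Both ratios exceed five connections per module, in line with the statement above.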
In software architecture, metrics such as cohesion and coupling are essential for evaluating systems’ quality, maintainability, and scalability [18, 19]. High cohesion within the logical and structural models ensures that each component has a clear responsibility, as seen in banking systems, where transaction modules remain focused on financial logic. Low coupling across the process and development models supports concurrency and independent evolution, a necessity in cloud-based microservices such as Netflix or Amazon, where services can be deployed or scaled independently. Recent studies highlight the importance of refined metrics: semantic-based cohesion and coupling measures significantly improve modularization accuracy and adaptability in large-scale architectures [19, 20, 21]. Thus, cohesion and coupling provide quantitative, objective insights that strengthen the robustness of architectural decisions across all 4 + 1 views.
In the suggested SMC method, the created design model enhances code comprehensibility and reduces maintenance costs by clustering software modules effectively and organizing source code into clear structures. The proposed discrete BBOA-based method improves modularization quality (MQ) by achieving higher cohesion and lower coupling, which are essential metrics of modularity and maintainability. Furthermore, the balanced trade-off between cohesion and coupling directly supports the creation of well-structured software systems, making the software projects easier to understand and evolve. The experimental results, particularly on large and complex systems, demonstrate that the model not only ensures robust modularity but also contributes to improved software architecture recovery; the created clustered models facilitate long-term understandability and maintainability.
The runtime comparison in Figure 12 shows differences among the evaluated SMC methods. Previous approaches such as the GA and PSO require longer execution times due to their iterative search processes. The hybrid PSO-GA achieves a better balance but still incurs a higher computational cost. In contrast, the proposed method demonstrates reduced runtime and reaches high-quality clusterings more efficiently. This reduction highlights the method’s scalability and suitability for larger software systems.
Despite its strengths, the BBOA has certain limitations. One of the primary drawbacks is its computational cost when dealing with very large-scale software systems. Additionally, the BBOA, like many metaheuristic algorithms, may require careful parameter tuning to achieve good results, and inappropriate settings can lead to premature convergence or suboptimal clustering outcomes. Lastly, the BBOA does not explicitly incorporate domain-specific knowledge or semantic information, which could enhance the quality of clustering in context-sensitive applications. Therefore, while the BBOA proves to be a powerful and general-purpose clustering method, further enhancements could improve both its performance and applicability in practical software engineering environments.
One observed limitation of the proposed method is the occasional formation of single-module clusters that maintain notable coupling with multiple other clusters. While single-module clusters can be valid in certain scenarios (particularly when the module represents a distinct and self-contained functionality), the presence of high inter-cluster coupling suggests that such modules may be more appropriately integrated into a larger, semantically related cluster. This indicates a potential misalignment between structural dependencies and the clustering outcome. As shown in Figure 10, the similarity analysis reinforces the overall effectiveness of the BBOA. However, these observations point to areas where the clustering strategy could be enhanced to yield a more cohesive and maintainable software architecture.