5.3.1. Performance Across Different Datasets
To comprehensively evaluate the applicability and advantages of the proposed FLEX-SFL framework in heterogeneous task scenarios, systematic tests were conducted on three datasets (FMNIST, CIFAR-10, and CIFAR-100), comparing FLEX-SFL against six existing baseline methods. In FLEX-SFL's heuristic selection mechanism, the random selection ratio was chosen to balance diversity and representativeness, and all methods were trained under consistent configurations to ensure a fair comparison.
Table 3 lists the test accuracies of each method after 100, 500, and 1000 training rounds, while
Figure 4 illustrates the accuracy trends with training rounds. The results show that FLEX-SFL achieved optimal performance on all three datasets, validating its generalization capability in both system and statistical heterogeneous environments.
On the FMNIST dataset, FLEX-SFL achieved a final accuracy of 88.1%, outperforming the traditional federated learning methods FedAvg (71.4%), FedProx (73.6%), and MOON (76.3%) by 16.7, 14.5, and 11.8 percentage points (pp), respectively, and surpassing the Split Learning (SL)-based methods SplitFed (74.7%) and SplitMix (71.1%), which suffer from static model partitioning and synchronous communication that cannot accommodate disparities in device computing power, by 13.4 and 17.0 pp. Even compared with the state-of-the-art (SOTA) method FedRich (83.6%), FLEX-SFL maintained a 4.5 pp advantage in final accuracy. On the CIFAR-10 dataset, FLEX-SFL led all methods with 83.8% accuracy, a 2.1 pp improvement over FedRich (81.7%). The traditional FL methods (FedAvg, FedProx) achieved accuracies below 53% because they cannot cope with highly heterogeneous data, while SplitFed (68.8%), although it validates the effectiveness of model partitioning, faces communication bottlenecks caused by synchronous aggregation. By contrast, FLEX-SFL improves training stability via entropy-driven client selection (to expand sample diversity coverage) and hierarchical asynchronous aggregation (to reduce latency). In the high-complexity CIFAR-100 scenario, characterized by fine-grained categories and an extremely unbalanced data distribution, FLEX-SFL again outperformed all methods with 46.4% accuracy (1.5 pp higher than FedRich at 44.9%), whereas the other methods degraded severely (e.g., SplitFed reached only 29.1%). FLEX-SFL addresses these high-dimensional heterogeneity challenges through device-aware adaptive segmentation (balancing computing loads) and edge-cluster-based local aggregation (enhancing model consistency).
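The entropy-driven client selection referred to above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical implementation assuming each client reports its local label histogram; the function name select_clients, the Shannon-entropy scoring, and the random-fraction parameter are illustrative assumptions rather than the exact procedure used in FLEX-SFL.

```python
import numpy as np

def label_entropy(label_counts):
    """Shannon entropy of a client's local label distribution."""
    p = np.asarray(label_counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log(p)).sum())

def select_clients(client_label_counts, num_selected, random_ratio, rng=None):
    """Pick clients by label entropy, mixing in a random fraction for diversity.

    client_label_counts: dict mapping client id -> per-class sample counts.
    random_ratio: fraction of the selection drawn uniformly at random.
    """
    rng = rng or np.random.default_rng()
    n_random = int(round(num_selected * random_ratio))
    n_entropy = num_selected - n_random

    # Rank clients by the entropy of their local label distribution.
    ranked = sorted(client_label_counts,
                    key=lambda cid: label_entropy(client_label_counts[cid]),
                    reverse=True)
    chosen = ranked[:n_entropy]

    # Fill the remainder with uniformly random picks from the rest.
    remaining = ranked[n_entropy:]
    chosen += list(rng.choice(remaining, size=min(n_random, len(remaining)),
                              replace=False))
    return chosen
```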
5.3.2. Comparison of Convergence Rates
To further verify the engineering feasibility and execution efficiency of FLEX-SFL in practical deployment scenarios, this section takes the target accuracies from Section 5.3.1 as benchmarks (thresholds of 70% for FMNIST, 50% for CIFAR-10, and 30% for CIFAR-100) and compares FLEX-SFL with representative split federated learning methods (SplitFed, SplitMix, FedRich, etc.) in terms of training rounds and cumulative running time, with the results shown in Table 4.
On FMNIST, FLEX-SFL reached 70% accuracy in only 3 communication rounds (90.9% fewer than SplitFed's 33 rounds and 76.9% fewer than FedRich's 13 rounds) and took 10.27 s (86.4% less than SplitFed's 75.57 s, 92.8% less than SplitMix's 143.39 s, and 56.1% less than FedRich's 23.36 s). This advantage stems from device-aware adaptive segmentation, which assigns lightweight submodels (e.g., the first two convolutional layers) to low-computing-power devices and complex layers to high-computing-power ones. In the CIFAR-10 task, FLEX-SFL achieved 50% accuracy in 17 rounds (85.0% fewer than SplitFed's 113 rounds and 74.2% fewer than FedRich's 66 rounds) and ran for 22.16 s (88.7% less than SplitFed's 195.32 s, 96.4% less than SplitMix's 623.43 s, and 91.5% less than FedRich's 262.13 s), driven by entropy-driven client selection (screening representative clients via label entropy to reduce local deviation) and hierarchical asynchronous aggregation (avoiding fully synchronous communication blockages). On the high-complexity CIFAR-100, FLEX-SFL reached 30% accuracy in 118 rounds (87.4% fewer than SplitFed's 938 rounds and 62.8% fewer than FedRich's 317 rounds) and took 243.33 s (86.3% less than SplitFed's 1781.52 s, 71.0% less than SplitMix's 836.51 s, and 73.4% less than FedRich's 913.26 s), benefiting from edge cluster partitioning (reducing cross-cluster communication via intra-cluster aggregation) and server-side caching (reusing activation values to increase the effective computation per round by roughly 10×).
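The device-aware adaptive segmentation described above can be pictured as choosing, for each device, the deepest split point whose client-side cost fits that device's compute budget. The sketch below is illustrative only; the layer cost profile, the choose_split_layer helper, and the budget rule are assumptions, not the exact policy in FLEX-SFL.

```python
def choose_split_layer(layer_flops, device_flops, round_time_budget_s):
    """Pick the deepest split point whose client-side cost fits the budget.

    layer_flops: per-layer forward+backward cost of the full model (FLOPs).
    device_flops: sustained throughput of the client device (FLOPs/s).
    round_time_budget_s: target client-side compute time per local round.

    Returns the index of the last layer kept on the client; everything
    after it is offloaded to the edge server.
    """
    budget = device_flops * round_time_budget_s
    cumulative = 0.0
    split = 0
    for i, cost in enumerate(layer_flops):
        cumulative += cost
        if cumulative > budget:
            break
        split = i + 1
    # Keep at least one layer on the client so raw data never leaves it.
    return max(split, 1)

# Hypothetical per-layer profile: a weak device keeps only the first two
# layers, while a stronger device keeps a deeper prefix of the model.
layer_flops = [2e6, 4e6, 8e6, 16e6, 16e6]
print(choose_split_layer(layer_flops, 6e6, 1.0))  # -> 2 (weak device)
print(choose_split_layer(layer_flops, 3e7, 1.0))  # -> 4 (stronger device)
```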
5.3.3. Resource Consumption
(1) Theoretical Analysis
The FLEX-SFL framework is based on the split federated learning paradigm, offloading the computational and storage burdens of deep models to edge servers. Therefore, resource consumption needs to account only for the communication and computational overhead on the client side.
Communication overhead consists of two components: submodel parameter transmission and feature activation value transmission. Submodel parameter transmission overhead: clients upload gradients and download the aggregated model every $\tau_l$ rounds (the intra-cluster aggregation period). The single-client overhead per transmission is $\rho_i |W|$, where $\rho_i$ is the submodel proportion of client $i$ and $|W|$ is the size of the full model parameters. Thus, the total overhead for all clients in the system is $C_{\text{param}} = \sum_{k=1}^{K} n_k \bar{\rho}_k |W|$, where $\bar{\rho}_k$ is the average submodel proportion of the $k$-th cluster and $n_k$ is the number of clients in that cluster.
Feature activation value transmission overhead: in each local round, clients upload activation values. The single-client overhead is $s\,d$, where $s$ is the sampling rate and $d$ is the feature dimension. The total overhead for all clients in a global round is $C_{\text{act}} = E \sum_{k=1}^{K} n_k\, s\, d$, where $E$ is the number of local rounds per global round.
Summing the two components, the total communication resource consumption in one global round is $C_{\text{comm}} = C_{\text{param}} + C_{\text{act}}$.
Computational overhead only considers the local training load on clients. Assuming the computation required for a client to train the full model in one local round is $F$, the computation performed by client $i$ in each local round is $\rho_i F$, so the total computational overhead for all clients in one global round is $C_{\text{comp}} = E \sum_{k=1}^{K} n_k \bar{\rho}_k F$.
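For concreteness, the per-round totals above can be evaluated with a short helper. This is a minimal sketch under the reconstructed notation ($\bar{\rho}_k$, $n_k$, $|W|$, $s$, $d$, $E$, $F$); the function name and the per-cluster bookkeeping are illustrative assumptions.

```python
def per_round_overheads(clusters, model_size_mb, sample_rate, feat_dim_mb,
                        local_rounds, full_model_flops):
    """Per-global-round client-side overheads under the expressions above.

    clusters: list of (n_k, rho_k) pairs, i.e. client count and average
              submodel proportion of each edge cluster.
    model_size_mb: |W|, size of the full model parameters (MB).
    sample_rate, feat_dim_mb: s and d, activation sampling rate and the
              size of one activation upload (MB).
    local_rounds: E, local rounds per global round.
    full_model_flops: F, cost of one full-model local round (FLOPs).
    """
    c_param = sum(n_k * rho_k * model_size_mb for n_k, rho_k in clusters)
    c_act = local_rounds * sum(n_k * sample_rate * feat_dim_mb
                               for n_k, _ in clusters)
    c_comm = c_param + c_act
    c_comp = local_rounds * sum(n_k * rho_k * full_model_flops
                                for n_k, rho_k in clusters)
    return c_comm, c_comp

# Hypothetical numbers: two clusters of 10 clients holding ~30% and ~50%
# of the model, a 1 MB full model, and one local round per global round.
comm, comp = per_round_overheads([(10, 0.3), (10, 0.5)], 1.0, 0.1, 0.2, 1, 3e6)
print(f"comm = {comm:.2f} MB, comp = {comp / 1e6:.2f} MFLOPs")
```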
(2) Experimental Analysis
To evaluate the efficiency of FLEX-SFL in terms of resource usage, we conducted a comprehensive comparison with three representative split learning-based frameworks: SplitFed, SplitMix, and FedRich. All methods share consistent experimental configurations, and for FLEX-SFL, the local and global aggregation intervals follow the default setup in FedRich [29] to ensure fairness.
Table 5 (upper half) presents the average communication and computation overhead per global round across three datasets. For communication cost, FedRich achieved the lowest transmission overhead due to its lightweight client–server interaction design, consuming only 0.76 MB per round on FMNIST. In contrast, FLEX-SFL incurred higher communication costs (e.g., 2.96 MB on FMNIST), approximately 3.9× that of FedRich. This increase stems from the HiLo-Agg architecture in FLEX-SFL, which introduces additional intra-cluster transmissions and repeated server-side computations to enable asynchronous training and mitigate straggler effects.
However, FLEX-SFL significantly reduces the client-side computation burden through its device-aware adaptive segmentation (DAS) and edge offloading mechanism. On all datasets, FLEX-SFL exhibited the lowest per-round computation cost, e.g., 2.43 MFLOPs on FMNIST, outperforming FedRich (2.7 MFLOPs) and SplitFed (3.81 MFLOPs). This efficiency gain is primarily attributed to the adaptive submodel allocation, which assigns lighter computational loads to resource-constrained clients while delegating heavier components to edge servers.
The lower part of
Table 5 reports the total communication and computation cost required for each method to reach the predefined accuracy thresholds: 70% for FMNIST, 50% for CIFAR-10, and 30% for CIFAR-100.
Thanks to its faster convergence rate, FLEX-SFL achieved substantial savings in overall resource consumption. On FMNIST, it required only 8.81 MB of communication and 7.29 MFLOPs of computation to reach 70% accuracy, which are 10.1% and 79.2% lower than the requirements of FedRich, respectively. Similar trends were observed on CIFAR-10 and CIFAR-100. Although the per-round communication overhead of FLEX-SFL is higher, the reduced number of required training rounds significantly offsets this cost. For example, on CIFAR-100, FLEX-SFL completed the task in 118 rounds whereas FedRich took 317 rounds, leading to a 48.5% reduction in total communication (3044.4 MB vs. an estimated 6041.7 MB for FedRich when its per-round cost is scaled by round count) and a 67.9% reduction in computation (3406.7 MFLOPs vs. 10616.3 MFLOPs).
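The offset between per-round cost and round count can be checked with the FMNIST figures reported above: the totals are approximately the per-round costs from Table 5 multiplied by the rounds to threshold from Table 4.

```python
# FMNIST figures quoted above: 2.96 MB and 2.43 MFLOPs per round, and
# 3 rounds to reach the 70% threshold (Tables 4 and 5).
per_round_comm_mb, per_round_comp_mflops, rounds = 2.96, 2.43, 3
print(per_round_comm_mb * rounds)     # ~8.88 MB, close to the reported 8.81 MB total
print(per_round_comp_mflops * rounds) # 7.29 MFLOPs, matching the reported total
```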
These results demonstrate that FLEX-SFL, despite higher per-round transmission, achieved superior overall efficiency due to its enhanced convergence behavior and adaptive training mechanisms.