4.3. Main Results
Table 4 and Figure 5 present a quantitative comparison on the DOTA-v1.0 dataset, where CONERSLite achieves a state-of-the-art mAP of 79.57% under the single-scale setting. This surpasses the best competing lightweight models, including PKINet-S (+1.18%) and LSKNet-S (+2.08%), while remaining parameter-efficient (28.3 M params, 195 G FLOPs). This advantage stems from our Darwinian Wiring scheme: the Compact Anatomical Backbone (CAB) preserves critical geometric connectivity for structured categories like Bridges and Harbors, while the Functional Connectome Router (FCR) dynamically suppresses background interference in complex scenes such as Roundabouts and Soccer Fields through instance-specific sparse routing.
CONERSLite still underperforms on several categories, notably Bridge (BR, 64.40%), Helicopter (HC, 64.90%), Soccer Ball Field (SBF, 69.10%), and Roundabout (RA, 73.90%). Bridge targets have extreme aspect ratios (often exceeding 1:20), and the low-rank bottleneck in the CAB compresses channel representations to a low-dimensional space that struggles to preserve the fine-grained linear structural continuity required for such slender targets. Helicopters are extremely small in DOTA imagery, and the rank-constrained manifold projection aggressively compresses already sparse feature representations below the discriminability threshold. Soccer Ball Fields have large regular geometry that is well captured by the CAB, but their visual similarity to other field-type categories (Ground Track Field, Basketball Court) causes the FCR to distribute weights across multiple assemblies rather than concentrating on a single expert, reducing classification confidence. Roundabouts exhibit near-circular geometry with complex internal structure, and the depthwise convolution (DWConv) kernels in the CAB have limited capacity to model curved boundaries, creating a geometric representation gap. These results reveal a trade-off in the CONERSLite design: the efficiency gained through rank-constrained projection and sparse expert selection comes at the cost of reduced representational capacity for targets with extreme geometries or very small spatial extent. Nonetheless, CONERSLite establishes a favorable trade-off between perception and efficiency for resource-constrained remote sensing platforms.
To provide a direct comparison with existing lightweight detectors under a standard evaluation, Table 5 reports the parameter count, FLOPs, and mAP@0.5 on DOTA-v1.0 for CONERSLite alongside representative lightweight baselines, including EfficientDet-D0 [15], YOLOv8n-obb, and RTMDet-R-tiny [71]. EfficientDet-D0 is a horizontal detector with 3.9 M parameters and 2.5 G FLOPs that achieves 33.8 AP on COCO; however, it lacks native oriented bounding box support and therefore cannot be directly evaluated on the DOTA OBB task without substantial architectural modification. YOLOv8n-obb (3.11 M params, 23.3 G FLOPs) achieves 78.0% mAP@0.5 on DOTA-v1.0 under multi-scale evaluation. RTMDet-R-tiny (4.88 M params, 20.45 G FLOPs) achieves 75.36% mAP@0.5 under single-scale and 79.82% under multi-scale settings. CONERSLite achieves 79.57% mAP@0.5 under single-scale with 28.3 M parameters and 195 G FLOPs. While CONERSLite has a higher parameter count and FLOPs, its single-scale accuracy exceeds that of RTMDet-R-tiny under the same single-scale setting (+4.21%) and that of YOLOv8n-obb even under multi-scale evaluation (+1.57%). This suggests that the Darwinian Wiring paradigm offers a favorable accuracy–efficiency trade-off for oriented remote sensing detection compared to directly scaling down general-purpose detectors.
Table 6 summarizes the quantitative performance on the ship-centric HRSC2016 dataset. On this benchmark, CONERSLite achieves a state-of-the-art mAP of 98.62% (VOC 12). Recent SOTA models have gradually approached the performance ceiling of this dataset: for instance, ReDet reports 97.63% and Oriented RCNN reports 97.80%. By further optimizing feature selection and resource allocation, our method improves on these results by roughly 0.8 to 1.0 percentage points, which places it on par with or above current leading models and further validates the effectiveness of our framework. These results surpass established high-precision models such as ReDet and RTMDet while using significantly fewer resources. Specifically, compared to RTMDet, our framework reduces the total parameter count by approximately 45%, a substantial gain in structural efficiency.
The superior accuracy on ship targets is directly attributed to the coupling of phylogenetic stability and functional plasticity. Ships in remote sensing imagery are characterized by extreme aspect ratios and precise geometric symmetries that are difficult to capture with generic lightweight filters. The Compact Anatomical Backbone solves this by enforcing constrained manifold flow that preserves the structural integrity of elongated targets during the feature extraction process. By selecting stable anatomical anchors optimized for oriented structures, the CAB provides a consistent geometric foundation for ship detection.
Complementing this structural foundation, the Functional Connectome Router enables dynamic somatic selection to handle the diverse orientations and scales of different vessels. In scenarios with complex harbor backgrounds or overlapping ship clusters, the FCR evaluates the utility of specialized neural experts in real time. It recruits a comprehensive coalition of assemblies to resolve fine-grained features only when the local information density requires it. This dynamic adaptation ensures that the model maintains high representational capacity for slender targets without incurring the metabolic cost of a dense heavyweight architecture. These findings demonstrate that Darwinian Wiring provides a robust solution for oriented detection on resource-constrained maritime monitoring platforms.
Table 7 summarizes the comparative results on the DIOR-R dataset, which is characterized by high category diversity and complex background variations. CONERSLite achieves a state-of-the-art mAP of 69.68%, outperforming the strong PKINet-S baseline by a margin of 2.65%. This advantage is achieved while maintaining the lowest parameter count (28.3 M) among all competing models. Note that the FLOPs of CONERSLite (195 G) are higher than those of LSKNet-S (161 G), although its parameter count is lower (28.3 M vs. 31.0 M). This originates from the dynamic routing mechanism introduced by the FCR module: the routing network itself adds a secondary computational overhead, yet through sparse activation, fewer parameters are involved in the active computation path than in static networks. The moderate increase in FLOPs is therefore an acceptable trade-off for higher precision (69.68% vs. 65.90%), illustrating the benefit of dynamic resource allocation. The generalization of our framework across twenty different remote sensing categories supports the effectiveness of the Darwinian Wiring mechanism.
In scenes with high intra-class variability, the Compact Anatomical Backbone provides a stable structural manifold that captures the essential geometric properties of varied objects, from airports to bridges. Unlike standard backbones that might overfit to specific textures, the CAB uses connectome-constrained flow to prioritize stable anatomical features. Simultaneously, the Functional Connectome Router addresses the high-entropy backgrounds prevalent in DIOR-R. In areas with significant background noise or distractors that resemble targets, the FCR dynamically evaluates the perceptual utility of neural experts. By suppressing irrelevant pathways and recruiting specialized assemblies only when target signals are detected, the system effectively increases its signal-to-noise ratio. This dynamic resource allocation is particularly effective for the diverse object scales found in DIOR-R. These findings indicate that the coupling of phylogenetic structural constraints and ontogenetic functional plasticity allows the model to maintain high representational capacity across diverse environmental conditions with minimal computational overhead.
4.4. Ablation Study
Core Component (CAB and FCR) Ablation Analysis: As demonstrated in Table 8 and Figure 6, the integration of the Compact Anatomical Backbone (CAB) and the Functional Connectome Router (FCR) is vital for the performance of CONERSLite. The full model achieves an mAP of 79.57%, a substantial improvement over all baseline configurations. When the FCR is removed, the mAP drops to 75.12% despite the presence of the CAB. This reduction occurs because the system loses its capacity for dynamic somatic selection and cannot reconfigure its functional connectome to match the input complexity. Specifically, in scenes with high informational density, a static CAB cannot recruit the necessary neural assemblies to enhance detection precision.
Similarly, replacing the CAB with a generic MobileNetV3 backbone while retaining the FCR decreases the mAP to 74.86%. This decline indicates that the FCR requires a structured manifold with stable anatomical anchors to function effectively. A standard lightweight backbone lacks the geometric alignment properties needed for oriented remote sensing targets, which prevents the FCR from selecting the optimal neural pathways. The worst performance, 71.39% mAP, is observed when both modules are removed, which reflects the limitations of traditional static lightweight architectures in handling the complexities of remote sensing imagery. These results indicate that the synergistic coupling of anatomical stability and functional plasticity is the primary source of the efficiency of the framework.
Multi-Scale Training Ablation Analysis: As demonstrated in Table 9, the full CONERSLite model achieves a peak overall mAP of 82.35% and a small target AP of 75.32%. When the FCR is removed, the drop in overall mAP is accompanied by a severe 6.87% reduction in small target AP. This failure is directly linked to the absence of the fitness-driven competition mechanism. In our FCR implementation, the Boltzmann selection gate uses a temperature parameter to modulate the activation of parallel neural assemblies. For clustered small targets that exhibit high local entropy, the Multi-Layer Perceptron (MLP)-based fitness evaluator assigns higher utility to high-frequency experts. Without this competitive selection, the system remains in a static state and cannot concentrate its functional connectome on the fine-grained details necessary for small objects, which leads to significant perceptual loss in dense scenes.
Similarly, replacing the CAB with a generic backbone while keeping the FCR leads to a 7.41% drop in small target AP. This indicates that the FCR requires the specific structural constraints of the CAB to operate effectively under multi-scale training. The CAB uses a low-rank bottleneck projection defined by down-projection and up-projection weight matrices. This structural bottleneck forces the feature flow to pass through a narrow manifold, which acts as a phylogenetic filter. This filtering process preserves stable anatomical anchors that represent the geometric core of oriented targets. A generic backbone lacks this rank-constrained flow and provides a feature space where small target signals are easily obscured by background clutter during multi-scale resizing operations. These results indicate that the synergistic coupling of manifold projection in the CAB and competitive selection in the FCR is the primary driver of high-precision detection for small targets.
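The parameter effect of a down-projection followed by an up-projection can be illustrated with a simple weight count. The following is a minimal sketch, assuming bias-free 1x1 projections; the channel width of 256 and the ranks are hypothetical values chosen for illustration and are not taken from the paper.

```python
def dense_projection_params(channels: int) -> int:
    """Weights in a single full channels -> channels projection (no bias)."""
    return channels * channels


def low_rank_projection_params(channels: int, rank: int) -> int:
    """Weights in a down-projection (channels -> rank) plus an
    up-projection (rank -> channels), both without bias."""
    if not 0 < rank <= channels:
        raise ValueError("rank must satisfy 0 < rank <= channels")
    down = channels * rank
    up = rank * channels
    return down + up


if __name__ == "__main__":
    channels = 256  # hypothetical channel width, not from the paper
    dense = dense_projection_params(channels)
    for rank in (32, 64, 128, 192):  # hypothetical ranks
        low_rank = low_rank_projection_params(channels, rank)
        print(f"rank={rank:3d}  low-rank={low_rank:6d}  dense={dense:6d}  "
              f"ratio={low_rank / dense:.2f}")
```

Under these assumptions the factorized pair is smaller than the dense projection only when the rank is below half the channel width, so the saving depends on where the bottleneck is placed and how narrow it is.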
FCR Design Detail Ablation Analysis: As shown in Table 10, the full FCR achieves the best performance at 79.57% mAP, supporting its architectural design. Replacing global average pooling (GAP) with local average pooling (LAP) reduces mAP to 77.13%, showing that global pooling extracts context more effectively. Using a single neural assembly decreases mAP to 75.68%, indicating that multiple assemblies enhance representation. Replacing the temperature-controlled gate with a standard Softmax lowers mAP to 78.05%, indicating that temperature control improves sparse activation.
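The difference between the two pooling variants in this ablation can be made concrete with a small example. This is a minimal sketch, assuming that LAP denotes non-overlapping windowed average pooling on a single-channel feature map; the window size and the toy feature map are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def global_average_pool(fmap: Grid) -> float:
    """Collapse the whole map into one scalar context descriptor."""
    total = sum(sum(row) for row in fmap)
    count = sum(len(row) for row in fmap)
    return total / count


def local_average_pool(fmap: Grid, window: int) -> Grid:
    """Average over non-overlapping window x window blocks.
    Assumes height and width are divisible by `window`."""
    height, width = len(fmap), len(fmap[0])
    if height % window or width % window:
        raise ValueError("map size must be divisible by window")
    pooled = []
    for i in range(0, height, window):
        row = []
        for j in range(0, width, window):
            block = [fmap[y][x]
                     for y in range(i, i + window)
                     for x in range(j, j + window)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    # Toy 4x4 feature map (hypothetical values).
    fmap = [[1.0, 2.0, 3.0, 4.0],
            [5.0, 6.0, 7.0, 8.0],
            [9.0, 10.0, 11.0, 12.0],
            [13.0, 14.0, 15.0, 16.0]]
    print(global_average_pool(fmap))
    print(local_average_pool(fmap, window=2))
```

Global pooling yields one descriptor per channel that summarizes the entire scene, whereas the local variant keeps a coarse spatial grid, so a router fed by it sees region-level rather than scene-level statistics.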
Neural Assembly Count Ablation Analysis: As illustrated in Table 11, the richness of the functional connectome, represented by the number of parallel neural assemblies, plays a pivotal role in the adaptation capacity of the model. When the assembly count is set to low values such as 2 or 4, the mAP remains noticeably lower, at 76.23% and 78.45%, respectively. This performance gap stems from the limited phenotypic plasticity of the expert pool. A restricted number of assemblies cannot cover the diverse geometric and textural niches present in remote sensing scenes, where targets exhibit extreme scale variations and arbitrary orientations. In biological terms, a depauperate population of experts lacks the necessary functional diversity to respond effectively to high-complexity environmental stimuli.
When the assembly count is increased to 6, CONERSLite achieves an mAP of 79.57% while maintaining a high inference speed of 103 FPS. This configuration represents an equilibrium point where the plurality of neural experts is sufficient to map the manifold of oriented targets without introducing excessive metabolic overhead. At this scale, the replicator dynamics within the FCR can effectively differentiate between specialists for varied categories, while the selection pressure ensures that only the fittest coalition is activated for a given instance.
However, further increasing the count to 8 or 10 results in diminishing returns: mAP saturates at 79.63% with 8 assemblies and slightly declines to 79.51% with 10. This suggests that the informational entropy of the scene is already captured by six specialized assemblies. The additional pathways introduce redundant functional overlaps, where multiple assemblies compete for the same representational niche. When the count of assemblies increases from 6 to 8, FLOPs rise from 195 G to 210 G (an increase of 7.7%), while FPS drops from 103 to 92 (a decrease of 10.7%). This discrepancy arises from the implementation characteristics of dynamic routing. Increasing the number of assemblies not only brings a linear growth in multiply-accumulate operations but also introduces additional memory access overhead and control flow complexity, such as reduced parallelism in gating networks and increased cache miss rates. These factors cause the actual latency increase to be higher than the theoretical FLOPs increase, a phenomenon widely observed in dynamic networks. From an accuracy perspective, eight assemblies provide only a marginal improvement of 0.06 mAP over six, which is within the range of experimental noise, whereas the associated 10.7% FPS reduction significantly impacts real-time applications. Therefore, considering mAP, speed, and computational efficiency, we select six assemblies as the default configuration. For scenarios that are not sensitive to latency, eight assemblies can provide marginal accuracy gains.
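The relative changes quoted above can be reproduced with a few lines of arithmetic. The sketch below uses only the FLOPs and FPS figures reported in the text for six and eight assemblies; the helper names are illustrative, and the per-frame latency is derived here simply as the reciprocal of FPS.

```python
def relative_change_percent(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds implied by a frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    flops_6, flops_8 = 195.0, 210.0   # G FLOPs, as reported in the text
    fps_6, fps_8 = 103.0, 92.0        # FPS, as reported in the text

    print(f"FLOPs change:   {relative_change_percent(flops_6, flops_8):+.1f}%")
    print(f"FPS change:     {relative_change_percent(fps_6, fps_8):+.1f}%")
    print(f"Latency change: "
          f"{relative_change_percent(latency_ms(fps_6), latency_ms(fps_8)):+.1f}%")
```

Expressed as per-frame latency rather than FPS, the same measurements correspond to an increase of about 12%, which makes the gap to the 7.7% FLOPs increase slightly larger than the FPS figure alone suggests.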
Statistical Significance. To verify the robustness of the selected configuration, we conducted three independent training runs with different random seeds for the six-assembly and eight-assembly settings. The difference of 0.06 mAP falls well within the overlapping confidence intervals of the two configurations, indicating that eight assemblies do not provide a statistically significant improvement over six (paired t-test). This analysis shows that the small gap between the two settings is indistinguishable from random training variance and substantiates our selection of six assemblies as the default configuration based on the efficiency–accuracy trade-off rather than marginal accuracy gains.
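For reference, a paired t-test over seed-matched runs can be computed as follows. This is a minimal sketch of the standard test, not the authors' evaluation script; the scores in the example are hypothetical placeholders rather than results from the paper, and the critical value 4.303 is the two-sided 5% threshold of the t distribution with two degrees of freedom (three paired runs).

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [y - x for x, y in zip(a, b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0.0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


T_CRIT_DF2_TWO_SIDED_05 = 4.303  # t distribution, df = 2, alpha = 0.05

if __name__ == "__main__":
    # Hypothetical placeholder scores for three seeds, NOT values from the paper.
    scores_a = [1.0, 2.0, 3.0]
    scores_b = [2.0, 4.0, 3.0]
    t, df = paired_t_statistic(scores_a, scores_b)
    significant = abs(t) > T_CRIT_DF2_TWO_SIDED_05
    print(f"t = {t:.3f}, df = {df}, significant at 5%: {significant}")
```

With only three pairs the critical value is large, so a small mean difference must be very consistent across seeds to reach significance.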
Temperature Parameter Ablation Analysis: As demonstrated in Table 12 and Figure 7, the temperature parameter acts as a critical lever for modulating the selection pressure within the Boltzmann selection gate of the FCR. This parameter dictates the sharpness of the probability distribution over parallel neural assemblies and directly influences the mean activation sparsity and the collaborative dynamics of the functional connectome.
At a low temperature, the selection mechanism enters a winner-take-all state characterized by extreme sparsity, where the mean activation count drops to 1.2. While this minimizes computational flow, the high selection pressure prevents the formation of synergistic coalitions. Under such conditions, the system is forced to rely on a single dominant assembly regardless of input complexity, which degrades mAP to 75.89% because multi-scale features cannot be integrated effectively.
Conversely, increasing the temperature to 1.0 results in a high-entropy state with a mean activation count of 4.3. The loss of selection pressure leads to redundant activations, where specialized signals are obscured by a near-uniform average over assemblies. This functional degradation occurs because the assemblies lose their competitive edge and transition from specialized experts back toward a generic uniform state, which reduces mAP to 78.65%.
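The qualitative effect of the temperature on a Boltzmann-style gate can be illustrated with a temperature-scaled softmax. This is a minimal sketch under stated assumptions: the fitness scores are hypothetical, and counting an assembly as active when its weight exceeds a fixed threshold (0.05 here) is an illustrative stand-in for the activation count used in the paper, whose exact definition is not given in this section.

```python
import math
from typing import List, Sequence


def boltzmann_gate(fitness: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax over per-assembly fitness scores."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scaled = [f / temperature for f in fitness]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def active_count(weights: Sequence[float], threshold: float = 0.05) -> int:
    """Number of assemblies whose gate weight reaches the threshold
    (the threshold is a hypothetical choice for illustration)."""
    return sum(1 for w in weights if w >= threshold)


if __name__ == "__main__":
    # Hypothetical fitness scores for six assemblies.
    fitness = [2.0, 1.0, 0.5, 0.0, -0.5, -1.0]
    for temperature in (0.1, 1.0, 10.0):
        weights = boltzmann_gate(fitness, temperature)
        rounded = [round(w, 3) for w in weights]
        print(f"T={temperature:4.1f}  active={active_count(weights)}  {rounded}")
```

Lowering the temperature concentrates almost all weight on the top-scoring assembly, while raising it flattens the distribution toward uniform, which mirrors the winner-take-all and high-entropy regimes described above.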
Structural Compression Factor Ablation Analysis: The structural compression factor serves as the primary regulator of the phylogenetic filtering process within the CAB modules. As illustrated in Table 13 and Figure 8, it determines the dimensionality of the low-rank manifold that anchors the feature representation of oriented objects. When the compression factor is reduced to 0.5, which corresponds to a 50% pruning rate, the mAP declines sharply to 74.36%. This degradation occurs because the extreme structural bottleneck destroys the geometric integrity of the anatomical anchors. In this over-compressed state, the manifold dimension is insufficient to preserve the rotational symmetries and elongated axes critical for remote sensing targets, which leads to severe information collapse during the projection into the compressed latent space.
In contrast, the optimal compression factor of 0.7, corresponding to a 30% pruning rate, achieves a peak mAP of 79.57% while maintaining high parameter efficiency. This ratio ensures that the structural core of the backbone is robust enough to act as a stable phylogenetic foundation for the subsequent functional routing. At this equilibrium, the low-rank bottleneck filters out redundant spatial noise while retaining the set of geometric primitives necessary for resolving complex object orientations. Further increasing the compression factor to 0.9 yields only marginal improvements in mAP while significantly inflating the parameter count to 39.2 M and the computational cost to 281 G FLOPs. This plateau indicates that the essential structural manifold for remote sensing categories is relatively low rank and that additional parameters mainly contribute representational redundancy without enhancing the perceptual gain. These results support the view that the structural selection in CAB provides a lean yet powerful foundation for the Darwinian Wiring framework.
The sharp mAP transition from 74.36% (compression factor 0.5) to 79.57% (compression factor 0.7) reveals a threshold effect tied to the intrinsic dimensionality of the oriented object feature manifold. When the compression factor falls below 0.6, the effective rank of the CAB weight matrices becomes insufficient to preserve the geometric properties (elongated axes and angular symmetries) essential for remote sensing targets, causing abrupt discriminability loss. Singular value spectrum analysis supports this: at the higher-rank setting, the retained singular values capture over 95% of the total spectral energy, whereas at the lower-rank setting this ratio drops to approximately 78%. The subsequent plateau at larger compression factors indicates that the intrinsic feature dimensionality of typical remote sensing categories is relatively low, and additional capacity primarily captures redundant noise rather than discriminative signals.
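The retained-energy ratio referred to in the singular value analysis can be computed from a spectrum as shown below. This is a minimal sketch, assuming spectral energy is defined as the sum of squared singular values (the paper may use a different definition); the example spectrum is hypothetical.

```python
from typing import Sequence


def retained_energy(singular_values: Sequence[float], rank: int) -> float:
    """Fraction of total spectral energy (sum of squared singular values)
    captured by the `rank` largest singular values."""
    if not 0 <= rank <= len(singular_values):
        raise ValueError("rank must lie between 0 and the spectrum length")
    ordered = sorted(singular_values, reverse=True)
    total = sum(s * s for s in ordered)
    if total == 0.0:
        raise ValueError("spectrum has zero energy")
    return sum(s * s for s in ordered[:rank]) / total


def min_rank_for_energy(singular_values: Sequence[float], target: float) -> int:
    """Smallest rank whose retained energy reaches `target` (0 < target <= 1)."""
    for rank in range(1, len(singular_values) + 1):
        if retained_energy(singular_values, rank) >= target:
            return rank
    return len(singular_values)


if __name__ == "__main__":
    spectrum = [4.0, 3.0, 2.0, 1.0]  # hypothetical singular values
    for rank in range(1, len(spectrum) + 1):
        print(rank, round(retained_energy(spectrum, rank), 4))
    print("rank needed for 95% energy:", min_rank_for_energy(spectrum, 0.95))
```

Because the ratio is monotone in the retained rank, a fast-decaying spectrum reaches a high energy fraction at a small rank, which is the sense in which a feature manifold can be described as intrinsically low rank.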
CAB Internal Sub-Component Ablation Analysis: To further investigate the contribution of each structural element within the Compact Anatomical Backbone, we conduct an ablation on the internal sub-components of CAB, as reported in Table 14. The full CAB employs a low-rank bottleneck projection with residual learning, depthwise spatial filtering, SiLU activation, and batch normalization. Replacing the bottleneck structure with standard convolutions inflates the parameter count to 35.6 M and the computational cost to 252 G FLOPs, while mAP decreases to 78.91%. This suggests that the rank constraint enhances feature discriminability by forcing information flow through a low-dimensional manifold. Removing the residual connection causes a drop to 77.23% mAP, demonstrating that the skip pathway is necessary for preserving gradient flow and enabling stable feature propagation. Expanding the bottleneck ratio to 1.0 (removing the low-rank projection) increases parameters to 33.1 M while mAP drops to 79.14%, indicating that the feature manifold is intrinsically low rank. Replacing depthwise separable convolutions with standard convolutions yields a marginally higher mAP (79.71%) at the cost of more parameters (35.2 M) and FLOPs (267 G), indicating that depthwise filtering provides an efficient spatial encoding.
We further ablate two fundamental operators within the CAB block: the nonlinear activation function in Equation (15) and the normalization layer in the bottleneck path. Replacing SiLU with ReLU causes a 0.83% mAP decrease to 78.74%, despite identical parameter count and FLOPs. This performance gap is attributed to the smooth, self-gated property of SiLU (SiLU(x) = x · σ(x), where σ is the logistic sigmoid), which preserves gradient continuity on the low-rank manifold during the constrained ODE integration. The hard zero-threshold of ReLU introduces gradient discontinuities that disrupt the assumed flow field in Equation (13), particularly in the compressed bottleneck space where feature magnitudes are inherently small. Replacing SiLU with GELU yields a 0.25% mAP decrease to 79.32%, suggesting that smooth activations in general are beneficial, but that the explicit multiplicative gating of SiLU provides a slight edge over the probabilistic gating of GELU for oriented feature representations.
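The smoothness argument can be checked numerically by comparing the slopes of the three activations on either side of zero. The sketch below uses the standard definitions of ReLU, SiLU, and exact (erf-based) GELU with a central finite difference; the probe points of ±0.01 are arbitrary choices for illustration.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return x * ex / (1.0 + ex)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def numerical_grad(fn, x: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of the derivative of fn at x."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    probe = 0.01  # small offset on each side of zero
    for name, fn in (("ReLU", relu), ("SiLU", silu), ("GELU", gelu)):
        left = numerical_grad(fn, -probe)
        right = numerical_grad(fn, probe)
        print(f"{name}: slope(-{probe})={left:.4f}  slope(+{probe})={right:.4f}  "
              f"jump={right - left:.4f}")
```

ReLU shows a slope jump of one across zero, while SiLU and GELU change slope only slightly over the same interval, which is the property the text refers to as gradient continuity for small feature magnitudes.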
Removing BatchNorm from the bottleneck path causes the largest single-component degradation (a 2.91% mAP drop to 76.66%), surpassing the impact of removing the residual connection. This finding reveals that normalization plays a key role in stabilizing the numerical integration of the constrained ODE: without BatchNorm, the feature magnitudes in the compressed r-dimensional space exhibit high variance across spatial locations, leading to gradient explosion during backpropagation and convergence instability. Replacing BatchNorm with GroupNorm recovers most of the performance (79.21% mAP) but introduces a slight slowdown (97 FPS vs. 103 FPS) due to the overhead of computing per-group statistics. These results indicate that normalization within the low-rank bottleneck is a structural necessity for maintaining the stability of the manifold flow.
4.5. Visualization and Qualitative Analysis
The visual comparison in Figure 9 shows that the proposed method achieves clear improvements over the baseline approach. Across distinct representative scenarios, the heatmaps generated by the baseline method generally suffer from dispersed responses and insufficient focus on target regions, leading to blurred boundaries and background interference. In contrast, our method produces heatmaps with highly concentrated activations, which more precisely highlight the core areas of target objects.
The qualitative comparisons in Figure 10 provide granular evidence of the effectiveness of the proposed Darwinian Wiring framework. In these magnified local views, the baseline model frequently exhibits two primary failure modes: “Miss Detection”, where small or partially occluded targets are overlooked in dense clusters, and “Wrong Detection”, where background clutter or shadows are misidentified as targets. These issues are systematically addressed in CONERSLite through our dual-stage refinement process.
Specifically, in the dense harbor scenarios shown in the top row, the Functional Connectome Router (FCR) dynamically orchestrates a sparse neural coalition that effectively suppresses interfering dock textures and water reflections. This instance-specific routing ensures that the model’s representational capacity is focused solely on the targets, thereby preventing the omissions observed in the baseline. Simultaneously, in high-density parking lot and warehouse scenes in the middle and bottom rows, the Compact Anatomical Backbone (CAB) utilizes connectome-constrained features selected through structural plasticity to prioritize geometric consistency over deceptive surface textures. By preserving only the most stable anatomical anchors during the pruning phase, the CAB facilitates precise boundary localization even for overlapping objects. This transition from structural selection in the CAB to functional focus in the FCR allows CONERSLite to achieve superior precision and reliability, successfully eliminating the classification errors and omissions that plague the uncompressed baseline.
This enhancement not only increases the visual salience of targets but also indicates that the model extracts target features more accurately, yielding more reliable and interpretable detection results. The orientations of detected bounding boxes are also more stable and consistent with the target geometry, demonstrating the effectiveness of the proposed Darwinian Wiring framework and its Functional Connectome Router (FCR) in managing complex spatial distributions.
Advantages of Module Complementarity on Complex Categories: The synergistic integration of CAB and FCR demonstrates pronounced advantages in detecting objects with complex structures or large scale variations. At the category level, the proposed model demonstrates significant improvements across challenging scenarios, underscoring CONERSLite’s effectiveness in small object modeling, rotational invariance, and background suppression. The angular equivariance modeling with the anatomical backbone and the functional routing strategy complement each other, with the former addressing angular consistency and the latter improving the robustness of small targets in complex scenarios.
It should be noted that the nonlinear relationship between FPS and FLOPs reflects the complexity of dynamic networks in actual deployment. Theoretical calculation optimization does not always translate linearly to inference speed gains. This leaves room for future engineering optimization such as operator fusion and sparse library adaptation.