Darwinian Wiring: A Connectome-Constrained Structural Plasticity Framework for Extreme Model Compression

Tang, Lixing; Zhong, Shaohong; Gao, Wentao; Liu, Jialang; Xie, Yuhang; Hu, Yaowen; Ma, Wanqi; Wei, Yingmei; Guo, Yanming

doi:10.3390/rs18111719

by Lixing Tang ¹,²,†, Shaohong Zhong ¹,†, Wentao Gao ³, Jialang Liu ⁴, Yuhang Xie ⁵, Yaowen Hu ⁴, Wanqi Ma ⁶, Yingmei Wei ⁴ and Yanming Guo ⁴,*

¹ College of Computer Science and Mathematics, Central South University of Forestry and Technology, Changsha 410004, China
² Xiangjiang Laboratory, Changsha 410073, China
³ School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
⁴ The Laboratory for Big Data and Decision, National University of Defense Technology (NUDT), Changsha 410004, China
⁵ Key Laboratory of Collaborative Intelligence Systems (Ministry of Education), School of Electronic Engineering, Xidian University, Xi’an 710071, China
⁶ School of Business, Jiangnan University, Wuxi 214122, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Remote Sens. 2026, 18(11), 1719; https://doi.org/10.3390/rs18111719

Submission received: 25 February 2026 / Revised: 7 April 2026 / Accepted: 8 April 2026 / Published: 27 May 2026

(This article belongs to the Special Issue Advanced Applications of Artificial Intelligence in Remote Sensing Image Recognition (2nd Edition))

Highlights

What are the main findings?

We propose the Darwinian Wiring framework, which redefines lightweight detection as a dynamic adaptation process in which coupling the structural stability of the Compact Anatomical Backbone (CAB) with the functional plasticity of the Functional Connectome Router (FCR) enables high precision with minimal resources.
We report strong results across three benchmarks: 79.57% mAP on DOTA-v1.0 (82.35% under multi-scale settings), 98.62% mAP on HRSC2016, and 69.68% mAP on DIOR-R, all with only 28.3 M parameters.

What is the implication of the main finding?

The CONERSLite framework provides a scalable, energy-efficient solution for oriented object detection on edge platforms by easing the trade-off between structural complexity and perceptual accuracy through instance-specific resource allocation.

Abstract

The deployment of lightweight object detectors on remote sensing edge platforms is severely constrained by the rigid trade-off between perception capacity and metabolic expenditure. To address this challenge, we draw inspiration from the energy efficiency of the mammalian brain and the principles of connectomics to introduce CONERSLite. By emulating the dual-mode synergy of biological neural systems, CONERSLite integrates a Compact Anatomical Backbone (CAB) representing the stable anatomical connectome and a Functional Connectome Router (FCR) that mimics the plasticity of the functional connectome. Our framework achieves a peak mAP of 82.35% on the DOTA-v1.0 dataset with only 28.3 M parameters and 195 G FLOPs, establishing a new accuracy–efficiency Pareto frontier for remote sensing. On the HRSC2016 dataset, it reaches a state-of-the-art mAP of 98.62% while reducing the total parameter count by approximately 45% compared to high-precision optimized models such as RTMDet. These results demonstrate that applying connectomics principles provides a biologically grounded and highly efficient solution for resource-constrained remote sensing object detection.

Keywords:

machine learning; model compression; neural Darwinism; structural plasticity; object detection; remote sensing

1. Introduction

Object detection in remote sensing imagery has become a vital component of modern Earth observation, providing critical insights for applications such as environmental monitoring and urban planning. The deployment of high-resolution sensors on satellites and unmanned aerial vehicles has shifted the demand for processing pipelines from centralized ground stations to edge hardware. This transition toward onboard real-time inference is essential for reducing data transmission latency and optimizing bandwidth. However, deploying sophisticated deep learning models on remote platforms remains difficult due to the intrinsic characteristics of remote sensing data and the severe power management limitations of edge devices.

The development of high-performance yet hardware-compatible detectors is currently obstructed by two fundamental challenges.

Lightweighting of remote sensing target detection: Remote sensing images often contain expansive and featureless backgrounds like open sea or deserts where critical targets occupy a negligible fraction of the total area. Traditional detectors operate with a fixed computational budget and process every spatial region with identical intensity regardless of the information density. This results in massive metabolic redundancy where the system expends nearly all its energy on non-informative voids rather than focusing on vital targets.
The improvement of detection accuracy specifically within lightweight models: Remote sensing targets exhibit extreme variations in scale as well as arbitrary rotational orientations and are often located in high-clutter environments. While heavy architectures can achieve high precision through deep feature hierarchies, they are incompatible with the strict size and power envelopes of edge platforms. The conflict between the need for high-fidelity representation and the imperative for physical lightweighting stands as a central obstacle.

In response to these challenges, the academic community has developed various lightweight designs. Researchers introduced architectures such as the MobileNet and ShuffleNet series [1,2]. MobileNet employs depthwise separable convolutions to reduce the total number of operations. This structure separates spatial filters from channel mixing into two simple steps. ShuffleNet uses pointwise group convolutions to limit the connections between feature channels. This design also includes channel shuffling to maintain information flow across groups. Both architectures effectively decrease the physical size and complexity of the model.

However, these models rely on a fixed computational logic during inference. The operations in MobileNet and ShuffleNet apply to every spatial region in a uniform manner. This static approach prevents the model from reducing its effort in simple background areas: such architectures always execute the same sequence of calculations regardless of scene difficulty. This rigidity limits the potential for energy savings in remote sensing applications. Recent designs also consider geometric alignment for rotated targets, but they still follow a constant computational schedule for every image. Quantitatively, on the DOTA-v1.0 benchmark, representative lightweight backbones such as MobileNetV3 and ShuffleNetV2, when paired with Oriented R-CNN, typically achieve mAP values in the range of 68–72% with 20–25 M parameters and 150–180 G FLOPs. While these models are significantly smaller than heavy architectures such as ResNet-101-based detectors (41–55 M parameters, 200–340 G FLOPs, 74–76% mAP), their static computational allocation leads to substantial resource waste in remote sensing imagery, where informative target regions often occupy less than 5% of the total image area. This efficiency gap motivates our dynamic resource allocation paradigm.

To overcome these obstacles, we seek a paradigm shift by looking to the mammalian brain, one of the most energy-efficient computational systems found in nature. Neuroscience research on the theory of neuronal group selection suggests that the brain's energy efficiency derives from a selection-based adaptation mechanism. Specifically, the brain manages its energy budget through the synergy of a dual-mode system consisting of the anatomical connectome and the functional connectome. The anatomical connectome acts as a stable substrate designed for structural parsimony, while the functional connectome exhibits profound plasticity: it enables the brain to reconfigure its neural pathways within milliseconds, forming temporary coalitions of neural assemblies that are precisely tailored to the immediate task. Under this selective framework, high-cost computational resources are mobilized only when the stimulus complexity exceeds a threshold, which effectively matches metabolic expenditure to informational demand.

Drawing upon these connectomics principles, we develop the Darwinian Wiring framework that transforms model compression from a static pruning exercise into a dynamic adaptation process. We implement this paradigm in the CONERSLite architecture through two synergistically coupled modules. The Compact Anatomical Backbone emulates the phylogenetic substrate of the brain to provide a minimalist structural foundation. It delivers the essential multi-scale feature representations required for oriented detection while maintaining a negligible baseline metabolic footprint. The Functional Connectome Router introduces ontogenetic plasticity as a real-time arbiter that emulates somatic selection. It oversees a competition among parallel neural pathways and recruits a comprehensive coalition of specialized neural experts only when informational peaks are detected.

As illustrated in Figure 1, we provide a quantitative comparison between CONERSLite and several representative state-of-the-art models in terms of mean Average Precision (mAP) and computational complexity (FLOPs). While traditional detectors often occupy either the high-precision, high-cost region or the low-cost, low-precision region of the landscape, our model establishes a new Pareto frontier. Specifically, CONERSLite achieves a single-scale mAP of 79.57% on DOTA-v1.0 while maintaining a computational footprint of 195 G FLOPs. This performance surpasses both established heavy architectures and optimized lightweight baselines. This result suggests that our Darwinian Wiring framework allows the system to extract high-fidelity geometric features without the overhead associated with static dense connectivity. By locating itself in the upper left corner of the efficiency map, CONERSLite demonstrates its suitability for deployment on resource-constrained remote sensing platforms where energy economy and detection reliability must be maximized simultaneously.

Our contributions are summarized below.

We redefine lightweight remote sensing detection as a dynamic adaptation process grounded in neural Darwinism, where the model achieves extreme efficiency by allocating computational resources only where high informational density exists.
We introduce the CONERSLite framework, which achieves a superior accuracy–efficiency trade-off through its dual-stage selection mechanism comprising structural manifold filtering in the backbone and functional expert routing in the neck.
We provide empirical evidence that CONERSLite establishes a new performance benchmark by achieving a peak mAP of 82.35% on the DOTA-v1.0 dataset under multi-scale settings with only 28.3 M parameters, which represents a significant reduction in model complexity compared to current state-of-the-art oriented detectors.

This work is an exploratory application of dynamic neural networks to remote sensing detection, and it is complementary to static lightweight models rather than a replacement for them.

2. Related Work

2.1. Lightweight Object Detection

Lightweight models such as MobileNet [1], ShuffleNet [2] and GhostNet [3] reduce computational cost through depthwise separable convolutions, group convolutions with channel shuffling and inexpensive linear transformations respectively. These static designs apply uniform computation to all input regions regardless of scene complexity, which wastes resources on background-dominated remote sensing imagery. Dynamic neural networks [4] address this through early exit [5], conditional computation [6] and dynamic channel pruning [7] to adaptively adjust inference paths per input.

In remote sensing detection, LO-Det [8] separates orientation sensitive components for oriented targets, SEMA-YOLO and MSF-SNET [9] enhance semantic features and multi-scale fusion, and Zhan et al. [10] and Zhong et al. [11] introduce binary pixel difference operators and adaptive deformation learning respectively. Strip R-CNN [12] aggregates context via large strip convolutions, SM3Det [13] employs a unified multimodal architecture, and LayerLink [14] bridges detectors with large vision models through efficient fine tuning. EfficientDet [15] jointly optimizes model dimensions using weighted bidirectional feature pyramids, while ghost convolution hierarchical graphs address dense objects [16] and RTMPose [17] demonstrates extreme architectural streamlining for mobile deployment. Bio-inspired methods such as BRSTD [18], SCM-YOLO [19,20] and ECDet [21] introduce center surround inhibition, spatial reweighting and reparameterized transformers for small target detection. Liu et al. [22,23] further propose geometric aware and weather adaptive detectors for UAV-based road scenes. These designs reduce the network footprint but follow constant computational schedules regardless of local informational density.

Dynamic approaches such as Dynamic DETR [24] utilize dynamic attention for rapid convergence, while dynamic zoom-in networks [25] selectively process high-reward regions via reinforcement learning. Neural Architecture Search further automates design optimization through strategies like NAS-FCOS [26] and structural-to-modular NAS [27], which outperform manually designed networks in efficiency. In contrast, CONERSLite introduces a competitive mechanism among parallel neural pathways through the principles of neuronal group selection. It evaluates the perceptual gain of different neural assemblies and recruits expert coalitions only when the input complexity is high. This dual-mode framework ensures high functional capacity for important objects while keeping the baseline metabolic cost low.

2.2. Brain-Inspired and Connectomics-Inspired Methods in Object Detection

Drawing inspiration from the biological visual system has become a fruitful technical route for achieving high efficiency and robustness in deep learning [28]. Early research primarily focused on the functional mimicry of specialized brain mechanisms such as attention and context-sensitive perception. For instance, designs like the Squeeze-and-Excitation [29] block and the Convolutional Block Attention Module [30] enable networks to dynamically focus on key information by reweighting feature channels or spatial locations. Similarly, the Non-Maximum Suppression [31] algorithm operates on a principle highly analogous to the lateral inhibition phenomenon where active neurons suppress their neighbors to sharpen spatial perception. However, the metabolism–accuracy conflict in remote sensing demands a deeper integration of biological principles, leading to research trajectories focusing on sparse pulse communication, recursive foveal processing, and structural connectomics.

Event-based pulse computation simulates the energy-efficient communication of biological neurons through discrete spikes. Representative methods such as Spiking YOLO [32] and EMS-YOLO [33] convert or train detection models using integrate-and-fire neuron models, achieving extreme power efficiency through sparse, event-driven processing. However, these SNN-based approaches still face challenges in training stability and precision gaps compared to dense ANNs, particularly for the fine-grained geometric alignment required in oriented object detection.

Connectomics-inspired methods shift the focus from functional mimicry to the engineering of structural connectivity patterns observed in biological brains. BioNIC [34] and Deep Connectomics Networks [35] leverage principles from mammalian connectomes to design more efficient neural architectures that balance local processing with long-range integration. Similarly, CORnet-S [36] explicitly mimics the recurrent processing and hierarchical structure of the primate ventral stream to achieve more robust object recognition. For complex object detection tasks, the Structure Inference Net (SIN) [37] incorporates scene-level context and instance relationships to resolve spatial ambiguities, mimicking the top–down relational reasoning of the human brain. In the hyperspectral domain, DGPF-RENet [38] demonstrates that low-data-dependence architectures with reduced training iterations can achieve competitive classification accuracy, highlighting the potential of efficient structural design beyond object detection.

Attentional focus and selective foveation draw inspiration from the human ability to prioritize relevant visual stimuli through narrow-field glimpses. The OCRA network combines recurrent glimpse-based attention with object-centric representation learning to perform multi-object reasoning [39]. Li et al. [40] propose a sorted texture-aware glance-and-gaze network that mimics the coarse-to-fine visual perception process for hyperspectral image classification under limited training samples, demonstrating the effectiveness of biologically inspired attentional hierarchies. In the domain of remote sensing, BRSTD leverages biological principles such as center-surround inhibition and contrast sensitivity functions to enhance tiny target detection in high-clutter backgrounds [18]. Beyond isolated functions, macroscopic brain-inspired architectures map collective neural dynamics onto specialized hardware combined with dynamics-aware quantization to achieve multi fold acceleration [41] and broader biological computing feasibility [42]. For medical scenarios, lightweight frameworks introduce neural plasticity and rhythm-regulated sampling to maintain robustness under severe constraints [43]. While effective, these mechanisms primarily adjust feature weights without modifying the underlying structural connectivity. In contrast, CONERSLite introduces a dynamic selection mechanism inspired by the competition of neural assemblies, allowing for real-time reconfiguration of the functional connectome to achieve optimal resource allocation.

Relevant work summaries are presented in Table 1.

3. Methodology

We reconceptualize the inference process of CONERSLite not merely as a static feed-forward computation but as a dynamic, continuous-time biological adaptation process. Grounded in Edelman’s Theory of Neuronal Group Selection (TNGS)—which posits that functional circuitry arises from competitive selection rather than fixed instruction—we model the network state evolution as the synergistic interplay between two distinct evolutionary forces. First, Phylogenetic Constraints, embodied by the Compact Anatomical Backbone (CAB), provide a stable, genetically conserved structural foundation optimized for metabolic efficiency. Second, Ontogenetic Selection, facilitated by the Functional Connectome Router (FCR), introduces somatic plasticity, acting as a real-time arbiter that dynamically recruits neural assemblies in response to immediate environmental complexity.

To turn this biological paradigm into a computable deep learning topology, we map the abstract concept of somatic functional groups onto concrete computational units termed neural assemblies. In our framework, a neural assembly $A_i$ ($i = 1, \dots, N_{asm}$) is a self-contained expert sub-network that operates as an independent feature processing unit within the FCR. Concretely, each assembly consists of a depthwise separable convolutional block (DWConv $3 \times 3$ → BatchNorm → SiLU) for spatial feature extraction, followed by a pointwise convolution (Conv $1 \times 1$) for channel mixing. All assemblies share the same input feature map $x \in \mathbb{R}^{C \times H \times W}$ and produce output feature maps of identical dimensions $y_i \in \mathbb{R}^{C \times H \times W}$, but each learns distinct convolutional kernels, thereby specializing in different feature patterns (e.g., edge orientations, texture frequencies, or scale ranges). The final output of the FCR is the context-aware weighted aggregation of all assemblies:

$$y = \sum_{i=1}^{N_{asm}} g_i^{*} \cdot y_i \tag{1}$$

where $g_i^{*}$ denotes the dynamic Boltzmann selection weights. This architecture ensures that each assembly functions as a specialized “neural expert”, while the competitive selection mechanism determines which experts are actively recruited for a given input instance.
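The weighted aggregation in Equation (1) can be illustrated with a minimal pure-Python sketch. The function name `aggregate_assemblies`, the flattened-list representation of feature maps, and the numeric values are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def aggregate_assemblies(weights: Sequence[float],
                         outputs: Sequence[Sequence[float]]) -> List[float]:
    """Compute y = sum_i g_i* . y_i for flattened assembly outputs."""
    if len(weights) != len(outputs):
        raise ValueError("one selection weight per assembly is required")
    size = len(outputs[0])
    if any(len(y_i) != size for y_i in outputs):
        raise ValueError("all assembly outputs must share the same shape")
    y = [0.0] * size
    for g_i, y_i in zip(weights, outputs):
        for k, value in enumerate(y_i):
            y[k] += g_i * value  # each assembly contributes in proportion to its weight
    return y


# Three hypothetical assemblies, each producing a 4-element (flattened) map.
weights = [0.7, 0.2, 0.1]  # selection weights g_i*, assumed to sum to 1
outputs = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
]
y = aggregate_assemblies(weights, outputs)
```

Because every assembly output has the same shape, the aggregate keeps the input dimensions regardless of how many assemblies receive non-negligible weight.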

Table 2 provides an explicit mapping between the biological principles that motivate our design and their corresponding engineering implementations.

Figure 2 presents the overall CONERSLite architecture and shows how CAB and FCR cooperate within the oriented detection pipeline.

To provide an intuitive overview of the inference logic, we present the pseudocode for the complete CONERSLite forward pass in Table 3.

3.1. Fitness Landscape and Selection Pressure

We redefine the perceptual gain $U_i$ by shifting from a simple energy metric to an information gain perspective. Unlike global measures, we define it as the information gain acquired through neural assembly activation, representing the reduction in uncertainty for target-relevant features. Theoretically, this is quantified by the Kullback–Leibler (KL) divergence between target and background distributions:

$$U_i(x) = \mathbb{E}_{p_{target}}\left[\log \frac{f_i(x)_{target}}{f_i(x)_{background}}\right] \tag{2}$$

where $f_i(x)$ is the output feature map and $p_{target}$ is the estimated distribution of target regions. In our implementation, we employ a differentiable approximation using a spatial weight mask $M$:

$$U_i(x) = \frac{\lVert M \odot f_i(x) \rVert_2^2}{\lVert f_i(x) \rVert_2^2 + \epsilon} \tag{3}$$

where $M$ highlights potential target regions and $\epsilon$ is a small positive constant ($10^{-6}$) to prevent division by zero. This formulation captures the selective amplification and signal-to-noise ratio (SNR) enhancement observed in biological visual systems.
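As a rough illustration of Equation (3), the sketch below computes the masked-energy ratio for a flattened feature map. The helper name `perceptual_gain` and the toy values are assumptions made for this example only.

```python
from typing import Sequence

EPS = 1e-6  # the small positive constant from Equation (3)


def perceptual_gain(features: Sequence[float], mask: Sequence[float],
                    eps: float = EPS) -> float:
    """U = ||M * f||_2^2 / (||f||_2^2 + eps) for flattened f and M."""
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    masked_energy = sum((m * f) ** 2 for m, f in zip(mask, features))
    total_energy = sum(f ** 2 for f in features)
    return masked_energy / (total_energy + eps)


# Toy feature map: energy 9 inside the masked region, 25 in total.
features = [3.0, 0.0, 0.0, 4.0]
mask = [1.0, 1.0, 0.0, 0.0]  # hypothetical binary target mask
gain = perceptual_gain(features, mask)

# An all-zero feature map yields a gain of 0 rather than a division error.
silent_gain = perceptual_gain([0.0, 0.0], [1.0, 1.0])
```

When the mask entries lie in [0, 1], the ratio stays between 0 and 1, so it reads as the fraction of feature energy that falls on candidate target regions.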

Figure 3 illustrates the internal structure of FCR and the sparse activation process used to recruit neural assemblies according to target complexity.

The instantaneous fitness $\Phi_i(x, \Omega)$ is defined as the trade-off between the perceptual gain and the dynamic metabolic cost $C_i(x)$. Unlike static parameter counts, we define $C_i$ as the actual computational resources consumed by the assembly, controlled by a learnable binary gate $g_i \in \{0, 1\}$:

$$C_i(x) = g_i \cdot \frac{\mathrm{FLOPs}_i}{\max \mathrm{FLOPs}} + \beta \cdot \mathrm{KL}(g_i \,\|\, \rho) \tag{4}$$

where $g_i = \mathrm{GumbelSoftmax}(\phi_i(x))$ implements differentiable sampling, $\phi_i(x)$ denotes the logit score produced by the fitness evaluation MLP for the $i$-th assembly, $\mathrm{FLOPs}_i$ is the floating-point operation count of assembly $i$, $\beta$ is a weighting coefficient (set to 0.01) for the sparsity regularization term, and $\rho$ is the target activation probability (set to 0.5). The first term represents the normalized computational overhead when the assembly is activated, while the second term encourages parsimony via KL divergence. Biologically, this simulates the metabolic cost of action potential generation and synaptic transmission, where $g_i$ represents the firing decision and the KL penalty reflects energy constraints.
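The cost in Equation (4) can be sketched as follows. The text does not spell out how the KL term is evaluated for a binary gate; this example assumes it is the KL divergence between a Bernoulli distribution with the gate's activation probability and a Bernoulli distribution with parameter ρ. The names `bernoulli_kl`, `metabolic_cost`, and `activation_prob` are hypothetical.

```python
import math


def bernoulli_kl(p: float, rho: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(rho)), taking 0 * log(0) as 0."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")

    def term(a: float, b: float) -> float:
        return 0.0 if a == 0.0 else a * math.log(a / b)

    return term(p, rho) + term(1.0 - p, 1.0 - rho)


def metabolic_cost(gate: int, flops: float, max_flops: float,
                   activation_prob: float, beta: float = 0.01,
                   rho: float = 0.5) -> float:
    """C_i = g_i * FLOPs_i / max FLOPs + beta * KL(g_i || rho)."""
    compute_term = gate * flops / max_flops  # zero when the gate is closed
    sparsity_term = beta * bernoulli_kl(activation_prob, rho)  # assumed Bernoulli KL
    return compute_term + sparsity_term


# Active assembly using half of the largest FLOPs budget, firing at the target rate.
cost_on_target = metabolic_cost(gate=1, flops=50.0, max_flops=100.0,
                                activation_prob=0.5)
# Inactive gate whose activation probability is far from the target rate.
cost_off_target = metabolic_cost(gate=0, flops=50.0, max_flops=100.0,
                                 activation_prob=1.0)
```

Under this reading, an assembly that fires at exactly the target rate pays only its normalized compute, while deviations from ρ add a small penalty scaled by β.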

The final fitness value $\Phi_i(x)$ integrates these terms to determine the activation probability of each assembly:

$$\Phi_i(x) = U_i(x) - \lambda \cdot C_i(x) \tag{5}$$

where $\lambda$ is a hyperparameter that balances the perceptual rewards against metabolic expenditures. To quantify the overall sparsity of the functional connectome, we define the mean activation sparsity $S(G)$ (dimensionless) as the expected number of activated neural assemblies per input instance:

$$S(G) = \mathbb{E}_{x}\left[\sum_{i=1}^{N_{asm}} g_i(x)\right] \tag{6}$$

where $g_i(x) \in \{0, 1\}$ is the binary selection state. This metric provides a compact measure of the system’s metabolic load, with $S(G) = 1$ representing minimal single-assembly activation and $S(G) = N_{asm}$ representing full assembly recruitment.
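Equations (5) and (6) reduce to a subtraction and an average, which the following sketch makes concrete. The value of λ and the gate patterns are invented for illustration; the source determines λ empirically and does not state it here.

```python
from typing import Sequence


def fitness(gain: float, cost: float, lam: float) -> float:
    """Phi_i = U_i - lambda * C_i (Equation (5))."""
    return gain - lam * cost


def mean_activation_sparsity(gates_per_input: Sequence[Sequence[int]]) -> float:
    """Empirical S(G): average number of active assemblies per input."""
    if not gates_per_input:
        raise ValueError("at least one input is required")
    return sum(sum(gates) for gates in gates_per_input) / len(gates_per_input)


# Hypothetical gain, cost, and lambda for a single assembly.
phi = fitness(gain=0.36, cost=0.5, lam=0.2)

# Binary gate states for three inputs and four assemblies:
# an easy scene, a moderate scene, and a cluttered scene.
gates = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
sparsity = mean_activation_sparsity(gates)  # (1 + 2 + 4) / 3
```

An empirical S(G) close to 1 indicates that most inputs are served by a single assembly, whereas a value close to the number of assemblies indicates near-dense computation.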

The temporal evolution of the activation levels is governed by the selection pressure, which drives the system toward an optimal state. We model the change in the activation distribution $g_i(t)$ using the continuous-time replicator equation. This equation describes how the activation of each assembly grows or shrinks based on its relative fitness compared to the population average:

$$\frac{\partial g_i(t)}{\partial t} = g_i(t)\left(\Phi_i(x) - \bar{\Phi}(x, g(t))\right) \tag{7}$$

This equation provides a mathematical abstraction of group selection within biological evolution.

In this setup, the variable $\bar{\Phi}(x, g(t)) = \sum_{j=1}^{N_{asm}} g_j(t)\, \Phi_j(x, \Omega)$ represents the mean fitness of the entire population. Treating the fitness values $\Phi_i$ as fixed for a given input, the dynamics defined by the replicator equation ensure that the average fitness of the neural assemblies is non-decreasing over time. This property corresponds to the fundamental theorem of natural selection, which indicates that the system moves toward higher representational utility.

Proof.

The continuous improvement of the system can be verified by differentiating the mean fitness $\bar{\Phi}$ with respect to time. The derivation shows that the rate of change is equal to the variance of the fitness within the population:

$$\begin{aligned} \frac{d\bar{\Phi}}{dt} &= \sum_{i=1}^{N_{asm}} \frac{\partial g_i}{\partial t}\, \Phi_i = \sum_{i=1}^{N_{asm}} g_i \left(\Phi_i - \bar{\Phi}\right) \Phi_i \\ &= \sum_{i=1}^{N_{asm}} g_i \Phi_i^{2} - \bar{\Phi} \sum_{i=1}^{N_{asm}} g_i \Phi_i \\ &= \mathbb{E}_{g}[\Phi^{2}] - \left(\mathbb{E}_{g}[\Phi]\right)^{2} = \mathrm{Var}_{g}(\Phi) \geq 0 \end{aligned} \tag{8}$$

This implies that the Functional Connectome Router naturally evolves toward a state of higher efficiency. The speed of this adaptation equals the fitness variance, so it grows with the diversity of the experts in the pool. □
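The monotonicity result in Equation (8) can be checked numerically with a forward Euler discretization of the replicator equation (7). This is only a sketch: the fitness values, step size, and function names are assumptions, and the source notes that actual training uses standard backpropagation rather than simulating these dynamics.

```python
from typing import List, Sequence


def mean_fitness(g: Sequence[float], phi: Sequence[float]) -> float:
    """Population mean fitness: sum_j g_j * Phi_j."""
    return sum(g_j * phi_j for g_j, phi_j in zip(g, phi))


def replicator_step(g: Sequence[float], phi: Sequence[float],
                    dt: float) -> List[float]:
    """One Euler step of dg_i/dt = g_i * (Phi_i - mean fitness)."""
    phi_bar = mean_fitness(g, phi)
    return [g_i + dt * g_i * (phi_i - phi_bar) for g_i, phi_i in zip(g, phi)]


phi = [0.2, 0.5, 0.9]          # hypothetical fixed fitness values
g = [1.0 / 3.0] * 3            # start from a uniform activation distribution
history = [mean_fitness(g, phi)]
for _ in range(200):
    g = replicator_step(g, phi, dt=0.1)
    history.append(mean_fitness(g, phi))
```

With fixed fitness values, each Euler step raises the mean fitness by the step size times the current fitness variance, so the trajectory mirrors the continuous-time result and the fittest assembly gradually absorbs almost all of the activation mass.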

To find the stable equilibrium state, we reformulate the selection as an exploration–exploitation trade-off. The system objective is to find a distribution $g$ that maximizes expected fitness while preserving sufficient diversity (entropy) to handle environmental uncertainty. This is equivalent to minimizing the free energy $F$ of the system under temperature $\tau$:

$$\min_{g \in \Delta} F(g) = \min_{g \in \Delta} \left\{ -\mathbb{E}_{g}[\Phi(x)] - \tau H(g) \right\} \tag{9}$$

In this equation, the term $H(g) = -\sum_i g_i \ln g_i$ represents the informational entropy. We solve this constrained optimization problem using the method of Lagrange multipliers to incorporate the requirement that the probabilities must sum to one. The Lagrangian $\mathcal{L}$ for this system is expressed below:

$$\mathcal{L}(g, \gamma) = -\sum_{i=1}^{N_{asm}} g_i \Phi_i - \tau \left( -\sum_{i=1}^{N_{asm}} g_i \ln g_i \right) + \gamma \left( \sum_{i=1}^{N_{asm}} g_i - 1 \right) \tag{10}$$

By taking the derivative of the Lagrangian with respect to each activation level and setting the result to zero, we can identify the optimal distribution:

$$\frac{\partial \mathcal{L}}{\partial g_i} = -\Phi_i + \tau (\ln g_i + 1) + \gamma = 0 \tag{11}$$

$$\ln g_i = \frac{\Phi_i - \gamma}{\tau} - 1 \;\Rightarrow\; g_i^{*} = \frac{\exp(\Phi_i / \tau)}{Z}, \qquad Z = \sum_{j} \exp(\Phi_j / \tau) \tag{12}$$

where the multiplier $\gamma$ is absorbed into the normalization constant $Z$. This derivation shows that the routing logic is the natural equilibrium state of an entropic selection process. Here, the temperature $\tau$ controls the sharpness of selection: as $\tau \to 0$, the system approaches a winner-take-all exploitation state, while as $\tau \to \infty$, it converges to a uniform distribution for maximum exploration.
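The Boltzmann weights of Equation (12) and the two temperature limits described above can be reproduced with a few lines of Python. The function name `boltzmann_weights` and the fitness values are illustrative assumptions; subtracting the maximum before exponentiation is a standard numerical-stability step that leaves the weights unchanged.

```python
import math
from typing import List, Sequence


def boltzmann_weights(phi: Sequence[float], tau: float) -> List[float]:
    """g_i* = exp(Phi_i / tau) / sum_j exp(Phi_j / tau)."""
    if tau <= 0.0:
        raise ValueError("temperature must be positive")
    shift = max(phi)  # stabilizes exp() without changing the result
    exps = [math.exp((p - shift) / tau) for p in phi]
    z = sum(exps)
    return [e / z for e in exps]


phi = [0.2, 0.5, 0.9]                        # hypothetical fitness values
sharp = boltzmann_weights(phi, tau=0.01)     # low temperature: near winner-take-all
flat = boltzmann_weights(phi, tau=100.0)     # high temperature: near uniform
```

At the low temperature almost all weight lands on the assembly with the highest fitness, while at the high temperature the three weights differ only slightly from one third.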

Scope and Interpretation of Biological Analogies. The concepts of fitness, replicator dynamics, and neuronal group selection in this work serve as principled design heuristics for dynamic resource allocation, rather than exact reproductions of biological processes. The fitness function $\Phi_i$ (Equation (5)) is an engineering approximation whose hyperparameter $\lambda$ is determined empirically, and the replicator dynamics (Equation (7)) motivate the architecture design theoretically, while the actual training relies on standard backpropagation with discrete forward passes. Nevertheless, these biological principles provide valuable inductive biases: competition-based routing encourages assembly specialization, and metabolic cost penalties promote computational parsimony, both empirically validated in our ablation studies (Section 4.3).

3.2. Phylogenetic Constraints: Constrained Manifold Flow in CAB

While the Functional Connectome Router handles the dynamic selection of pathways, the Compact Anatomical Backbone enforces the phylogenetic constraints that ensure structural efficiency. We model feature extraction as the evolution of a feature state $h(l) \in \mathbb{R}^{C}$ (where $C$ is the channel dimension) governed by a continuous flow field. These dynamics can be described as a neural ordinary differential equation that evolves on a Riemannian manifold $\mathcal{M}$. The critical innovation in our framework is that this flow is strictly subject to a sparsity projection operator $P_K$. This operator represents the hard limits on the connectivity of the network, which ensures that information processing remains metabolically cheap:

$$\frac{d h(l)}{d l} = P_K\left[ f_\theta(h(l), l) \right] \tag{13}$$

where $f_\theta$ denotes the parameterized transformation function (i.e., the convolutional block with learnable parameters $\theta$) and $l$ indexes the network depth.

The operator $P_K$ works by projecting the high-dimensional gradient flow onto a low-rank manifold $K_\rho$, defined as $K_\rho = \{ W \mid \mathrm{rank}(W) \leq \lfloor \rho \cdot C \rfloor \}$, where $\rho \in (0, 1]$ is the structural compression factor (not to be confused with the target activation probability $\rho$ in Equation (4)) and $C$ is the input channel dimension. Mathematically, we can understand this projection by considering the singular value decomposition of the weight matrix $W(l)$ as $U \Sigma V^{\top}$. The projection operator retains only the top $k = \lfloor \rho C \rfloor$ singular values, which forces the information flow to pass through a narrow channel:

$$P_K(W) = \sum_{j=1}^{\lfloor \rho C \rfloor} \sigma_j u_j v_j^{\top} \tag{14}$$

where $\sigma_j$ denotes the $j$-th singular value in descending order, and $u_j$, $v_j$ are the corresponding left and right singular vectors, respectively.
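A small worked instance of Equation (14) may help. The matrix and the values of $C$ and $\rho$ below are assumed purely for illustration; a diagonal matrix with positive, descending entries is chosen because its singular values are the diagonal entries and its singular vectors are the standard basis vectors.

```latex
% Illustrative values (assumed): C = 4, \rho = 0.5, hence k = \lfloor \rho C \rfloor = 2.
W = \operatorname{diag}(3,\ 2,\ 1,\ 0.5),
\qquad
P_K(W) = \sum_{j=1}^{2} \sigma_j u_j v_j^{\top} = \operatorname{diag}(3,\ 2,\ 0,\ 0),
\qquad
\lVert W - P_K(W) \rVert_F = \sqrt{1^{2} + 0.5^{2}} = \sqrt{1.25} \approx 1.118 .
```

The discarded singular values fully determine the approximation error in the Frobenius norm, and by the Eckart–Young theorem no other matrix of rank at most 2 comes closer to $W$ in that norm.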

This operator $P_K$ provides a mathematical representation of the physical constraints within the biological brain, such as axonal diameter limitations and the degree of myelination. These biophysical factors create an information bottleneck that forces neural signals to traverse low-dimensional manifolds, thereby achieving structural parsimony and extreme metabolic efficiency.

Figure 4 shows the CAB design and the low-rank bottleneck that preserves essential spatial filtering under structural compression.

Discrete Approximation and Residual Learning

Residual bottleneck blocks are classic structures widely used in deep learning. This work does not claim to have invented this structure but instead provides a new theoretical explanation from the perspective of connectomics. We model feature extraction as a continuous evolution process on a low-rank manifold where the discrete approximation naturally leads to the residual bottleneck form. This perspective reveals the principle of rank constraint behind the bottleneck. It is essentially the optimal numerical implementation for dimension compression of information flow which mirrors the information bottleneck caused by axonal volume limitations in the biological nervous system:

h_{k+1} = h_{k} + W_{up} \cdot \sigma(W_{down} \cdot h_{k})

(15)

where $W_{down} \in \mathbb{R}^{r \times C}$ and $W_{up} \in \mathbb{R}^{C \times r}$ are the down-projection and up-projection weight matrices with bottleneck dimension $r = \lfloor \rho \cdot C \rfloor \ll C$, and $\sigma(\cdot)$ denotes the nonlinear activation function (SiLU in our implementation).
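A single step of Equation (15) can be sketched as follows. This is a minimal illustration, not the CAB itself: it assumes $h_{k}$ is a column vector of length C, so that the down-projection has shape (r, C) and the up-projection has shape (C, r). The weight scale and the value C = 64 are hypothetical.

```python
import numpy as np

def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def bottleneck_step(h, W_down, W_up):
    """One residual update h + W_up @ SiLU(W_down @ h), following Eq. (15)."""
    return h + W_up @ silu(W_down @ h)

C, rho = 64, 0.7
r = int(np.floor(rho * C))                    # bottleneck dimension, 44 here
rng = np.random.default_rng(0)
W_down = 0.01 * rng.standard_normal((r, C))   # maps C channels down to r
W_up = 0.01 * rng.standard_normal((C, r))     # maps r channels back to C
h = rng.standard_normal(C)

h_next = bottleneck_step(h, W_down, W_up)
# The linear part of the update passes through r dimensions, so its rank is at most r.
update_rank = int(np.linalg.matrix_rank(W_up @ W_down))
print(h_next.shape, update_rank)
```

The residual term keeps the full C-dimensional state, while only the learned update is forced through the r-dimensional bottleneck.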

The significance of this theoretical framework is threefold. First, it provides a biologically inspired rationale for the effectiveness of existing structures. Second, it motivates further exploration of learnable rank selection mechanisms. Third, it establishes an explicit link between structural design and resource constraints, which provides new ideas for the principled design of lightweight backbones in the future.

4. Experiments

4.1. Datasets

We evaluate our method on three challenging oriented object detection benchmarks. DOTA v1.0 comprises 2806 high-resolution images (up to $4000 \times 4000$ pixels) and 188,282 instances across 15 categories, serving as a primary benchmark for multi-scale detection. HRSC2016 is a dedicated ship detection dataset containing 1061 images and 2976 instances, distinct for its elongated targets with extreme aspect ratios. DIOR-R extends the DIOR dataset with oriented annotations, featuring 23,463 images and approximately 190k instances across 20 categories.

4.2. Experimental Settings

For the oriented object detection task on the DOTA-v1.0 dataset, all experiments were conducted on a platform equipped with four RTX 4090 GPUs. During training, we set the batch size to 8 and the initial learning rate to 5.0 × 10⁻⁵, and trained the model for 12 epochs. To ensure a rigorous and fair comparison, all baseline results are cited directly from their original publications or official reports, and the evaluation protocol (specifically the VOC 2012 metric) was kept consistent across all models. For methods such as LSKNet and PKINet that primarily report backbone-level performance, we integrated their official code with the default detection head (Oriented RCNN) and re-evaluated them under identical training settings to obtain full detector metrics. These re-evaluations were performed on a unified hardware platform using the same evaluation protocol to ensure baseline consistency. On HRSC2016, CONERSLite achieves a state-of-the-art 98.62% mAP through strictly validated experimental procedures. Following the standard evaluation protocol established by the DOTA benchmark, all results are reported using the VOC 2012 metric (mAP at IoU = 0.5), which is widely adopted in the oriented object detection community and ensures direct comparability with prior work. To ensure a fair comparison with other mainstream methods, we adopted their standard data preprocessing protocols and used a single-scale strategy for both the training and testing phases.

Terminology Note. To avoid ambiguity, we clarify the use of “baseline” in this paper. MTP [44] serves as the base detector for the ablation study (Section 4.3). All ablation variants are constructed by incrementally adding proposed modules onto MTP: “MTP + CAB” denotes MTP augmented with the Compact Anatomical Backbone; “MTP + FCR” denotes MTP augmented with the Functional Connectome Router; and “CONERSLite (Full)” denotes MTP augmented with both CAB and FCR. This additive design allows us to isolate the contribution of each module independently. All FPS measurements are obtained on a single RTX 4090 GPU with a batch size of 1 and an input resolution of $1024 \times 1024$, averaged over 1000 iterations after a 200-iteration warm-up.
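The FPS protocol described above (a warm-up phase followed by an averaged timing loop) can be sketched with the standard library alone. The helper `measure_fps`, the optional `sync` hook, and the dummy workload are hypothetical; on a GPU, the hook would be whatever device synchronization call the chosen framework provides, since asynchronous kernels otherwise make wall-clock timings unreliable.

```python
import time

def measure_fps(run_once, warmup=200, iters=1000, sync=None):
    """Return calls per second of run_once(), measured after a warm-up phase."""
    for _ in range(warmup):          # warm-up iterations are not timed
        run_once()
    if sync is not None:
        sync()                       # wait for pending device work (assumption)
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return iters / elapsed

def dummy_forward():
    # Stand-in workload; a real measurement would run one batch-size-1
    # forward pass at 1024 x 1024 resolution here.
    sum(i * i for i in range(1000))

fps = measure_fps(dummy_forward, warmup=20, iters=100)
print(f"{fps:.1f} FPS")
```

The reduced `warmup` and `iters` values in the usage line only keep the demonstration fast; the defaults mirror the 200 and 1000 iterations stated in the text.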

4.3. Main Results

Table 4 and Figure 5 present a quantitative comparison on the DOTA-v1.0 dataset, where CONERSLite achieves a state-of-the-art mAP of 79.57% under the single-scale setting. This performance notably surpasses the best competing lightweight models including PKINet-S (+1.18%) and LSKNet-S (+2.08%), while maintaining extreme parameter efficiency (28.3 M params, 195 G FLOPs). This advantage stems from our Darwinian Wiring scheme: the Compact Anatomical Backbone (CAB) preserves critical geometric connectivity for structured categories like Bridges and Harbors, while the Functional Connectome Router (FCR) dynamically suppresses background interference in complex scenes such as Roundabouts and Soccer Fields through instance-specific sparse routing.

CONERSLite still underperforms on several categories, notably Bridge (BR, 64.40%), Helicopter (HC, 64.90%), Soccer Ball Field (SBF, 69.10%), and Roundabout (RA, 73.90%). Bridge targets have extreme aspect ratios (often exceeding 1:20), and the low-rank bottleneck in the CAB compresses channel representations to $r = \lfloor \rho \cdot C \rfloor$ dimensions, which struggles to preserve the fine-grained linear structural continuity required for such slender targets. Helicopters are extremely small in DOTA imagery (often fewer than $15 \times 15$ pixels), and the rank-constrained manifold projection aggressively compresses already sparse feature representations below the discriminability threshold. Soccer Ball Fields have large regular geometry that is well captured by the CAB, but their visual similarity to other field-type categories (Ground Track Field, Basketball Court) causes the FCR to distribute weights across multiple assemblies rather than concentrating on a single expert, reducing classification confidence. Roundabouts exhibit near-circular geometry with complex internal structure, and the $3 \times 3$ DWConv kernels in the CAB have limited capacity to model curved boundaries, creating a geometric representation gap. These results reveal a trade-off in the CONERSLite design: the efficiency gained through rank-constrained projection and sparse expert selection comes at the cost of reduced representational capacity for targets with extreme geometries or very small spatial extent. Nonetheless, CONERSLite establishes a superior trade-off between perception and efficiency for resource-constrained remote sensing platforms.

To provide a direct comparison with existing lightweight detectors as well as a standard evaluation, Table 5 reports the parameter count, FLOPs, and mAP@0.5 on DOTA-v1.0 for CONERSLite alongside representative lightweight baselines including EfficientDet-D0 [15], YOLOv8n-obb and RTMDet-R-tiny [71]. EfficientDet-D0 is a horizontal detector with 3.9 M parameters and 2.5 G FLOPs that achieves 33.8 AP on COCO; however, it lacks native oriented bounding box support and therefore cannot be directly evaluated on the DOTA OBB task without substantial architectural modification. YOLOv8n-obb (3.11 M params, 23.3 G FLOPs) achieves 78.0% mAP@0.5 on DOTA-v1.0 under multi-scale evaluation. RTMDet-R-tiny (4.88 M params, 20.45 G FLOPs) achieves 75.36% mAP@0.5 under single-scale and 79.82% under multi-scale settings. CONERSLite achieves 79.57% mAP@0.5 under single-scale with 28.3 M parameters and 195 G FLOPs. While CONERSLite has a higher parameter count, its accuracy substantially surpasses these lightweight baselines (+4.21% over RTMDet-R-tiny SS, +1.57% over YOLOv8n-obb MS). This demonstrates that the Darwinian Wiring paradigm provides a superior accuracy–efficiency trade-off for oriented remote sensing detection compared to directly scaling down general-purpose detectors.

Table 6 summarizes the quantitative performance on the ship-centric HRSC2016 dataset. On this benchmark, CONERSLite achieves a state-of-the-art mAP of 98.62% (VOC 2012). Recent SOTA models have gradually approached the performance ceiling of this dataset: ReDet reports 97.63% and Oriented RCNN reports 97.80%. Our method further optimizes feature selection and resource allocation to achieve a consistent improvement of 0.82 to 0.99 percentage points over these two models. The gain is modest in absolute terms, as expected this close to the ceiling, but it further validates the effectiveness of our framework. These results surpass established high-precision models such as ReDet and RTMDet while utilizing significantly fewer resources. Specifically, compared to RTMDet, our framework reduces the total parameter count by approximately 45%, which is a substantial gain in structural efficiency.

The superior accuracy on ship targets is directly attributed to the coupling of phylogenetic stability and functional plasticity. Ships in remote sensing imagery are characterized by extreme aspect ratios and precise geometric symmetries that are difficult to capture with generic lightweight filters. The Compact Anatomical Backbone solves this by enforcing constrained manifold flow that preserves the structural integrity of elongated targets during the feature extraction process. By selecting stable anatomical anchors optimized for oriented structures, the CAB provides a consistent geometric foundation for ship detection.

Complementing this structural foundation, the Functional Connectome Router enables dynamic somatic selection to handle the diverse orientations and scales of different vessels. In scenarios with complex harbor backgrounds or overlapping ship clusters, the FCR evaluates the utility of specialized neural experts in real time. It recruits a comprehensive coalition of assemblies to resolve fine-grained features only when the local information density requires it. This dynamic adaptation ensures that the model maintains high representational capacity for slender targets without incurring the metabolic cost of a dense heavyweight architecture. These findings demonstrate that Darwinian Wiring provides a robust solution for oriented detection on resource-constrained maritime monitoring platforms.

Table 7 summarizes the comparative results on the DIOR-R dataset, which is characterized by high category diversity and complex background variations. CONERSLite achieves a state-of-the-art mAP of 69.68%, which outperforms the strong PKINet-S baseline by a margin of 2.65%. This performance advantage is achieved while maintaining the lowest parameter count of 28.3 M among all competing models. It is noteworthy that the FLOPs of CONERSLite (195 G) are higher than those of LSKNet-S (161 G), although its parameter count is lower (28.3 M vs. 31.0 M). This phenomenon originates from the dynamic routing mechanism introduced by the FCR module: the routing network itself adds a secondary computational overhead, yet through sparse activation, the actual parameters involved in the active computation path are fewer than in static networks. Therefore, the moderate increase in FLOPs is an efficient trade-off for higher accuracy (69.68% vs. 65.90%), demonstrating the efficiency of dynamic resource allocation. The generalization of our framework across twenty different remote sensing categories demonstrates the effectiveness of the Darwinian Wiring mechanism.

In scenes with high intra class variability, the Compact Anatomical Backbone provides a stable structural manifold that captures the essential geometric properties of varied objects from airports to bridges. Unlike standard backbones that might overfit to specific textures, the CAB uses connectome constrained flow to prioritize stable anatomical features. Simultaneously, the Functional Connectome Router addresses the high entropy backgrounds prevalent in DIOR-R. In areas with significant background noise or distractors that resemble targets, the FCR dynamically evaluates the perceptual utility of neural experts. By suppressing irrelevant pathways and recruiting specialized assemblies only when target signals are detected, the system effectively increases its signal-to-noise ratio. This dynamic resource allocation is particularly effective for the diverse object scales found in DIOR-R. These findings confirm that the coupling of phylogenetic structural constraints and ontogenetic functional plasticity allows the model to maintain high representational capacity across diverse environmental conditions with minimal computational overhead.

4.4. Ablation Study

Core Component (CAB and FCR) Ablation Analysis: As demonstrated in Table 8 and Figure 6, the integration of the Compact Anatomical Backbone (CAB) and the Functional Connectome Router (FCR) is vital for the performance of CONERSLite. The full model achieves a mAP of 79.57% which represents a substantial improvement over all baseline configurations. When the FCR is removed, the mAP drops to 75.12% despite the presence of the CAB. This reduction occurs because the system loses its capacity for dynamic somatic selection and cannot reconfigure its functional connectome to match the input complexity. Specifically, in scenes with high informational density, a static CAB cannot recruit the necessary neural assemblies to enhance detection precision.

Similarly, replacing the CAB with a generic MobileNetV3 backbone while retaining the FCR leads to a decrease in mAP to 74.86%. This decline proves that the FCR requires a structured manifold with stable anatomical anchors to function effectively. A standard lightweight backbone lacks the geometric alignment properties needed for oriented remote sensing targets which prevents the FCR from selecting the optimal neural pathways. The worst performance of 71.39% mAP is observed when both modules are removed which reflects the limitations of traditional static lightweight architectures in handling the complexities of remote sensing imagery. These results confirm that the synergistic coupling of anatomical stability and functional plasticity is the primary source of the efficiency of the framework.
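The synergy claim can be checked with simple arithmetic on the four mAP values reported in the two paragraphs above. The sketch below only restates those numbers; the variable names are illustrative.

```python
# mAP (%) values quoted in the ablation text above.
baseline = 71.39   # both CAB and FCR removed
cab_only = 75.12   # CAB kept, FCR removed
fcr_only = 74.86   # FCR kept, CAB replaced by a generic backbone
full = 79.57       # CAB and FCR together

gain_cab = round(cab_only - baseline, 2)    # 3.73
gain_fcr = round(fcr_only - baseline, 2)    # 3.47
gain_full = round(full - baseline, 2)       # 8.18
# Positive synergy means the joint gain exceeds the sum of the individual gains.
synergy = round(gain_full - (gain_cab + gain_fcr), 2)   # 0.98

print(gain_cab, gain_fcr, gain_full, synergy)
```

The joint gain of 8.18 points exceeds the 7.20 points obtained by adding the two single-module gains, leaving a surplus of 0.98 points attributable to the interaction of the two modules.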

Multi-Scale Training Ablation Analysis: As demonstrated in Table 9, the full CONERSLite model achieves a peak overall mAP of 82.35% and a small-target AP of 75.32%. When the FCR is removed, the overall mAP drop is accompanied by a severe 6.87% reduction in small-target AP. This failure is directly linked to the absence of the fitness-driven competition mechanism. In our FCR implementation, the Boltzmann selection gate uses a temperature parameter to modulate the activation of parallel neural assemblies. For clustered small targets that exhibit high local entropy, the Multi-Layer Perceptron (MLP)-based fitness evaluator assigns higher utility to high-frequency experts. Without this competitive selection, the system remains in a static state and cannot concentrate its functional connectome on the fine-grained details necessary for small objects, which leads to significant perceptual loss in dense scenes.

Similarly, replacing the CAB with a generic backbone while keeping the FCR leads to a 7.41% drop in small target AP. This indicates that the FCR requires the specific structural constraints of the CAB to operate effectively under multi-scale training. The CAB utilizes a low-rank bottleneck projection defined by down-projection and up-projection weight matrices. This structural bottleneck forces the feature flow to pass through a narrow manifold which acts as a phylogenetic filter. This filtering process preserves stable anatomical anchors that represent the geometric core of oriented targets. A generic backbone lacks this rank constrained flow and provides a feature space where small target signals are easily obscured by background clutter during multi-scale scaling operations. These results prove that the synergistic coupling of manifold projection in the CAB and competitive selection in the FCR is the primary driver of high-precision detection for small targets.

FCR Design Detail Ablation Analysis: As shown in Table 10, the full FCR achieves the best performance at 79.57% mAP, confirming its architectural effectiveness. Replacing GAP with LAP reduces mAP to 77.13%, showing global average pooling extracts context more effectively. Using a single neural assembly decreases mAP to 75.68%, proving multiple assemblies enhance representation. Standard Softmax lowers mAP to 78.05%, indicating temperature control optimizes sparse activation.

Neural Assembly Count ($N_{asm}$) Ablation Analysis: As illustrated in Table 11, the richness of the functional connectome, represented by the count of parallel neural assemblies $N_{asm}$, plays a pivotal role in the adaptation capacity of the model. When $N_{asm}$ is set to low values such as 2 or 4, the mAP remains significantly lower, at 76.23% and 78.45%, respectively. This performance gap stems from the limited phenotypic plasticity of the expert pool. A restricted number of assemblies cannot cover the diverse geometric and textural niches present in remote sensing scenes where targets exhibit extreme scale variations and arbitrary orientations. In biological terms, a depauperate population of experts lacks the necessary functional diversity to respond effectively to high-complexity environmental stimuli.

When $N_{asm}$ is increased to 6, CONERSLite achieves 79.57% mAP, within 0.06 points of the best value in the sweep, while maintaining a high inference speed of 103 FPS. This configuration represents a critical equilibrium point where the plurality of neural experts is sufficient to map the manifold of oriented targets without introducing excessive metabolic overhead. At this scale, the replicator dynamics within the FCR can effectively differentiate between specialists for varied categories while the selection pressure ensures that only the fittest coalition is activated for a given instance.

However, further increasing the count to 8 or 10 results in diminishing returns, where mAP saturates at 79.63% and even slightly declines to 79.51%. This suggests that the informational entropy of the scene is already fully captured by six specialized assemblies. The additional pathways introduce redundant functional overlaps, where multiple assemblies compete for the same representational niche. When the count of assemblies $N_{asm}$ increases from 6 to 8, FLOPs rise from 195 G to 210 G (an increase of 7.7%), while FPS drops from 103 to 92 (a decrease of 10.7%). This discrepancy arises from the engineering implementation characteristics of dynamic routing. The increase in the number of assemblies not only brings a linear growth in multiplicative operations but also introduces additional memory access overhead and control flow complexity such as reduced parallelism in gating networks and increased cache miss rates. These factors cause the actual latency increase to be higher than the theoretical FLOPs increase. This phenomenon is widely observed in dynamic networks. From an accuracy perspective, $N_{asm} = 8$ only provides a marginal improvement of 0.06 mAP compared to $N_{asm} = 6$, which is within the range of experimental noise. However, the associated 10.7% drop in FPS significantly impacts real-time applications. Therefore, considering mAP, speed, and computational efficiency, we select $N_{asm} = 6$ as the default configuration. For scenarios that are not sensitive to latency, $N_{asm} = 8$ can provide marginal accuracy gains.
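The relative changes quoted above follow directly from the reported FLOPs and FPS figures. The short sketch below reproduces them and also converts FPS to per-image latency, since a relative drop in FPS and the corresponding relative rise in latency are not the same number. Variable names are illustrative.

```python
flops_6, flops_8 = 195.0, 210.0   # GFLOPs at N_asm = 6 and N_asm = 8
fps_6, fps_8 = 103.0, 92.0        # frames per second at the same settings

flops_increase_pct = (flops_8 - flops_6) / flops_6 * 100   # about 7.7
fps_decrease_pct = (fps_6 - fps_8) / fps_6 * 100           # about 10.7

latency_6_ms = 1000.0 / fps_6                               # about 9.71 ms
latency_8_ms = 1000.0 / fps_8                               # about 10.87 ms
latency_increase_pct = (latency_8_ms - latency_6_ms) / latency_6_ms * 100  # about 12.0

print(round(flops_increase_pct, 1), round(fps_decrease_pct, 1),
      round(latency_increase_pct, 1))
```

Expressed as latency, the cost of moving from six to eight assemblies is roughly 12%, which makes the gap to the 7.7% FLOPs increase slightly larger than the FPS figure alone suggests.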

Statistical Significance. To rigorously verify the robustness of the selected configuration, we conducted three independent training runs with different random seeds for the $N_{asm} = 6$ and $N_{asm} = 8$ settings. The results show that the mAP for $N_{asm} = 6$ is $79.57 \pm 0.15$ and for $N_{asm} = 8$ is $79.63 \pm 0.18$. The difference of 0.06 mAP falls well within the overlapping confidence intervals of the two configurations, confirming that $N_{asm} = 8$ does not provide a statistically significant improvement over $N_{asm} = 6$ ($p > 0.05$ under paired t-test). This analysis rules out the possibility that the observed saturation is a result of random training variance and substantiates our selection of $N_{asm} = 6$ as the default configuration based on the efficiency–accuracy trade-off rather than marginal accuracy gains.
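As a rough sanity check on the reported summary statistics, the sketch below computes a Welch (unpaired, unequal-variance) t statistic. This is deliberately not the paired test named in the text, because the per-seed values needed for pairing are not given here. It also assumes that the ± values are sample standard deviations over the three runs, which the text does not state explicitly.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom from summary stats."""
    va = sd_a ** 2 / n_a                     # squared standard error, group a
    vb = sd_b ** 2 / n_b                     # squared standard error, group b
    t = (mean_b - mean_a) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

# Reported mAP: 79.57 +/- 0.15 (N_asm = 6) and 79.63 +/- 0.18 (N_asm = 8), three runs each.
t_stat, dof = welch_t(79.57, 0.15, 3, 79.63, 0.18, 3)
print(round(t_stat, 3), round(dof, 2))
```

Under these assumptions the statistic is about 0.44 with roughly 3.9 degrees of freedom, far below the two-sided 5% critical values of 3.182 (3 degrees of freedom) and 2.776 (4 degrees of freedom), which is consistent with the conclusion that the 0.06 mAP difference is not significant.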

Temperature Parameter ($\tau$) Ablation Analysis: As demonstrated in Table 12 and Figure 7, the temperature parameter $\tau$ acts as a critical physical lever for modulating the selection pressure within the Boltzmann selection gate of the FCR. This parameter dictates the sharpness of the probability distribution over parallel neural assemblies and directly influences the mean activation sparsity $S(G)$ and the collaborative dynamics of the functional connectome.

At a low temperature of $\tau = 0.1$, the selection mechanism enters a winner-take-all state characterized by extreme sparsity, where $S(G)$ drops to a mean of 1.2 active assemblies. While this minimizes computational flow, the high selection pressure prevents the formation of synergistic coalitions. Under such conditions, the system is forced to rely on a single dominant assembly regardless of input complexity, which leads to a significant degradation in mAP to 75.89% because multi-scale features cannot be integrated effectively.

Conversely, increasing $\tau$ to 1.0 results in a high entropy state with a mean activation count of 4.3. The loss of selection pressure leads to redundant activations where specialized signals are obscured by a uniform average of neural noise. This functional degradation occurs because the assemblies lose their competitive edge and transition from specialized experts back toward a generic uniform state, which reduces mAP to 78.65%.
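The effect of the temperature on selection sharpness can be illustrated with a temperature-scaled softmax, which is the standard form of a Boltzmann distribution. This is a minimal sketch: the fitness scores are hypothetical, and the entropy-based `effective_count` is used here only as a proxy for how many assemblies carry weight; it is not the paper's definition of $S(G)$.

```python
import math

def boltzmann_gate(fitness, tau):
    """Softmax of fitness / tau, computed stably by subtracting the maximum."""
    m = max(fitness)
    exps = [math.exp((f - m) / tau) for f in fitness]
    z = sum(exps)
    return [e / z for e in exps]

def effective_count(p):
    """exp(Shannon entropy): 1 for a one-hot distribution, len(p) for a uniform one."""
    entropy = -sum(q * math.log(q) for q in p if q > 0.0)
    return math.exp(entropy)

# Hypothetical fitness scores for six assemblies.
fitness = [1.2, 0.9, 0.4, 0.1, -0.3, -0.8]
counts = {}
for tau in (0.1, 0.5, 1.0):
    counts[tau] = effective_count(boltzmann_gate(fitness, tau))
    print(tau, round(counts[tau], 2))
```

Lower temperatures concentrate the probability mass on the highest-scoring assembly, while higher temperatures spread it more evenly, mirroring the trend from near winner-take-all selection to broad activation described above.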

Structural Compression Factor ($\rho$) Ablation Analysis: The structural compression factor $\rho$ serves as the primary regulator of the phylogenetic filtering process within the CAB modules. As illustrated in Table 13 and Figure 8, $\rho$ determines the dimensionality of the low-rank manifold that anchors the feature representation of oriented objects. When $\rho$ is reduced to 0.5, which corresponds to a 50% pruning rate, the mAP suffers a catastrophic decline to 74.36%. This degradation occurs because the extreme structural bottleneck destroys the geometric integrity of the anatomical anchors. In this over-compressed state, the manifold dimension is insufficient to preserve the rotational symmetries and elongated axes critical for remote sensing targets, which leads to severe information collapse during the projection into the compressed latent space.

In contrast, the optimal setting of $\rho = 0.7$ with a 30% pruning rate achieves 79.57% mAP while maintaining high parameter efficiency. This specific ratio ensures that the structural core of the backbone is robust enough to act as a stable phylogenetic foundation for the subsequent functional routing. At this equilibrium, the low-rank bottleneck effectively filters out redundant spatial noise while retaining a complete set of geometric primitives necessary for resolving complex object orientations. Further increasing $\rho$ to 0.9 yields only marginal improvements in mAP while significantly inflating the parameter count to 39.2 M and the computational cost to 281 G FLOPs. This plateau indicates that the essential structural manifold for remote sensing categories is relatively low rank and that additional parameters only contribute to representational redundancy without enhancing the perceptual gain. These results validate that the structural selection in CAB provides a lean yet powerful foundation for the Darwinian Wiring framework.

The sharp mAP transition from 74.36% ($\rho = 0.5$) to 79.57% ($\rho = 0.7$) reveals a critical threshold effect tied to the intrinsic dimensionality of the oriented object feature manifold. When $\rho$ falls below 0.6, the effective rank of the CAB weight matrices becomes insufficient to preserve the geometric properties (elongated axes and angular symmetries) essential for remote sensing targets, causing abrupt discriminability loss. Singular value spectrum analysis confirms this: at $\rho = 0.7$, the retained singular values capture over 95% of the total spectral energy, whereas at $\rho = 0.5$ this ratio drops to approximately 78%. The subsequent plateau beyond $\rho = 0.7$ indicates that the intrinsic feature dimensionality of typical remote sensing categories is relatively low, and additional capacity primarily captures redundant noise rather than discriminative signals.
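A retained-energy ratio of the kind quoted above can be computed as sketched below. The sketch assumes that "spectral energy" means the sum of squared singular values, which the text does not define, and it uses a synthetic, geometrically decaying spectrum rather than singular values from the trained CAB, so the printed ratios are not expected to match the reported 95% and 78%.

```python
import math

def retained_energy(singular_values, rho):
    """Fraction of the sum of squared singular values kept by the top floor(rho * C)."""
    s = sorted(singular_values, reverse=True)
    k = math.floor(rho * len(s) + 1e-9)   # small epsilon guards against float round-off
    total = sum(v * v for v in s)
    return sum(v * v for v in s[:k]) / total

# Synthetic spectrum for C = 64 channels (hypothetical decay rate).
spectrum = [0.97 ** j for j in range(64)]

for rho in (0.5, 0.7, 0.9, 1.0):
    print(rho, round(retained_energy(spectrum, rho), 4))
```

Applied to the measured singular values of each CAB weight matrix, the same function would show how quickly the retained energy saturates as $\rho$ grows.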

CAB Internal Sub-Component Ablation Analysis: To further investigate the contribution of each structural element within the Compact Anatomical Backbone, we conduct an ablation on the internal sub-components of CAB, as reported in Table 14. The full CAB employs a low-rank bottleneck projection with residual learning, depthwise spatial filtering, SiLU activation, and batch normalization. Replacing the bottleneck structure with standard $3 \times 3$ convolutions inflates the parameter count to 35.6 M and the computational cost to 252 G FLOPs, while mAP decreases to 78.91%. This confirms that the rank constraint enhances feature discriminability by forcing information flow through a low-dimensional manifold. Removing the residual connection causes a drop to 77.23% mAP, demonstrating that the skip pathway is necessary for preserving gradient flow and enabling stable feature propagation. Expanding the bottleneck ratio to 1.0 (removing the low-rank projection) increases parameters to 33.1 M while mAP falls to 79.14%, indicating that the feature manifold is intrinsically low rank. Replacing depthwise separable convolutions with standard convolutions yields a marginally higher mAP (79.71%) at the cost of more parameters (35.2 M) and FLOPs (267 G), confirming that depthwise filtering provides an efficient spatial encoding.

We further ablate two fundamental operators within the CAB block that directly correspond to the nonlinear activation function $\sigma(\cdot)$ in Equation (15) and the normalization layer in the bottleneck path. Replacing SiLU with ReLU causes a 0.83% mAP decrease to 78.74%, despite identical parameter count and FLOPs. This performance gap is attributed to the smooth, self-gated property of SiLU ($x \cdot \operatorname{sigmoid}(x)$), which preserves gradient continuity on the low-rank manifold during the constrained ODE integration. The hard zero-threshold of ReLU introduces gradient discontinuities that disrupt the assumed flow field in Equation (13), particularly in the compressed bottleneck space where feature magnitudes are inherently small. Replacing SiLU with GELU yields a 0.25% mAP decrease to 79.32%, suggesting that smooth activations in general are beneficial, but the explicit multiplicative gating of SiLU provides a slight edge over the probabilistic gating of GELU for oriented feature representations.
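The smoothness argument can be made concrete by comparing one-sided slopes of the three activations at the origin. The sketch below uses the standard definitions of ReLU, SiLU, and the exact (erf-based) GELU; the helper `one_sided_slopes` and the step size are illustrative choices.

```python
import math

def relu(x):
    return max(0.0, x)

def silu(x):
    return x / (1.0 + math.exp(-x))     # x * sigmoid(x)

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))   # exact erf form

def one_sided_slopes(f, eps=1e-6):
    """Finite-difference slopes of f just left and just right of x = 0."""
    left = (f(0.0) - f(-eps)) / eps
    right = (f(eps) - f(0.0)) / eps
    return left, right

for name, f in (("ReLU", relu), ("SiLU", silu), ("GELU", gelu)):
    left, right = one_sided_slopes(f)
    print(name, round(left, 4), round(right, 4))
```

ReLU has a slope of 0 on the left of the origin and 1 on the right, so its gradient jumps there, whereas SiLU and GELU both have matching slopes of about 0.5 on either side.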

Removing BatchNorm from the bottleneck path causes the largest single-component degradation (2.91% mAP drop to 76.66%), surpassing the impact of removing the residual connection. This finding reveals that normalization plays a role in stabilizing the numerical integration of the constrained ODE: without BatchNorm, the feature magnitudes in the compressed r-dimensional space exhibit high variance across spatial locations, leading to gradient explosion during backpropagation and convergence instability. Replacing BatchNorm with GroupNorm ($G = 32$) recovers most of the performance (79.21% mAP) but introduces a slight slowdown (97 FPS vs. 103 FPS) due to the per-group statistics computation overhead. These results confirm that normalization within the low-rank bottleneck is a structural necessity for maintaining the stability of the manifold flow.

4.5. Visualization and Qualitative Analysis

The visual comparison in Figure 9 shows that the proposed method achieves clear improvements over the baseline approach. Across distinct representative scenarios, the heatmaps generated by the baseline method generally suffer from dispersed responses and insufficient focus on target regions, leading to blurred boundaries and background interference. In contrast, our method produces heatmaps with highly concentrated activations, which more precisely highlight the core areas of target objects.

The qualitative comparisons in Figure 10 provide granular evidence of the effectiveness of the proposed Darwinian Wiring framework. In these magnified local views, the baseline model frequently exhibits two primary failure modes: “Miss Detection”, where small or partially occluded targets are overlooked in dense clusters, and “Wrong Detection”, where background clutter or shadows are misidentified as targets. These issues are systematically addressed in CONERSLite through our dual-stage refinement process.

Specifically, in the dense harbor scenarios shown in the top row, the Functional Connectome Router (FCR) dynamically orchestrates a sparse neural coalition that effectively suppresses interfering dock textures and water reflections. This instance-specific routing ensures that the model’s representational capacity is focused on the targets, thereby preventing the omissions observed in the baseline. Simultaneously, in high-density parking lot and warehouse scenes in the middle and bottom rows, the Compact Anatomical Backbone (CAB) utilizes connectome-constrained features selected through structural plasticity to prioritize geometric consistency over deceptive surface textures. By preserving only the most stable anatomical anchors during the pruning phase, the CAB facilitates precise boundary localization even for overlapping objects. This transition from structural selection in the CAB to functional focus in the FCR allows CONERSLite to achieve superior precision and reliability in these examples, avoiding the classification errors and omissions seen in the uncompressed baseline.

This enhancement not only increases the visual salience of targets but also indicates that the model extracts target features more accurately, yielding more reliable and interpretable detection results. The orientations of detected bounding boxes are also more stable and consistent with the target geometry, demonstrating the effectiveness of the proposed Darwinian Wiring framework and its Functional Connectome Router (FCR) in managing complex spatial distributions.

Advantages of Module Complementarity on Complex Categories: The synergistic integration of CAB and FCR demonstrates pronounced advantages in detecting objects with complex structures or large scale variations. At the category level, the proposed model demonstrates significant improvements across challenging scenarios, underscoring CONERSLite’s effectiveness in small object modeling, rotational invariance, and background suppression. The angular equivariance modeling with the anatomical backbone and the functional routing strategy complement each other, with the former addressing angular consistency and the latter improving the robustness of small targets in complex scenarios.

It should be noted that the nonlinear relationship between FPS and FLOPs reflects the complexity of dynamic networks in actual deployment. Theoretical calculation optimization does not always translate linearly to inference speed gains. This leaves room for future engineering optimization such as operator fusion and sparse library adaptation.

5. Discussion

The experimental results presented in the preceding sections demonstrate that the Darwinian Wiring paradigm offers a principled and effective approach to lightweight oriented object detection in remote sensing. This section synthesizes the key findings across all benchmarks and ablation studies, discusses the underlying design trade-offs, and identifies avenues for future improvement.

CONERSLite consistently achieves competitive or superior accuracy across three benchmarks with markedly different characteristics. On DOTA-v1.0 (Table 4), the proposed model surpasses representative lightweight backbones such as PKINet-S and LSKNet-S under both single-scale and multi-scale settings while maintaining a compact parameter budget of 28.3 M. The per-category radar chart in Figure 5 further reveals that these improvements are most pronounced in geometrically complex categories like Harbor, Swimming Pool, and Roundabout, where the dynamic routing of the FCR can selectively recruit specialized neural assemblies for intricate spatial structures. On HRSC2016 (Table 6), CONERSLite delivers a state-of-the-art mAP of 98.62% under the VOC 2012 metric and simultaneously reduces the total parameter count by approximately 45% relative to RTMDet. This result indicates that the constrained manifold flow within the CAB preserves the elongated geometric features critical for ship detection with high fidelity. The DIOR-R benchmark (Table 7) provides a particularly rigorous test of generalization, owing to its 20 diverse categories. CONERSLite outperforms the next-best model PKINet-S by 2.65% mAP on this dataset, confirming that competitive neural assembly selection generalizes well beyond domain-specific target types.

A comparison with the base detector MTP in Table 5 highlights the compression efficiency of the proposed framework. CONERSLite matches the mAP@0.5 of MTP while reducing its parameters by over 74% and its FLOPs by over 64%. The direct comparison with ultra-lightweight detectors such as YOLOv8n-obb and RTMDet-R-tiny in the same table shows that CONERSLite achieves substantially higher accuracy at the cost of a larger model footprint. This observation suggests that the Darwinian Wiring paradigm occupies a distinct and favorable position on the accuracy–efficiency landscape, targeting applications where detection reliability takes priority over absolute model minimality.

The ablation studies provide strong evidence for the synergistic nature of the CAB and FCR modules. Table 8 shows that removing either module individually leads to significant performance degradation, and the combined gain of the full model exceeds the sum of the individual module contributions. This super-additive behavior confirms that anatomical stability and functional plasticity are complementary rather than independent. The multi-scale ablation in Table 9 further reveals that both modules are especially critical for small-target detection, where the removal of either component causes a disproportionately large drop in small-target AP compared to overall mAP. These findings align with the biological intuition that stable structural wiring and adaptive functional routing must work in concert to handle complex perceptual demands.
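The super-additivity claim can be stated as a simple inequality. The notation below is introduced here for illustration and does not appear in the original tables: $A_{0}$ is assumed to be the mAP of the variant with neither module, $A_{\mathrm{CAB}}$ and $A_{\mathrm{FCR}}$ the mAP with only one module enabled, and $A_{\mathrm{full}}$ the mAP of the full model.

```latex
\Delta_{\mathrm{CAB}} = A_{\mathrm{CAB}} - A_{0}, \qquad
\Delta_{\mathrm{FCR}} = A_{\mathrm{FCR}} - A_{0}, \qquad
\Delta_{\mathrm{full}} = A_{\mathrm{full}} - A_{0}

\Delta_{\mathrm{full}} > \Delta_{\mathrm{CAB}} + \Delta_{\mathrm{FCR}}
\;\Longleftrightarrow\;
A_{\mathrm{full}} + A_{0} > A_{\mathrm{CAB}} + A_{\mathrm{FCR}}
```

Under these assumptions, the check reduces to comparing two sums of entries from the ablation table, which makes the interaction between the modules easy to verify for any reported configuration.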

The FCR design ablation in Table 10 demonstrates that global context aggregation, multiple parallel assemblies, and temperature-controlled selection each make indispensable contributions to the routing quality. Replacing any one of these elements with a simpler alternative results in measurable accuracy loss. Similarly, the CAB sub-component ablation in Table 14 identifies batch normalization and the residual connection as the two most impactful internal elements. The removal of batch normalization causes the single largest performance drop among all sub-components, underscoring its role in stabilizing feature propagation through the low-rank bottleneck.
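The three FCR ingredients named above (global context aggregation, multiple parallel assemblies, and temperature-controlled selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy gate matrix, and the stand-in assemblies are all hypothetical, global average pooling is assumed as the context aggregator, and a dense softmax mixture is used in place of whatever sparse selection the actual router applies.

```python
import math

def softmax(logits, tau):
    """Temperature-scaled softmax over a list of scores."""
    scaled = [z / tau for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def global_context(feature_map):
    """Global average pooling over a C x H x W nested list: one scalar per channel."""
    return [sum(map(sum, channel)) / (len(channel) * len(channel[0]))
            for channel in feature_map]

def route(feature_map, gate_weights, assemblies, tau=0.5):
    """Return the gate weights and the gate-weighted mixture of assembly outputs."""
    context = global_context(feature_map)
    # One logit per assembly: dot product of its gate row with the pooled context.
    logits = [sum(w * c for w, c in zip(row, context)) for row in gate_weights]
    gates = softmax(logits, tau)
    # Simplification: each assembly acts on the pooled context, not the full map.
    outputs = [assembly(context) for assembly in assemblies]
    mixed = [sum(g * out[i] for g, out in zip(gates, outputs))
             for i in range(len(context))]
    return gates, mixed

# Hypothetical toy input: 2 channels on a 2 x 2 spatial grid.
feature_map = [[[1.0, 3.0], [1.0, 3.0]],
               [[0.0, 0.0], [0.0, 0.0]]]
# Hypothetical gate matrix: 3 assemblies x 2 channels.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
# Hypothetical assemblies: simple transforms standing in for learned branches.
assemblies = [
    lambda v: [2.0 * x for x in v],
    lambda v: [x + 1.0 for x in v],
    lambda v: [-x for x in v],
]

gates, mixed = route(feature_map, gate_weights, assemblies, tau=0.5)
print("gates:", gates)
print("mixed:", mixed)
```

In this toy setup the first assembly receives the largest gate because its gate row aligns with the dominant channel of the pooled context, which is the kind of input-dependent recruitment the ablation attributes to the router.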

The hyperparameter studies further characterize the operating envelope of CONERSLite. The neural assembly count $N_{asm}$ in Table 11 exhibits a clear saturation point at six assemblies, beyond which additional experts provide negligible accuracy gains while increasing computational overhead. The temperature parameter $\tau$ in Table 12 controls the trade-off between expert specialization and functional diversity, and the optimal value of 0.5 balances these competing objectives. The structural compression factor $\rho$ in Table 13 reveals a pronounced threshold effect, where reducing the bottleneck ratio below 0.6 leads to a sharp accuracy collapse due to the destruction of essential geometric primitives on the feature manifold. These results collectively demonstrate that the framework is robust within a well-defined operating range and that its key hyperparameters have interpretable physical meanings grounded in the biological analogy.
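The role of the temperature can be seen in a small numerical experiment. The sketch below assumes the selection is a softmax over assembly scores divided by $\tau$; the six scores are made up for illustration and are not taken from the paper. It reports the largest gate and the Shannon entropy of the gate distribution at the three temperatures discussed in the ablation.

```python
import math

def softmax(logits, tau):
    """Temperature-scaled softmax; smaller tau concentrates mass on the top score."""
    scaled = [z / tau for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats: 0 for a one-hot vector, log(n) for a uniform one."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Six hypothetical assembly scores (illustrative values only).
logits = [2.0, 1.0, 0.5, 0.0, -0.5, -1.0]

for tau in (0.1, 0.5, 1.0):
    gates = softmax(logits, tau)
    print(f"tau={tau}: max gate={max(gates):.3f}, entropy={entropy(gates):.3f}")
```

Lower temperatures push the distribution toward a winner-take-all state with low entropy, while higher temperatures spread the gates more evenly, so an intermediate value trades off specialization against diversity.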

Several limitations of the current framework deserve attention. The per-category analysis in Table 4 shows that CONERSLite underperforms on categories with extreme aspect ratios such as Bridge and on very small targets such as Helicopter. These weaknesses arise because the low-rank bottleneck in the CAB compresses feature representations below the discriminability threshold for such challenging geometries. In addition, the absolute parameter count and FLOPs of CONERSLite remain higher than those of ultra-lightweight models such as YOLOv8n-obb (Table 5), which may limit its applicability on the most resource-constrained edge devices. The nonlinear relationship between FLOPs and actual inference speed, as observed in the $N_{asm}$ ablation (Table 11), also highlights the engineering challenges inherent in deploying dynamic routing architectures on current hardware. Future work could explore adaptive rank selection to dynamically adjust the compression ratio based on input complexity, multi-resolution attention mechanisms to improve small-target recall, and knowledge distillation techniques to transfer the learned routing strategies into more hardware-friendly static architectures.
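To make the effect of the compression ratio more concrete, the following sketch counts the weights of a low-rank bottleneck as a function of $\rho$. It rests on assumptions not confirmed by the text: the bottleneck width is taken to be $\rho$ times the channel count, the block is modeled as a 1x1 down-projection, a depthwise 3x3 convolution, and a 1x1 up-projection, and biases and normalization parameters are ignored. The function names and the channel count of 256 are hypothetical.

```python
def dense_conv_params(channels, kernel=3):
    """Weights of a standard kernel x kernel convolution with equal in/out channels."""
    return kernel * kernel * channels * channels

def bottleneck_params(channels, rho, kernel=3):
    """Weights of a 1x1 down-projection, a depthwise conv, and a 1x1 up-projection."""
    width = max(1, round(rho * channels))  # assumed bottleneck width
    down = channels * width                # 1x1 projection into the bottleneck
    depthwise = kernel * kernel * width    # one kernel x kernel filter per channel
    up = width * channels                  # 1x1 projection back out
    return down + depthwise + up

channels = 256  # hypothetical stage width
dense = dense_conv_params(channels)
for rho in (0.5, 0.6, 0.7, 0.8):
    count = bottleneck_params(channels, rho)
    print(f"rho={rho}: {count} weights ({count / dense:.1%} of a dense 3x3 conv)")
```

Under these assumptions the weight count grows roughly linearly with $\rho$, so an adaptive rank selection scheme would trade parameters for representational capacity along this line.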

6. Conclusions

In this paper, we introduced Darwinian Wiring, a novel connectome-constrained structural plasticity framework for extreme model compression in remote sensing object detection. Inspired by the principles of neural Darwinism, our approach transcends the limitations of static pruning by implementing a dynamic, selection-based computation paradigm. Through the synergistic integration of the Compact Anatomical Backbone (CAB) and the Functional Connectome Router (FCR), our model, CONERSLite, achieves a state-of-the-art balance between perception capability and computational efficiency. Extensive experiments on benchmarks like DOTA, HRSC2016, and DIOR-R demonstrate that CONERSLite not only significantly reduces parameter count and FLOPs but also maintains or even surpasses the accuracy of much larger models. By enabling instance-specific neural coalitions, Darwinian Wiring provides a robust and energy-efficient solution for deploying high-performance detectors on resource-constrained edge platforms, establishing a new direction for biologically inspired neural architecture design.

Despite the high precision achieved by CONERSLite (e.g., 98.62% on HRSC2016), we acknowledge that performance on these datasets is nearing saturation. Remaining errors primarily stem from extreme occlusions, interference from cloud cover, and missed detections of ultra-small targets. As detailed in Section 4.5, these failure modes reveal the inherent trade-offs of the rank-constrained manifold design: while the low-rank bottleneck achieves remarkable metabolic efficiency, it inevitably limits the preservation of fine-grained spatial details for extremely small or heavily occluded targets. Future work can target these difficult long-tail cases.

Author Contributions

Conceptualization, L.T., Y.G. and S.Z.; methodology, L.T., W.G. and Y.G.; software, L.T., W.G. and J.L.; validation, J.L., Y.X. and Y.H.; formal analysis, Y.H. and Y.W.; investigation, Y.W. and W.M.; resources, S.Z. and Y.G.; data curation, J.L. and Y.H.; writing—original draft preparation, L.T.; writing—review and editing, L.T., Y.X., S.Z. and Y.G.; visualization, L.T. and Y.X.; supervision, S.Z. and Y.G.; project administration, S.Z. and Y.G.; funding acquisition, S.Z. and Y.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Project of the Xiangjiang Laboratory (No. 23Xj02003), the Training Program for Excellent Young Innovators of Changsha (Grant No. kq2209001), Hunan Excellent Young Scientists Fund (Grant No. 2025JJ40066) and General Program of National Natural Science Foundation of China (Grant No. 72571281).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets DIOR-R, HRSC2016, and DOTA-v1.0 are publicly available from their respective sources. No new data were created.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

BSG	Boltzmann Selection Gate
CAB	Compact Anatomical Backbone
FCR	Functional Connectome Router
FLOPs	Floating Point Operations
FPS	Frames Per Second
mAP	mean Average Precision
MLP	Multi-Layer Perceptron
MoE	Mixture of Experts
MoS	Mean Activation Sparsity ($S(G)$)
SFP	Sparse Forward Propagation
SNR	Signal-to-Noise Ratio

References

Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
Han, Y.; Huang, G.; Song, S.; Yang, L.; Wang, H.; Wang, Y. Dynamic Neural Networks: A Survey. arXiv 2021, arXiv:2102.04906. [Google Scholar] [CrossRef]
Demir, E.; Akbas, E. Early-exit Convolutional Neural Networks. arXiv 2024, arXiv:2409.05336. [Google Scholar] [CrossRef]
Bengio, E.; Bacon, P.L.; Pineau, J.; Precup, D. Conditional Computation in Neural Networks for faster models. arXiv 2016, arXiv:1511.06297. [Google Scholar] [CrossRef]
Gao, X.; Zhao, Y.; Dudziak, Ł.; Mullins, R.; Xu, C.Z. Dynamic Channel Pruning: Feature Boosting and Suppression. arXiv 2019, arXiv:1810.05331. [Google Scholar] [CrossRef]
Peng, W.; Yi, J.; Wang, X.v.; Yi, Z.; Song, Y. LO-Det: Lightweight oriented object detection in remote sensing images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar]
Wu, Z.; Zhen, H.; Zhang, X.; Bai, X.; Li, X. SEMA-YOLO: Lightweight small object detection in remote sensing image via shallow-layer enhancement and multi-scale adaptation. Remote Sens. 2025, 17, 1917. [Google Scholar] [CrossRef]
Zhan, J.; Bai, L.; Zhang, J.; Liu, T.; Shi, F.; Liu, Y.; Liu, L. Heterogeneous binary pixel difference networks for remote sensing object detection. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5602619. [Google Scholar] [CrossRef]
Zhong, X.; Zhan, J.; Xie, Y.; Zhang, L.; Zhou, G.; Liang, M.; Yang, K.; Guo, Z.; Li, L. Adaptive deformation-learning and multiscale-integrated network for remote sensing object detection. Remote Sens. 2025, 63, 5611619. [Google Scholar] [CrossRef]
Yuan, X.; Zheng, Z.; Li, Y.; Liu, X.; Liu, L.; Li, X.; Hou, Q.; Cheng, M.M. Strip R-CNN: Large strip convolution for remote sensing object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Singapore, 20–27 January 2026; Volume 40, pp. 12259–12267. [Google Scholar]
Li, Y.; Li, X.; Li, Y.; Zhang, Y.; Dai, Y.; Hou, Q.; Cheng, M.M.; Yang, J. Sm3det: A unified model for multi-modal remote sensing object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Singapore, 20–27 January 2026; Volume 40, pp. 6717–6725. [Google Scholar]
Zhu, X.; Liang, D.; Jiang, X.; Guan, Y.; Liu, Y.; Zhu, Y.; Bai, X. Layerlink: Bridging remote sensing object detection and large vision models with efficient fine-tuning. Pattern Recognit. 2025, 165, 111583. [Google Scholar] [CrossRef]
Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10781–10790. [Google Scholar]
Yue, M.; Zhang, L.; Zhang, Y.; Zhang, H. An improved YOLOv8 detector for multi-scale target detection in remote sensing images. IEEE Access 2024, 12, 114123–114136. [Google Scholar] [CrossRef]
Jiang, T.; Lu, P.; Zhang, L.; Ma, N.; Han, R.; Lyu, C.; Li, Y.; Chen, K. Rtmpose: Real-time multi-person pose estimation based on mmpose. arXiv 2023, arXiv:2303.07399. [Google Scholar]
Huang, S.; Lin, C.; Jiang, X.; Qu, Z. BRSTD: Bio-Inspired Remote Sensing Tiny Object Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5643115. [Google Scholar] [CrossRef]
Qiang, H.; Hao, W.; Xie, M.; Tang, Q.; Shi, H.; Zhao, Y.; Han, X. SCM-YOLO for lightweight small object detection in remote sensing images. Remote Sens. 2025, 17, 249. [Google Scholar] [CrossRef]
Chen, Y.; Wang, Z.; Xiong, Z.; Zhang, Y.; Xu, X. SCM-YOLO: An efficient and lightweight detector for small object detection in remote sensing imagery. Remote Sens. 2025, 17, 25. [Google Scholar]
Lyu, X.; Tian, L.; Teng, S. ECDet: Efficient oriented object detection on the aerial image with cross-layer attention. J. Real-Time Image Process. 2025, 22, 42. [Google Scholar] [CrossRef]
Liu, J.; Zhan, J.; Guo, Y.; Li, T.; Zhao, Y.; Zhang, J.; Tang, L.; Li, Y.; Wei, Y.; Cai, W. Learning geometric-aware and weather-adaptive semantics in remote sensing: Affine Lie group enhanced detector for UAV road scenes. IEEE Trans. Geosci. Remote Sens. 2026, 64, 5605820. [Google Scholar] [CrossRef]
Liu, J.; Zhan, J.; Zhang, J.; Chen, J.; Song, Y.; Tang, L.; Zhou, L.; Du, C.; Wei, Y.; Guo, Y. Robust scale fusion and edge-aware feature attention network for remote sensing UAV road detection under harsh weather. Results Eng. 2025, 27, 104337. [Google Scholar] [CrossRef]
Dai, X.; Chen, Y.; Yang, J.; Zhang, P.; Yuan, L.; Zhang, L. Dynamic detr: End-to-end object detection with dynamic attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2988–2997. [Google Scholar]
Gao, M.; Yu, R.; Li, A.; Morariu, V.I.; Davis, L.S. Dynamic zoom-in network for fast object detection in large images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6926–6935. [Google Scholar]
Wang, N.; Gao, Y.; Chen, H.; Wang, P.; Tian, Z.; Shen, C.; Zhang, Y. NAS-FCOS: Fast neural architecture search for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11943–11951. [Google Scholar]
Yao, L.; Xu, H.; Zhang, W.; Liang, X.; Li, Z. SM-NAS: Structural-to-modular neural architecture search for object detection. In Proceedings of the AAAI conference on artificial intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12661–12668. [Google Scholar]
Ebrahimpour, M.K.; Li, J.; Yu, Y.Y.; Reesee, J.; Moghtaderi, A.; Yang, M.H.; Noelle, D.C. Ventral-dorsal neural networks: Object detection via selective attention. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV); IEEE: Piscataway, NJ, USA, 2019; pp. 986–994. [Google Scholar]
Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06); IEEE: Piscataway, NJ, USA, 2006; Volume 3, pp. 850–855. [Google Scholar]
Kim, S.; Park, S.; Na, B.; Yoon, S. Spiking-YOLO: Spiking neural network for energy-efficient object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11270–11277. [Google Scholar]
Su, Q.; Chou, Y.; Hu, Y.; Li, J.; Mei, S.; Zhang, Z.; Li, G. Deep directly-trained spiking neural networks for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023. [Google Scholar]
Forster, D.T.; Li, S.C.; Yashiroda, Y.; Yoshimura, M.; Li, Z.; Isuhuaylas, L.A.V.; Itto-Nakama, K.; Yamanaka, D.; Ohya, Y.; Osada, H.; et al. BIONIC: Biological network integration using convolutions. Nat. Methods 2022, 19, 1250–1261. [Google Scholar] [CrossRef]
Roberts, N.; Yap, D.A.; Prabhu, V.U. Deep Connectomics Networks: Towards Biologically-Inspired Computer Vision. In Proceedings of the ICLR Workshop, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
Kubilius, J.; Schrimpf, M.; Hong, H.; Majaj, N.J.; Rajalingham, R.; Issa, E.B.; Kar, K.; Bashivan, P.; Prescott-Roy, J.; Schmidt, K.; et al. Brain-like object recognition with recurrent supervised networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
Liu, Y.; Wang, R.; Shan, S.; Chen, X. Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
Zhan, J.; Xie, Y.; Guo, J.; Hu, Y.; Zhou, G.; Cai, W.; Wang, Y.; Chen, A.; Xie, L.; Li, M.; et al. DGPF-RENet: A low data dependence network with low training iterations for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5520521. [Google Scholar] [CrossRef]
Adeli, H.; Ahn, S.; Zelinsky, G.J. OCRA: A brain-inspired object-based attention network for multiobject recognition and visual reasoning. bioRxiv 2022, 2022.04.02.486850. [Google Scholar]
Li, T.; Zhan, J.; Liu, J.; Guo, J.; Xiong, X.; Cai, W.; Liu, E.; Wei, Y.; Hu, Y. Sorted texture-aware glance and gaze network for hyperspectral image classification with low training samples. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5518126. [Google Scholar] [CrossRef]
Zheng, Z.; Wei, J.; Xu, Y.; Li, C.; Lu, T.; Guo, Q.; Ji, X.; Guo, H.; Wang, G.; Deng, L. Modeling macroscopic brain dynamics with brain-inspired computing architecture. Nat. Commun. 2025, 16, 9424. [Google Scholar] [CrossRef]
Parhi, K.K.; Unnikrishnan, N.K. Brain-inspired computing: Models and architectures. IEEE Open J. Circuits Syst. 2020, 1, 185–204. [Google Scholar] [CrossRef]
Xia, J.; Wang, S. A Lightweight Brain-Inspired Machine Learning Framework for Coronary Angiography: Hybrid Neural Representation and Robust Learning Strategies. arXiv 2026, arXiv:2601.15865. [Google Scholar] [CrossRef]
Wang, D.; Zhang, J.; Xu, M.; Liu, L.; Wang, D.; Gao, E.; Han, C.; Guo, H.; Du, B.; Tao, D.; et al. MTP: Advancing remote sensing foundation model via multitask pretraining. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 11632–11654. [Google Scholar] [CrossRef]
Hu, Z.; Gao, K.; Zhang, X.; Wang, J.; Wang, H.; Yang, Z.; Li, C.; Li, W. EMO2-DETR: Efficient-matching oriented object detection with transformers. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5616814. [Google Scholar] [CrossRef]
Wang, J.; Yang, W.; Li, H.C.; Zhang, H.; Xia, G.S. Learning center probability map for detecting objects in aerial images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4307–4323. [Google Scholar] [CrossRef]
Dai, L.; Liu, H.; Tang, H.; Wu, Z.; Song, P. AO2-DETR: Arbitrary-oriented object detection transformer. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 2342–2356. [Google Scholar] [CrossRef]
Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards more robust detection for small, cluttered and rotated objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8232–8241. [Google Scholar]
Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 3163–3171. [Google Scholar]
Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2849–2858. [Google Scholar]
Yujie, L.; Xiaorui, S.; Wenbin, S.; Yafu, Y. S2ANet: Combining local spectral and spatial point grouping for point cloud processing. Virtual Real. Intell. Hardw. 2024, 6, 267–279. [Google Scholar] [CrossRef]
Hou, L.; Lu, K.; Xue, J.; Li, Y. Shape-adaptive selection and measurement for oriented object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 923–932. [Google Scholar]
Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459. [Google Scholar] [CrossRef]
Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
Han, J.; Ding, J.; Xue, N.; Xia, G.S. ReDet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 2786–2795. [Google Scholar]
Yang, X.; Yan, J.; Ming, Q.; Wang, W.; Zhang, X.; Tian, Q. Rethinking rotated object detection with Gaussian Wasserstein distance loss. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, Virtual, 18–24 July 2021; pp. 11830–11841. [Google Scholar]
Xiao, Z.; Yang, G.; Yang, X.; Mu, T.; Yan, J.; Hu, S. Theoretically achieving continuous representation of oriented bounding boxes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16912–16922. [Google Scholar]
Yang, X.; Yang, X.; Yang, J.; Ming, Q.; Wang, W.; Tian, Q.; Yan, J. Learning high-precision bounding box for rotated object detection via Kullback–Leibler divergence. Adv. Neural Inf. Process. Syst. 2021, 34, 18381–18394. [Google Scholar]
Yang, X.; Zhou, Y.; Zhang, G.; Yang, J.; Wang, W.; Yan, J.; Zhang, X.; Tian, Q. The KFIoU loss for rotated object detection. arXiv 2022, arXiv:2201.12558. [Google Scholar]
Li, Y.; Hou, Q.; Zheng, Z.; Cheng, M.M.; Yang, J.; Li, X. Large selective kernel network for remote sensing object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 16794–16805. [Google Scholar]
Cai, X.; Lai, Q.; Wang, Y.; Wang, W.; Sun, Z.; Yao, Y. Poly kernel inception network for remote sensing detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 27706–27716. [Google Scholar]
Wei, H.; Zhang, Y.; Chang, Z.; Li, H.; Wang, H.; Sun, X. Oriented objects as pairs of middle lines. ISPRS J. Photogramm. Remote Sens. 2020, 169, 268–279. [Google Scholar] [CrossRef]
Ming, Q.; Miao, L.; Zhou, Z.; Dong, Y. CFC-Net: A critical feature capturing network for arbitrary-oriented object detection in remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5605814. [Google Scholar] [CrossRef]
Liu, S.; Zhang, L.; Lu, H.; He, Y. Center-boundary dual attention for oriented object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5603914. [Google Scholar] [CrossRef]
Zhang, C.; Lam, K.M.; Wang, Q. CoF-Net: A progressive coarse-to-fine framework for object detection in remote-sensing imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5600617. [Google Scholar] [CrossRef]
Yang, X.; Yan, J. Arbitrary-oriented object detection with circular smooth label. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 677–694. [Google Scholar]
Guo, Z.; Liu, C.; Zhang, X.; Jiao, J.; Ji, X.; Ye, Q. Beyond Bounding-Box: Convex-hull Feature Adaptation for Oriented and Densely Packed Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 8788–8797. [Google Scholar] [CrossRef]
Achinek, D.N.; Shehu, I.S.; Athuman, A.M.; Fu, X. DAF-Net: Dense attention feature pyramid network for multiscale object detection. Int. J. Multimed. Inf. Retr. 2024, 13, 18. [Google Scholar] [CrossRef]
Cheng, G.; Yao, Y.; Li, S.; Li, K.; Xie, X.; Wang, J.; Yao, X.; Han, J. Dual-aligned oriented detector. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5618111. [Google Scholar] [CrossRef]
Cheng, G.; Wang, J.; Li, K.; Xie, X.; Lang, C.; Yao, Y.; Han, J. Anchor-free oriented proposal generator for object detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5625411. [Google Scholar] [CrossRef]
Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. Rtmdet: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784. [Google Scholar] [CrossRef]
Wang, D.; Zhang, Q.; Xu, Y.; Zhang, J.; Du, B.; Tao, D.; Zhang, L. Advancing plain vision transformer toward remote sensing foundation model. IEEE Trans. Geosci. Remote Sens. 2022, 61, 5607315. [Google Scholar] [CrossRef]
Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983. [Google Scholar]
Zhang, G.; Lu, S.; Zhang, W. CAD-Net: A context-aware detection network for objects in remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 10015–10024. [Google Scholar] [CrossRef]
Ansari, M.Z.; Ahmad, F.; Taheri, E.N.; Reddy, R.G.J.; Mabood, F. Animal identity recognition using object detection techniques. Procedia Comput. Sci. 2024, 233, 651–659. [Google Scholar] [CrossRef]
Pan, X.; Ren, Y.; Sheng, K.; Dong, W.; Yuan, H.; Guo, X.; Ma, C.; Xu, C. Dynamic refinement network for oriented and densely packed object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 14–19 June 2020; pp. 11207–11216. [Google Scholar]
Ming, Q.; Zhou, Z.; Miao, L.; Zhang, H.; Li, L. Dynamic anchor learning for arbitrary-oriented object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 2355–2363. [Google Scholar]
Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed]
Ming, Q.; Miao, L.; Zhou, Z.; Song, J.; Dong, Y.; Yang, X. Task interleaving and orientation estimation for high-precision oriented object detection in aerial images. ISPRS J. Photogramm. Remote Sens. 2023, 196, 241–255. [Google Scholar] [CrossRef]
Zeng, Y.; Chen, Y.; Yang, X.; Li, Q.; Yan, J. ARS-DETR: Aspect ratio-sensitive detection transformer for aerial oriented object detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5610315. [Google Scholar] [CrossRef]
Li, W.; Chen, Y.; Hu, K.; Zhu, J. Oriented reppoints for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 1829–1838. [Google Scholar]
Xu, C.; Ding, J.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.S. Dynamic coarse-to-fine learning for oriented tiny object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7318–7328. [Google Scholar]

Figure 1. Accuracy–efficiency landscape for state-of-the-art oriented object detectors.

Figure 2. The architecture diagram of CONERSLite.

Figure 3. Detailed structural diagram of the Functional Connectome Router (FCR). The router evaluates the utility of multiple parallel neural assemblies via a gating network and dynamically activates a sparse coalition based on target complexity, effectively representing ontogenetic selection pressure.

Figure 4. Detailed structural diagram of the Compact Anatomical Backbone (CAB). The module implements a continuous metric projection through a low-rank manifold (via $W_{down}$ and $W_{up}$), where depthwise spatial filtering is performed within the strictly constrained bottleneck to enforce phylogenetic constraints.

Figure 5. Per-category performance comparison on the DOTA-v1.0 dataset. This radar chart visualizes the AP scores for each of the 15 categories under Single-Scale (left) and Multi-Scale (right) settings. The red line denotes our CONERSLite, which consistently demonstrates a performance advantage over other state-of-the-art methods. The shaded area highlights the robustness of our approach, particularly in challenging categories with dense distributions and high orientation variability.

Figure 6. Performance vs. parameter trade-off for different model variants. This scatter plot shows how CAB and FCR influence accuracy and complexity. While individual modules improve upon the traditional lightweight model, their synergistic integration in CONERSLite achieves a substantial mAP boost. This proves that combining structural plasticity with dynamic functional routing creates a more potent representation than either module alone.

Figure 7. Ablation study of temperature parameter ($\tau$) in FCR. This chart shows the influence of temperature on assembly activation and performance. A low $\tau$ of 0.1 causes excessive sparsity and a winner-take-all state that ignores feature coalitions. Conversely, a high $\tau$ of 1.0 leads to redundant uniform activations that hinder specialization. The peak at $\tau = 0.5$ represents the optimal balance between competition and collaboration.

Figure 8. Ablation study of structural compression factor ($\rho$) in CAB. This graph illustrates performance sensitivity to the CAB pruning ratio. Increasing $\rho$ from 0.5 to 0.7 triggers sharp mAP gains which highlights the necessity of anatomical anchors for feature extraction. The subsequent plateau beyond $\rho = 0.7$ suggests diminishing returns for additional parameters and confirms 0.7 as the optimal balance between efficiency and representational capacity.

Figure 9. Visual analysis of CONERSLite in typical remote sensing scenarios. Our method produces concentrated activations and stable orientations even in complex backgrounds.

Figure 10. Detailed local magnified comparison of detection results. The baseline results (left) frequently suffer from missed detections (Miss Detection) and classification errors (Wrong Detection) in high-density scenes. In contrast, CONERSLite (right) shows much higher reliability and precision, accurately capturing small-scale vehicles and complex harbor structures.

Table 1. Summary of related methods in lightweight and brain-inspired detection.

Category	Core Idea	Representative Models	Advantages	Limitations
Lightweight Design
Factorization	Factorizing standard convolutions into cheaper operators.	MobileNet; ShuffleNet; GhostNet	Significant reduction in parameters and FLOPs.	Static computational graph lacks input-specific adaptability.
Feature Alignment	Specialized geometric and semantic alignment for oriented objects.	LO-Det; SEMA-YOLO; MSF-SNET	High efficiency for remote sensing oriented targets.	Constant computational schedule regardless of scene density.
Adaptive Evolution	Introducing task-specific plasticity or bio-inspired modules.	BRSTD; SCM-YOLO; ECDet	Enhanced small object discriminability and awareness.	Hardware-fixed attention weights limit dynamic resource allocation.
Brain-Inspired
Event-Based Logic	Simulating energy-efficient communication through discrete spikes.	EMS-YOLO; Spiking YOLO; MSD	Extreme power efficiency via sparse pulse computation.	Lower precision and training instability compared to ANNs.
Selective Attention	Allocating cognitive resources to salient spatial regions.	OCRA; BRSTD	Focuses computation on informative object regions.	Primarily adjusts weights without structural reconfiguration.
Recursive Feedback	Mimicking iterative decision making through feedback loops.	CORnet-S; RNN-based Detectors	Accurate spatial refinement for complex boundaries.	High baseline latency due to multiple computational steps.
Connectomics-Inspired	Engineering structural connectivity patterns from biological brains.	BioNIC; DCN; SIN	Efficient balance between local and long-range processing.	Complexity in designing biologically realistic topologies.
Auto and Dynamic
Dynamic Routing	Adaptive computation paths via input dynamics.	Dynamic DETR; Zoom-In Nets	Filters background and saves computations.	Complex controllers.
Architectural Search	Automating macro/micro structural combinations.	NAS-FCOS; SM-NAS	Dominates manual architecture efficiency.	High search cost.
Macroscopic Dynamics	Mapping systemic brain dynamics.	Zheng et al. [41]; Xia et al. [43]	Robustness under extreme constraints.	Hardware coupling.

Table 2. Mapping between biological connectomics principles and engineering implementations in CONERSLite.

Biological Principle	Module	Engineering Implementation
Anatomical Connectome (stable wiring)	CAB	Low-rank bottleneck with residual connections and DWConv
Axonal Volume Constraint	CAB ( $ρ$ )	Rank-constrained projection $P_{K}$ via $1 \times 1$ Conv with reduced channels
Functional Connectome (plastic wiring)	FCR	Parallel neural assemblies with dynamic routing
Neuronal Group Selection	FCR (BSG)	Boltzmann Selection Gate (temperature-controlled Softmax)
Somatic Selection/Firing Decision	FCR ( $g_{i}$ )	Gumbel-Softmax differentiable binary gate
Metabolic Cost of Synaptic Transmission	$C_{i}$	FLOPs-normalized cost + KL sparsity penalty
Fisher’s Fundamental Theorem	Training	Replicator dynamics ensuring non-decreasing mean fitness
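
The "Gumbel-Softmax differentiable binary gate" row in Table 2 can be illustrated with a small forward-only NumPy sketch. This is not the authors' implementation: the function name `gumbel_softmax`, the two-logit "on/off" layout, and the example logits are assumptions made for illustration, and a real training setup would pair the hard sample with a straight-through gradient in an autodiff framework.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng, hard=False):
    """Sample a relaxed one-hot vector from categorical logits (forward pass only)."""
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via inverse transform; the lower bound avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    scaled = (logits + gumbel) / tau
    scaled -= scaled.max()              # numerical stability
    soft = np.exp(scaled)
    soft /= soft.sum()
    if not hard:
        return soft
    # Hard (discrete) decision; in training this would be combined with the
    # soft sample through a straight-through estimator (not shown here).
    one_hot = np.zeros_like(soft)
    one_hot[np.argmax(soft)] = 1.0
    return one_hot

rng = np.random.default_rng(0)
gate_logits = [2.0, 0.0]                # hypothetical [on, off] logits for one assembly

soft_gate = gumbel_softmax(gate_logits, tau=0.5, rng=rng)
hard_gates = np.array(
    [gumbel_softmax(gate_logits, tau=0.5, rng=rng, hard=True)[0] for _ in range(2000)]
)
on_frequency = hard_gates.mean()        # approaches softmax(gate_logits)[0]
print("soft gate:", soft_gate, "empirical on-rate:", on_frequency)
```

Because the argmax of logits plus Gumbel noise is an exact categorical sample, the empirical on-rate of the hard gate tracks the softmax probability of the "on" logit regardless of the temperature used for the relaxation.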

Table 3. Pseudocode for the CONERSLite inference procedure.

CONERSLite Forward Pass
Input: Image $I \in \mathbb{R}^{3 \times H \times W}$, temperature $τ$
Output: Detection results $\{b_{j}, c_{j}, θ_{j}\}_{j=1}^{N_{det}}$
// Stage 1: Compact Anatomical Backbone (CAB)
1: $F_{multi} = \{F_{1}, F_{2}, F_{3}, F_{4}\} \leftarrow$ Multi-scale features
2: for each stage $l = 1, \dots, 4$ do
3: $h_{low} \leftarrow W_{down}^{(l)} \cdot F_{l}$ // Down-project to rank $r = \lfloor ρ C \rfloor$
4: $h_{spatial} \leftarrow DWConv(SiLU(BN(h_{low})))$ // Spatial filtering
5: $F_{l} \leftarrow F_{l} + W_{up}^{(l)} \cdot h_{spatial}$ // Residual up-project
6: end for
// Stage 2: Functional Connectome Router (FCR)
7: $z \leftarrow GAP(F_{l})$ // Global context vector
8: $ϕ \leftarrow MLP(z) \in \mathbb{R}^{N_{asm}}$ // Fitness logits
9: $g^{*} \leftarrow Softmax(ϕ / τ)$ // Boltzmann selection weights
10: for each assembly $i = 1, \dots, N_{asm}$ do
11: $y_{i} \leftarrow A_{i}(F_{l})$ // Expert feature extraction
12: end for
13: $F_{out} \leftarrow \sum_{i=1}^{N_{asm}} g_{i}^{*} \cdot y_{i}$ // Weighted aggregation
// Stage 3: Detection Head
14: $\{b_{j}, c_{j}, θ_{j}\} \leftarrow DetectionHead(F_{out})$
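
To make steps 3 to 13 of the pseudocode concrete, the following NumPy sketch runs one CAB block and one FCR routing pass on a random single-scale feature map. It is a minimal illustration under several assumptions that are not taken from the paper: the helper names (`cab_block`, `fcr_route`, `depthwise_conv3x3`), a single linear layer standing in for the MLP, 1×1 linear maps standing in for the assemblies $A_{i}$, per-channel standardisation standing in for trained BatchNorm, and the toy sizes. The detection head is omitted.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def depthwise_conv3x3(x, kernels):
    """Zero-padded depthwise 3x3 conv. x: (C, H, W), kernels: (C, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernels[:, dy, dx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def cab_block(feat, w_down, w_up, dw_kernels, eps=1e-5):
    """Steps 3-5: down-project to rank r, filter spatially, residual up-project."""
    h_low = np.einsum("rc,chw->rhw", w_down, feat)            # 1x1 conv, C -> r
    # Assumption: per-channel standardisation replaces BN running statistics.
    mean = h_low.mean(axis=(1, 2), keepdims=True)
    var = h_low.var(axis=(1, 2), keepdims=True)
    h_norm = (h_low - mean) / np.sqrt(var + eps)
    h_spatial = depthwise_conv3x3(silu(h_norm), dw_kernels)
    return feat + np.einsum("cr,rhw->chw", w_up, h_spatial)   # residual connection

def fcr_route(feat, w_mlp, assemblies, tau):
    """Steps 7-13: GAP, fitness logits, Boltzmann weights, weighted aggregation."""
    z = feat.mean(axis=(1, 2))                 # global context vector, shape (C,)
    phi = w_mlp @ z                            # fitness logits, shape (N_asm,)
    scaled = phi / tau
    scaled -= scaled.max()                     # numerical stability
    gates = np.exp(scaled)
    gates /= gates.sum()                       # Softmax(phi / tau)
    outputs = [np.einsum("oc,chw->ohw", a, feat) for a in assemblies]
    fused = sum(g * y for g, y in zip(gates, outputs))
    return fused, gates

rng = np.random.default_rng(0)
C, H, W, rho, n_asm, tau = 16, 8, 8, 0.7, 6, 0.5   # toy sizes; rho, n_asm, tau follow the paper
r = int(np.floor(rho * C))                          # bottleneck rank r = floor(rho * C)

feat = rng.standard_normal((C, H, W))
refined = cab_block(
    feat,
    w_down=0.1 * rng.standard_normal((r, C)),
    w_up=0.1 * rng.standard_normal((C, r)),
    dw_kernels=0.1 * rng.standard_normal((r, 3, 3)),
)
fused, gates = fcr_route(
    refined,
    w_mlp=rng.standard_normal((n_asm, C)),
    assemblies=[0.1 * rng.standard_normal((C, C)) for _ in range(n_asm)],
    tau=tau,
)
print("rank:", r, "gates:", np.round(gates, 3), "output shape:", fused.shape)
```

Note that every assembly is evaluated and then weighted, as in step 13, so the cost of the routing stage in this sketch does not depend on how concentrated the gates are.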

Table 4. Complete performance and efficiency comparison with existing SOTA models on DOTA-v1.0 dataset. Our method achieves superior performance especially in oriented and small objects.

Method	Pre.	mAP↑	#P↓	GFLOPs↓	PL	BD	BR	GTF	SV	LV	SH	TC	BC	ST	SBF	RA	HA	SP	HC
Single-Scale
EMO2-DETR [45]	IN	70.91	74.3 M	304 G	87.99	79.46	45.74	66.64	78.90	73.90	73.30	90.40	80.55	85.89	55.19	63.62	51.83	70.15	60.04
CenterMap [46]	IN	71.59	41.1 M	198 G	89.02	80.56	49.41	61.98	77.99	74.19	83.74	89.44	78.01	83.52	47.64	65.93	63.68	67.07	61.59
AO2-DETR [47]	IN	72.15	74.3 M	304 G	86.01	75.92	46.02	66.65	79.70	79.93	89.17	90.44	81.19	76.00	56.91	62.45	64.22	65.80	58.96
SCRDet [48]	IN	72.61	41.9 M	–	89.98	80.65	52.09	68.36	68.36	60.32	72.41	90.85	87.94	86.86	65.02	66.68	66.25	68.24	65.21
R3Det [49]	IN	73.70	41.9 M	336 G	89.50	81.20	50.50	66.10	70.90	78.70	78.20	90.80	85.30	84.20	61.80	63.80	68.20	69.80	67.20
RoI Trans. [50]	IN	74.05	55.1 M	200 G	89.01	77.48	51.64	72.07	74.43	77.55	87.76	90.81	79.71	85.27	58.36	64.11	76.50	71.99	54.06
S²ANet [51]	IN	74.12	38.5 M	–	89.11	82.84	48.37	71.11	78.11	78.39	87.25	90.83	84.90	85.64	60.36	62.60	65.26	69.13	57.94
SASM [52]	IN	74.92	36.6 M	–	86.42	78.97	52.47	69.84	77.30	75.99	86.72	90.89	82.63	85.66	60.13	68.25	73.98	72.22	62.37
G.V. [53]	IN	75.02	41.1 M	198 G	89.64	85.00	52.26	77.34	73.01	73.14	86.82	90.74	79.02	86.81	59.55	70.91	72.94	70.86	57.32
O-RCNN [54]	IN	75.87	41.1 M	199 G	89.46	82.12	54.78	70.86	78.93	83.00	88.20	90.90	87.50	84.68	63.97	67.69	74.64	84.93	52.28
ReDet [55]	IN	76.25	31.6 M	–	88.79	82.64	53.97	74.00	78.13	84.06	88.04	90.89	87.78	85.75	61.76	60.39	75.96	68.07	63.59
R3Det-GWD [56]	IN	76.34	41.9 M	336 G	88.82	82.94	55.63	72.75	78.52	83.10	87.46	90.21	86.36	85.44	64.70	61.41	73.46	76.94	57.38
COBB [57]	IN	76.52	41.9 M	–	–	–	–	–	–	–	–	–	–	–	–	–	–	–	–
R3Det-KLD [58]	IN	77.36	41.9 M	336 G	88.90	84.17	55.80	69.35	78.72	84.08	87.00	89.75	84.32	85.73	64.74	61.80	76.62	78.49	70.89
ARC [59]	IN	77.35	74.4 M	217 G	89.40	82.48	55.33	73.88	79.37	84.05	88.06	90.90	86.44	84.83	63.63	70.32	74.29	71.91	65.43
LSKNet-S [60]	IN	77.49	31.0 M	161 G	89.66	85.52	57.72	75.70	74.95	78.69	88.24	90.88	86.79	86.38	66.92	63.77	77.77	74.47	64.82
PKINet-S [61]	IN	78.39	30.8 M	190 G	89.72	84.20	55.81	77.63	80.25	84.45	88.12	90.88	87.57	86.07	66.86	70.23	77.47	73.62	62.94
O²DNet [62]	–	71.01	–	–	89.36	82.18	47.31	61.24	71.33	74.02	78.64	90.81	82.23	81.42	60.90	60.22	58.21	67.02	61.04
CFC-Net [63]	–	73.52	–	–	89.13	80.44	52.41	70.04	76.31	78.14	87.23	90.92	84.54	85.61	60.51	61.53	67.82	68.01	50.15
CBDA-Net [64]	–	75.74	–	–	89.24	85.94	50.32	65.01	77.71	82.33	87.93	90.52	86.51	85.90	66.92	66.51	67.43	71.32	62.91
CoF-Net [65]	–	77.21	–	–	89.6	83.12	48.31	73.64	78.23	83.04	86.72	90.24	82.32	86.61	67.61	64.63	74.70	71.32	78.42
Strip R-CNN-S [12]	IN	80.06	30 M	–	88.91	86.38	57.44	76.37	79.73	84.38	88.25	90.86	86.71	87.45	69.89	66.82	79.25	82.91	75.58
MTP (base detector) [44]	IN	79.03	110.8 M	543 G	89.87	85.09	58.27	71.70	81.70	87.10	88.98	91.44	85.41	86.45	57.44	68.47	78.42	82.97	71.71
CONERSLite (Ours)	IN	79.57	28.3 M	195 G	89.55	85.12	61.20	72.45	81.88	86.95	88.76	90.92	85.44	86.35	60.50	69.45	79.12	83.15	72.68
Multi-Scale
CSL [66]	IN	76.17	37.4 M	236 G	90.25	85.53	54.64	75.31	70.44	73.51	77.62	90.84	86.15	86.69	69.60	68.04	73.83	71.10	68.93
CFA [67]	IN	76.67	–	–	89.08	83.20	54.37	66.87	81.23	80.96	87.17	90.21	84.32	86.09	52.34	69.94	75.52	80.76	67.96
DAFNeT [68]	IN	76.95	–	–	89.40	86.27	53.70	60.51	82.04	81.17	88.66	90.37	83.81	87.27	53.93	69.38	75.61	81.26	70.86
DODet [69]	IN	80.62	–	–	89.96	85.52	58.01	81.22	78.71	85.46	88.59	90.89	87.12	87.80	70.50	71.54	82.06	77.43	74.47
AOPG [70]	IN	80.66	–	–	89.88	85.57	60.90	81.51	78.70	85.29	88.85	90.89	87.60	87.65	71.66	68.69	82.31	77.32	73.10
KFIoU [59]	IN	80.93	58.8 M	206 G	89.44	84.41	62.22	82.51	80.10	86.07	88.68	90.90	87.32	88.38	72.80	71.95	78.96	74.95	75.27
RTMDet-R [71]	CO	81.33	52.3 M	205 G	88.01	86.17	58.54	82.44	81.30	84.82	88.71	90.89	88.77	87.37	71.96	71.18	81.23	81.40	77.13
RVSA [72]	MA	81.24	114.4 M	414 G	88.97	85.76	61.46	81.27	79.98	85.31	88.30	90.84	85.06	87.50	66.77	73.11	84.75	81.88	77.58
LSKNet-S [60]	IN	81.64	31.0 M	161 G	89.57	86.34	63.13	83.67	82.20	86.10	88.66	90.89	88.41	87.42	71.72	69.58	78.88	81.77	76.52
FR-O [73]	–	54.11	–	–	79.42	77.13	17.72	64.14	35.32	38.01	37.26	89.43	69.62	59.34	50.30	52.93	47.91	47.43	46.31
CAD-Net [74]	–	69.96	–	–	87.82	82.41	49.43	73.51	71.13	63.52	76.62	90.91	79.23	73.33	48.44	60.92	62.03	67.01	62.24
APE [75]	–	75.81	–	–	90.01	83.65	53.43	76.02	74.01	77.27	79.53	90.82	87.27	84.51	67.72	60.31	74.60	71.84	65.62
SCRDet [48]	–	72.63	–	–	90.01	80.71	52.14	68.43	68.42	60.32	72.44	90.92	87.94	86.96	65.02	66.73	66.31	68.23	65.21
Strip R-CNN-S [12]	IN	82.28	30 M	–	89.17	85.57	62.40	83.71	81.93	86.58	88.84	90.86	87.97	87.91	72.07	71.88	79.25	82.45	82.82
CONERSLite (Ours)	IN	82.35	28.3 M	195 G	90.15	86.20	61.50	83.15	82.55	86.40	88.85	90.92	88.10	86.85	72.45	74.20	82.15	83.85	77.95

Table 5. Direct comparison with lightweight detectors and the base detector on DOTA-v1.0. EfficientDet-D0 is a horizontal detector with no native OBB support. † Results under multi-scale evaluation. All oriented models are evaluated under single-scale unless noted.

Method	#P	GFLOPs	mAP@0.5	mAP@0.5:0.95
EfficientDet-D0 [15]	3.9 M	2.5 G	N/A (HBB)	–
YOLOv8n-obb †	3.11 M	23.3 G	78.0	–
RTMDet-R-tiny [71]	4.88 M	20.45 G	75.36	–
RTMDet-R-tiny † [71]	4.88 M	20.45 G	79.82	–
MTP (base detector) [44]	110.8 M	543 G	79.03	47.42
CONERSLite (Ours)	28.3 M	195 G	79.57	46.71
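
The relative savings implied by the last two rows of Table 5 can be worked out directly. The short sketch below only restates the table values; the dictionary layout and variable names are illustrative.

```python
# Values copied from Table 5 (single-scale DOTA-v1.0).
base = {"params_m": 110.8, "gflops": 543.0, "map50": 79.03, "map50_95": 47.42}   # MTP
ours = {"params_m": 28.3, "gflops": 195.0, "map50": 79.57, "map50_95": 46.71}    # CONERSLite

param_reduction = 100.0 * (1.0 - ours["params_m"] / base["params_m"])
flop_reduction = 100.0 * (1.0 - ours["gflops"] / base["gflops"])
delta_map50 = ours["map50"] - base["map50"]
delta_map50_95 = ours["map50_95"] - base["map50_95"]

print(f"parameter reduction: {param_reduction:.2f}%")
print(f"GFLOPs reduction:    {flop_reduction:.2f}%")
print(f"mAP@0.5 change:      {delta_map50:+.2f}")
print(f"mAP@0.5:0.95 change: {delta_map50_95:+.2f}")
```

Relative to the base detector this corresponds to roughly three quarters fewer parameters and about 64% fewer GFLOPs, with a small gain in mAP@0.5 and a small loss in mAP@0.5:0.95.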

Table 6. Comparison with SOTA methods on the HRSC2016 dataset. CONERSLite achieves the best accuracy with the fewest parameters and the lowest FLOPs.

Method	Pre.	mAP (07)	mAP (12)	#P	GFLOPs
DRN [76]	IN	–	92.70	–	–
CenterMap [46]	IN	–	92.80	41.1 M	198 G
RoI Trans. [50]	IN	86.20	–	55.1 M	200 G
G.V. [53]	IN	88.20	–	41.1 M	198 G
R3Det [49]	IN	89.26	96.01	41.9 M	336 G
DAL [77]	IN	89.77	–	36.4 M	216 G
GWD [56]	IN	89.85	97.37	47.4 M	456 G
S²ANet [51]	IN	90.17	95.01	38.6 M	198 G
AOPG [70]	IN	90.34	96.22	–	–
ReDet [55]	IN	90.46	97.63	31.6 M	–
O-RCNN [54]	IN	90.50	97.60	41.1 M	199 G
RTMDet [71]	CO	90.60	97.10	52.3 M	205 G
MTP [44]	IN	–	–	110.8 M	543 G
CONERSLite (Ours)	IN	90.65	98.62	28.3 M	195 G

Table 7. Comparison with SOTA methods on the DIOR-R dataset. CONERSLite outperforms all competitors.

Method	Pre.	#P	GFLOPs	mAP (%)
RetinaNet-O [78]	IN	–	–	57.55
Faster RCNN-O [79]	IN	41.1 M	198 G	59.54
TIOE-Det [80]	IN	41.1 M	198 G	61.98
O-RCNN [54]	IN	41.1 M	199 G	64.30
ARS-DETR [81]	IN	41.1 M	198 G	66.12
O-RepPoints [82]	IN	36.6 M	–	66.71
DCFL [83]	IN	–	–	66.80
LSKNet-S [60]	IN	31.0 M	161 G	65.90
PKINet-S [61]	IN	30.8 M	190 G	67.03
CONERSLite (Ours)	IN	28.3 M	195 G	69.68

Note: LSKNet-S and PKINet-S are general-purpose backbones. The results reported on DIOR-R represent their full detector performance when combined with mainstream heads (e.g., Oriented R-CNN) to ensure an end-to-end fair comparison with CONERSLite. Specific implementation details are available in their original papers and source code.

Table 8. Ablation study of core components (CAB and FCR) with inference speed. All variants are evaluated on a single RTX 4090 under identical settings. Removing key components leads to consistent performance drops, confirming their individual contributions.

Variant	mAP (%)	#P (M)	GFLOPs	FPS
CONERSLite (Full Model)	79.57	28.3	195.0	103
Only CAB (w/o FCR)	75.12	25.7	168.0	125
Only FCR (MobileNetV3 Backbone)	74.86	29.1	212.0	88
Static Variant (w/o CAB and FCR)	71.39	27.5	183.0	118

Table 9. Ablation study of core components under multi-scale training. Both CAB and FCR contribute significantly to performance improvements, especially for small target detection.

Variant	Overall mAP	Small Target AP
CONERSLite (Full Model)	82.35	75.32
w/o FCR (Only CAB)	77.89	68.45
w/o CAB (MobileNetV3 + FCR)	77.56	67.91

Table 10. Ablation study of key design details in FCR with inference speed. Replacing core design elements leads to performance degradation, confirming the effectiveness of our architectural choices.

Variant	mAP (%)	#P (M)	GFLOPs	FPS
Full FCR (Ours)	79.57	28.3	195.0	103
FCR w/ LAP instead of GAP	77.13	28.3	192.0	105
FCR w/ Single Neural Assembly	75.68	26.5	165.0	132
FCR w/ Standard Softmax ( $τ = 1.0$ )	78.05	28.3	195.0	103

Table 11. Ablation study of the number of neural assemblies ($N_{asm}$) in FCR. The number of assemblies significantly affects both performance and computational efficiency, with 6 assemblies providing the best trade-off.

Variant	mAP (%)	#P (M)	GFLOPs	FPS
$N_{asm} = 2$	76.23	26.8	172.0	128
$N_{asm} = 4$	78.45	27.6	185.0	115
$N_{asm} = 6$ (Ours)	79.57	28.3	195.0	103
$N_{asm} = 8$	79.63	29.5	210.0	92
$N_{asm} = 10$	79.51	30.8	228.0	81

Table 12. Ablation study of temperature parameter ($τ$) in FCR. The temperature parameter controls assembly activation sparsity, with $τ = 0.5$ providing optimal performance.

Variant	mAP	$S (G)$	GFLOPs
$τ = 0.1$	75.89	1.2	195.0
$τ = 0.3$	78.92	2.1	195.0
$τ = 0.5$ (Ours)	79.57	2.8	195.0
$τ = 0.7$	79.21	3.5	195.0
$τ = 1.0$	78.65	4.3	195.0
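
The effect of $τ$ on the Boltzmann selection weights can be reproduced qualitatively with a few lines of NumPy. The definition of $S(G)$ is not given in this section, so the sketch uses the exponential of the Shannon entropy (an "effective number of active assemblies") purely as a hypothetical stand-in; the fitness logits are also made up, and the printed values are not expected to match the table.

```python
import numpy as np

def boltzmann_gate(phi, tau):
    """Temperature-controlled softmax over fitness logits phi."""
    scaled = np.asarray(phi, dtype=float) / tau
    scaled -= scaled.max()                  # numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()

def effective_assemblies(weights):
    """exp(entropy): 1 for a one-hot gate, N for a uniform gate (stand-in for S(G))."""
    w = weights[weights > 0]
    return float(np.exp(-(w * np.log(w)).sum()))

phi = np.array([2.0, 1.5, 1.0, 0.5, 0.0, -0.5])   # hypothetical logits, N_asm = 6
taus = (0.1, 0.3, 0.5, 0.7, 1.0)
spread = [effective_assemblies(boltzmann_gate(phi, t)) for t in taus]

for t, s in zip(taus, spread):
    print(f"tau = {t:.1f}  effective assemblies = {s:.2f}")
```

Lower temperatures concentrate the weights on the fittest assembly, while higher temperatures spread them toward a uniform distribution, which matches the direction of the trend in the $S(G)$ column.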

Table 13. Ablation study of structural compression factor ($ρ$) in CAB. The compression factor controls the pruning ratio, with $ρ = 0.7$ providing the best accuracy–efficiency trade-off.

Variant	$R_{prune}$	mAP (%)	#P (M)	GFLOPs
$ρ = 0.5$	50%	74.36	22.1	142.0
$ρ = 0.6$	40%	76.89	24.5	163.0
$ρ = 0.7$ (Ours)	30%	79.57	28.3	195.0
$ρ = 0.8$	20%	79.62	33.7	238.0
$ρ = 0.9$	10%	79.70	39.2	281.0
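
The relationship between $ρ$, the pruning ratio $R_{prune} = 1 - ρ$, and the bottleneck rank $r = \lfloor ρ C \rfloor$ can be checked against the table, along with the marginal accuracy gain per additional GFLOP. In the sketch below the rows are copied from Table 13, while the channel width `C = 256` and the helper names are assumptions chosen only for illustration.

```python
import math

# (rho, mAP, params in M, GFLOPs) copied from Table 13.
rows = [
    (0.5, 74.36, 22.1, 142.0),
    (0.6, 76.89, 24.5, 163.0),
    (0.7, 79.57, 28.3, 195.0),
    (0.8, 79.62, 33.7, 238.0),
    (0.9, 79.70, 39.2, 281.0),
]

def prune_ratio_percent(rho):
    """R_prune = 1 - rho, expressed as a whole percentage."""
    return round((1.0 - rho) * 100)

def bottleneck_rank(rho, channels):
    """r = floor(rho * C), the down-projection width used in the CAB."""
    return math.floor(rho * channels)

C = 256  # hypothetical stage width, not taken from the paper
for rho, m_ap, params, gflops in rows:
    print(f"rho={rho:.1f}  R_prune={prune_ratio_percent(rho)}%  "
          f"rank={bottleneck_rank(rho, C)}  mAP={m_ap}")

# Marginal mAP gain per extra GFLOP between consecutive settings.
marginal = [
    (hi[1] - lo[1]) / (hi[3] - lo[3])
    for lo, hi in zip(rows, rows[1:])
]
for (lo, hi), gain in zip(zip(rows, rows[1:]), marginal):
    print(f"rho {lo[0]:.1f} -> {hi[0]:.1f}: {gain:.4f} mAP per GFLOP")
```

The marginal gain falls by well over an order of magnitude once $ρ$ exceeds 0.7, which is the plateau described in the caption.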

Table 14. Ablation study of CAB internal sub-components. Each row isolates one structural element to quantify its independent contribution to accuracy and efficiency. The upper group ablates architectural design choices, while the lower group (below the mid-rule) ablates fundamental operators ($σ(\cdot)$ and normalization).

Variant	mAP (%)	#P (M)	GFLOPs	FPS
Full CAB (Ours)	79.57	28.3	195.0	103
Standard $3 \times 3$ Conv (no bottleneck)	78.91	35.6	252.0	76
w/o Residual Connection	77.23	28.3	195.0	102
w/o Low-rank Projection ( $r = C$ )	79.14	33.1	231.0	84
Standard Conv (no DWConv)	79.71	35.2	267.0	91
ReLU activation (replace SiLU)	78.74	28.3	195.0	105
GELU activation (replace SiLU)	79.32	28.3	195.0	102
w/o BatchNorm	76.66	28.2	194.8	106
GroupNorm ( $G = 32$ , replace BN)	79.21	28.3	195.0	97

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tang, L.; Zhong, S.; Gao, W.; Liu, J.; Xie, Y.; Hu, Y.; Ma, W.; Wei, Y.; Guo, Y. Darwinian Wiring: A Connectome-Constrained Structural Plasticity Framework for Extreme Model Compression. Remote Sens. 2026, 18, 1719. https://doi.org/10.3390/rs18111719

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
