GAOC: A Gaussian Adaptive Ochiai Loss for Bounding Box Regression

1 School of Electronic Engineering, Xi’an Shiyou University, Xi’an 710312, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Xi’an Institute of Optics and Precision Mechanics of CAS, Xi’an 710119, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2026, 26(2), 368; https://doi.org/10.3390/s26020368
Submission received: 20 November 2025 / Revised: 4 January 2026 / Accepted: 5 January 2026 / Published: 6 January 2026
(This article belongs to the Special Issue Advanced Deep Learning Techniques for Intelligent Sensor Systems)

Abstract

Bounding box regression (BBR) loss plays a critical role in object detection within computer vision. Existing BBR loss functions are typically based on the Intersection over Union (IoU) between predicted and ground truth boxes. However, these methods neither account for the effect of predicted box scale on regression nor effectively address the drift problem inherent in BBR. To overcome these limitations, this paper introduces a novel BBR loss function, termed Gaussian Adaptive Ochiai BBR loss (GAOC), which combines the Ochiai Coefficient (OC) with a Gaussian Adaptive (GA) distribution. The OC component normalizes by the square root of the product of the bounding box areas, ensuring scale invariance. Meanwhile, the GA distribution models the distance between the top-left and bottom-right corner (TL/BR) coordinates of predicted and ground truth boxes, enabling a similarity measure that reduces sensitivity to positional deviations. This design enhances detection robustness and accuracy. GAOC was integrated into YOLOv5 and RT-DETR and evaluated on the PASCAL VOC and MS COCO 2017 benchmarks. Experimental results demonstrate that GAOC consistently outperforms existing BBR loss functions, offering a more effective solution.

1. Introduction

Object detection, a core computer vision task, is widely applied in autonomous driving, remote sensing, medical image analysis, and security surveillance [1,2]. Its objective is the simultaneous execution of object classification and localization. In this context, the bounding box regression (BBR) loss function is critical: it directly determines localization accuracy by measuring the alignment between predicted and ground truth boxes, thereby influencing overall detection performance. Recent advances in deep learning have prompted the development of numerous Intersection over Union (IoU)-based loss functions [3], including GIoU, DIoU, CIoU, and EIoU. These functions mitigate issues such as vanishing gradients for non-overlapping boxes, center-point distance, and aspect ratio mismatch, thereby driving continual improvements in detection accuracy.
However, prevailing IoU-based loss functions suffer from two principal limitations. First, they frequently ignore the influence of bounding box scale on the regression process, resulting in suboptimal performance for small objects and in long-tail distributions. Second, these methods are often ineffective at mitigating the BBR drift problem. This problem arises when a significant positional deviation between the predicted and ground truth boxes prevents the loss function from providing sufficient gradient constraints. This inadequacy can cause slow convergence during initial training or unstable localization in later phases. Consequently, the development of a BBR loss function that ensures both scale invariance and robustness remains a pressing challenge in object detection research.
To overcome these limitations, this paper proposes a novel BBR loss function named Gaussian Adaptive Ochiai Loss (GAOC). GAOC integrates the properties of the Ochiai Coefficient (OC) and a Gaussian Adaptive (GA) distribution. The OC ensures scale invariance through normalization by the square root of the product of the bounding box areas. Concurrently, the GA component models the top-left and bottom-right corner (TL/BR) coordinates with a Gaussian distribution, thereby reducing sensitivity to significant positional deviations. This combined approach enhances the loss function’s adaptability across object scales and improves its robustness and accuracy in complex detection scenarios. We integrated GAOC into YOLOv5 [4] and RT-DETR [5] and evaluated it on PASCAL VOC [6,7] and MS COCO 2017 [8]. The results demonstrate that GAOC consistently surpasses existing IoU-based loss functions across multiple evaluation metrics, confirming its strong generality and effectiveness.
The primary contributions of this study are as follows:
  • We propose a novel BBR loss based on the Ochiai Coefficient (OC). Compared to traditional IoU loss, OC emphasizes the balance of bounding box dimensions while assigning greater weight to the intersection of the two boxes.
  • We introduce a Gaussian adaptive (GA) distribution for BBR loss, which improves robustness to positional variations by modeling TL/BR distances as a two-dimensional Gaussian distribution and computing similarity through GA.
  • We validate the effectiveness of GAOC on public datasets, where it outperforms other BBR losses across multiple benchmarks.
The remainder of this paper is organized as follows. Section 2 reviews representative IoU-family bounding box regression losses and discusses their limitations. Section 3 presents the proposed GAOC loss, including its formulation, optimization properties, and implementation details. Section 4 describes the experimental settings and reports comprehensive comparisons and ablation studies on benchmark datasets. Finally, Section 5 concludes this work and outlines future directions.

2. Related Work

2.1. Object Detection

Object detection is a fundamental task in computer vision, which involves identifying objects in images and accurately determining their locations and categories. R-CNN [9] was a pioneering approach in this field, introducing selective search to generate candidate object regions. These regions were then processed by Convolutional Neural Networks (CNNs) for feature extraction, followed by object classification using Support Vector Machines (SVMs).
Building on R-CNN, a series of improved variants have been developed, including Fast R-CNN [10], Faster R-CNN [11], and Mask R-CNN [12], each achieving significant gains in both efficiency and accuracy. In parallel, the SSD [13] and YOLO [4,14,15,16] families have become widely adopted due to their balance of speed and accuracy, enabling real-time object detection within an end-to-end framework. More recently, algorithms such as CornerNet [17], CenterNet [18], and FCOS [19] have emerged, advancing object detection through novel architectural designs and innovative methodologies.
The DETR [5] family of algorithms, distinguished by its attention mechanism and Transformer-based architecture, introduces a novel paradigm for object detection. Despite their diverse design principles and implementations, object detection algorithms share a common challenge in BBR, a critical step in accurate object localization. Precise bounding box prediction enables these models to localize objects reliably within images, thereby providing a robust foundation for subsequent analysis and processing.

2.2. Bounding Box Regression Losses

Existing mainstream BBR losses are primarily built upon the IoU [3], defined as:
$$\mathrm{IoU} = \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|},$$
where $B$ and $B^{gt}$ denote the predicted and ground truth boxes, respectively. The corresponding loss $L_{IoU}$ is defined as:
$$L_{IoU} = 1 - \mathrm{IoU}.$$
IoU-based losses have become the dominant methodology for BBR.
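As a concrete reference point, the sketch below computes $L_{IoU}$ for a batch of corner-format boxes in PyTorch. It is a minimal illustration of the definitions above, not the authors’ code; the (x1, y1, x2, y2) coordinate convention and the epsilon guard are assumptions.

```python
import torch

def iou_loss(pred, gt, eps=1e-7):
    """L_IoU = 1 - IoU for (N, 4) boxes in (x1, y1, x2, y2) format."""
    # Intersection rectangle (clamped so disjoint boxes give zero area).
    iw = (torch.min(pred[:, 2], gt[:, 2]) - torch.max(pred[:, 0], gt[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], gt[:, 3]) - torch.max(pred[:, 1], gt[:, 1])).clamp(min=0)
    inter = iw * ih

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_p + area_g - inter

    return 1.0 - inter / (union + eps)
```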
The GIoU loss [20] mitigates the problem of gradient vanishing during bounding box updates when the predicted box and the ground truth are non-overlapping. The GIoU loss is defined as:
$$\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (B \cup B^{gt})|}{|C|},$$
where $C$ represents the smallest convex hull that encloses both $B$ and $B^{gt}$.
In contrast to GIoU, the DIoU loss [21] introduces an additional distance term into the IoU formulation, which minimizes the normalized distance between the centroids of the two bounding boxes, thereby enabling faster convergence and improved performance. The DIoU is defined as:
$$\mathrm{DIoU} = \mathrm{IoU} - \frac{\rho^2(b, b^{gt})}{c^2},$$
where $b$ and $b^{gt}$ represent the centroids of $B$ and $B^{gt}$, respectively, $\rho(\cdot)$ denotes the Euclidean distance, and $c$ denotes the diagonal length of the smallest enclosing bounding box.
The CIoU loss [21] extends DIoU by incorporating an additional shape loss term that accounts for aspect ratio consistency. The CIoU is formally defined as follows:
$$\mathrm{CIoU} = \mathrm{IoU} - \left(\frac{\rho^2(b, b^{gt})}{c^2} + \alpha v\right),$$
where $\alpha$ is the trade-off parameter given by:
$$\alpha = \frac{v}{(1 - \mathrm{IoU}) + v},$$
and $v$ quantifies aspect ratio consistency:
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2.$$
Here, $w^{gt}$ and $h^{gt}$ represent the width and height of the ground truth box, while $w$ and $h$ represent those of the predicted box. When the ground truth and predicted boxes share the same aspect ratio, CIoU simplifies to DIoU. The results of various losses are presented in Figure 1.
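The snippet below sketches how the DIoU and CIoU penalties stack on top of plain IoU, following the equations above. It is illustrative only; computing $\alpha$ under `no_grad` (so the trade-off weight does not backpropagate) follows the common convention of CIoU reference implementations and is an assumption here.

```python
import math
import torch

def ciou(pred, gt, eps=1e-7):
    """CIoU = IoU - (rho^2/c^2 + alpha*v) for (N, 4) corner boxes."""
    iw = (torch.min(pred[:, 2], gt[:, 2]) - torch.max(pred[:, 0], gt[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], gt[:, 3]) - torch.max(pred[:, 1], gt[:, 1])).clamp(min=0)
    inter = iw * ih
    w, h = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wg, hg = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    iou = inter / (w * h + wg * hg - inter + eps)

    # DIoU penalty: squared centroid distance over squared enclosing diagonal.
    rho2 = ((pred[:, 0] + pred[:, 2] - gt[:, 0] - gt[:, 2]) ** 2
            + (pred[:, 1] + pred[:, 3] - gt[:, 1] - gt[:, 3]) ** 2) / 4
    cw = torch.max(pred[:, 2], gt[:, 2]) - torch.min(pred[:, 0], gt[:, 0])
    ch = torch.max(pred[:, 3], gt[:, 3]) - torch.min(pred[:, 1], gt[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (torch.atan(wg / (hg + eps)) - torch.atan(w / (h + eps))) ** 2
    with torch.no_grad():
        alpha = v / ((1 - iou) + v + eps)
    return iou - rho2 / c2 - alpha * v  # the loss is 1 - CIoU
```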
The EIoU loss [22] directly minimizes the normalized differences in width $(w, w^{gt})$, height $(h, h^{gt})$, and centroid position $(b, b^{gt})$ between the predicted and ground truth boxes. The EIoU loss is defined as:
$$\mathrm{EIoU} = \mathrm{IoU} - \left(\frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{(w^c)^2} + \frac{\rho^2(h, h^{gt})}{(h^c)^2}\right),$$
where $w^c$ and $h^c$ represent the width and height of the smallest enclosing bounding box that contains both the predicted and ground truth boxes.
SIoU [23] extends IoU-based BBR by adding angle-aware guidance, together with distance and shape penalties, to stabilize optimization and improve localization. It is formulated as:
$$\mathrm{SIoU} = \mathrm{IoU} - \frac{\Delta + \Omega}{2},$$
where $\Delta$ and $\Omega$ represent the distance and shape costs, respectively, and the distance term is reweighted by an angle-dependent factor.
When the predicted and ground truth boxes share the same aspect ratio but differ in width and height, many existing losses cannot be optimized effectively. To address this issue and leverage the geometric properties of horizontal rectangles, a bounding box similarity metric termed minimum-point-distance IoU (MPDIoU) was proposed [24], defined as follows:
$$\mathrm{MPDIoU} = \frac{|A \cap B|}{|A \cup B|} - \frac{d_1^2}{w^2 + h^2} - \frac{d_2^2}{w^2 + h^2},$$
where $d_1$ and $d_2$ are the distances between the TL corners and between the BR corners of the two boxes, respectively, and $w$ and $h$ denote the width and height of the input image.
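A compact sketch of this metric, assuming corner-format boxes and known input-image dimensions (both assumptions, since only the formula is given above):

```python
import torch

def mpdiou(pred, gt, img_w, img_h, eps=1e-7):
    """MPDIoU = IoU - d1^2/(w^2+h^2) - d2^2/(w^2+h^2) for (N, 4) corner boxes."""
    iw = (torch.min(pred[:, 2], gt[:, 2]) - torch.max(pred[:, 0], gt[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], gt[:, 3]) - torch.max(pred[:, 1], gt[:, 1])).clamp(min=0)
    inter = iw * ih
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + eps)

    d1 = (pred[:, 0] - gt[:, 0]) ** 2 + (pred[:, 1] - gt[:, 1]) ** 2  # TL corners
    d2 = (pred[:, 2] - gt[:, 2]) ** 2 + (pred[:, 3] - gt[:, 3]) ** 2  # BR corners
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm
```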
The WIoU loss [25] is based on a dynamic non-monotonic focusing mechanism. This mechanism uses the outlier degree to evaluate the quality of predicted boxes and designs a gradient gain allocation strategy. The strategy reduces the dominance of high-quality predicted boxes while mitigating the adverse gradients from low-quality examples, thereby enabling WIoU to focus on medium-quality predicted boxes and enhance the detector’s overall performance. WIoU is defined as follows:
$$L_{WIoU} = R_{WIoU} \cdot L_{IoU},$$
$$R_{WIoU} = \exp\left(\frac{(x - x^{gt})^2 + (y - y^{gt})^2}{W_g^2 + H_g^2}\right),$$
where $(x, y)$ and $(x^{gt}, y^{gt})$ are the centroids of the predicted and ground truth boxes, and $W_g$ and $H_g$ denote the dimensions of the minimum enclosing box. Since $R_{WIoU} \in [1, e)$, it significantly amplifies $L_{IoU}$ for ordinary-quality predicted boxes ($L_{IoU} \in [0, 1]$); when the predicted box and the ground truth overlap well, the small $L_{IoU}$ reduces the influence of $R_{WIoU}$ for high-quality predicted boxes, shifting the focus to the centroid distance.
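A sketch of the v1 form of this loss follows. Detaching the enclosing-box denominator (so that $R_{WIoU}$ rescales the gradient without steering it) follows the Wise-IoU paper [25] and is noted here as an assumption rather than something stated above.

```python
import torch

def wiou_loss(pred, gt, eps=1e-7):
    """WIoU v1 sketch: L_WIoU = R_WIoU * L_IoU for (N, 4) corner boxes."""
    iw = (torch.min(pred[:, 2], gt[:, 2]) - torch.max(pred[:, 0], gt[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], gt[:, 3]) - torch.max(pred[:, 1], gt[:, 1])).clamp(min=0)
    inter = iw * ih
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    l_iou = 1.0 - inter / (area_p + area_g - inter + eps)

    # Squared centroid distance over the squared enclosing-box diagonal.
    dx = (pred[:, 0] + pred[:, 2] - gt[:, 0] - gt[:, 2]) / 2
    dy = (pred[:, 1] + pred[:, 3] - gt[:, 1] - gt[:, 3]) / 2
    wg = torch.max(pred[:, 2], gt[:, 2]) - torch.min(pred[:, 0], gt[:, 0])
    hg = torch.max(pred[:, 3], gt[:, 3]) - torch.min(pred[:, 1], gt[:, 1])
    # Denominator detached so R_WIoU scales, but does not steer, the gradient.
    r_wiou = torch.exp((dx ** 2 + dy ** 2) / (wg ** 2 + hg ** 2 + eps).detach())
    return r_wiou * l_iou
```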

2.3. Limitations

Despite substantial progress in IoU-based BBR losses, practical limitations remain. GIoU, DIoU, and CIoU primarily improve overlap, center-distance, and aspect-ratio constraints, but can be less effective under large displacement, extreme aspect ratios, or long-tail scales. EIoU and SIoU further refine geometric penalties, yet introduce additional design complexity and potential sensitivity to hyperparameters. MPDIoU emphasizes point-wise geometry but does not explicitly improve scale robustness, while WIoU mainly reweights samples without changing the underlying overlap similarity, leaving limited guidance when overlap is low. Motivated by these gaps, we propose GAOC, which enhances scale robustness via Ochiai-style normalization and strengthens geometric guidance through Gaussian-adaptive corner modeling, yielding more stable regression under scale variation and large displacement.

3. Method

3.1. Simulation Experiment

This study employs the simulation experiment proposed in the CIoU study [21] to evaluate BBR performance under different loss functions. The simulation generates seven target boxes (aspect ratios 1:4, 1:3, 1:2, 1:1, 2:1, 3:1, and 4:1, all with area 1/32) centered at the coordinate (0.5, 0.5). Twenty thousand anchor points are generated in a circular area of radius r centered at the same location, and for each anchor point, 49 anchor boxes are defined by combining seven areas (1/32, 1/24, 3/64, 1/16, 1/12, 3/32, and 1/8) with the seven aspect ratios. Each anchor box is regressed toward each target box.
To compare convergence speeds at different stages, this study adopts the following experimental settings: for r = 0.5 , anchor points are distributed both inside and outside the target box (Figure 2, left), representing the full range of BBR scenarios; for r = 0.1 , anchor points lie within the target box (Figure 2, right), representing the primary BBR scenarios.
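The sketch below reproduces a scaled-down version of this protocol for a single square target, so that different loss functions can be compared by their final residuals. The point count, step count, learning rate, single anchor scale, and the plain gradient-descent update are simplifying assumptions; `loss_fn` is any per-box loss, such as the `iou_loss` defined earlier.

```python
import math
import torch

def simulate(loss_fn, r=0.5, n_points=2000, steps=200, lr=0.1):
    """Regress anchors toward one square target of area 1/32 at (0.5, 0.5)."""
    s = math.sqrt(1 / 32)  # side length of the 1:1 target box
    gt = torch.tensor([[0.5 - s / 2, 0.5 - s / 2, 0.5 + s / 2, 0.5 + s / 2]])

    # Anchor centers sampled uniformly in a disc of radius r around (0.5, 0.5).
    theta = 2 * math.pi * torch.rand(n_points)
    rad = r * torch.sqrt(torch.rand(n_points))
    cx, cy = 0.5 + rad * torch.cos(theta), 0.5 + rad * torch.sin(theta)
    w = torch.full((n_points,), s)  # single scale/aspect ratio for brevity
    boxes = torch.stack([cx - w / 2, cy - w / 2, cx + w / 2, cy + w / 2], dim=1)
    boxes.requires_grad_(True)

    for _ in range(steps):
        loss = loss_fn(boxes, gt.expand_as(boxes)).sum()
        loss.backward()
        with torch.no_grad():
            boxes -= lr * boxes.grad  # plain gradient descent on the corners
            boxes.grad.zero_()
    return loss_fn(boxes, gt.expand_as(boxes)).detach()  # final residuals

# Example: mean/min final regression quality of a loss at r = 0.5.
# res = simulate(iou_loss); print(1 - res.mean(), 1 - res.max())
```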

3.2. Ochiai Coefficient Loss

3.2.1. Loss Forward

To overcome the scale sensitivity of IoU-style normalization, this study proposes a novel Ochiai Coefficient (OC) loss, defined as follows. For each pixel $p(i, j)$ in the image, the ground truth is represented as a four-dimensional vector $x_{i,j}^{gt} = (x_{ij}^t, x_{ij}^b, x_{ij}^l, x_{ij}^r)$, where $x^t$, $x^b$, $x^l$, and $x^r$ represent the distances from pixel $(i, j)$ to the top, bottom, left, and right boundaries of the ground truth box, respectively. The predicted box is similarly encoded as $\tilde{x} = (\tilde{x}^t, \tilde{x}^b, \tilde{x}^l, \tilde{x}^r)$, as illustrated in Figure 3.
For the OC loss, given a predicted box $B = (x^t, x^b, x^l, x^r)$ and a ground truth $B^{gt}$, the coefficient is defined as:
$$\mathrm{OC} = \frac{|B^{gt} \cap B|}{\sqrt{|B^{gt}| \times |B|}}.$$
In Algorithm 1, $p(i, j)$ indicates whether pixel $(i, j)$ lies within a valid target box, $X$ and $\tilde{X}$ denote the areas of the ground truth and predicted boxes, respectively, and $I_h$ and $I_w$ denote the height and width of the intersection region $I$. Since $0 \le \mathrm{OC} \le 1$, we adopt the negative log-likelihood $L = -\ln(\mathrm{OC})$. This formulation yields a scale-invariant similarity that emphasizes the intersection while balancing box sizes, thereby facilitating more accurate bounding box prediction.
Moreover, this definition naturally normalizes OC to $[0, 1]$ independently of the bounding box scale: by normalizing the shared area with the square root of the product of the two box areas, OC balances the predicted and ground truth sizes and remains invariant to box scale.
Algorithm 1 OC loss forward
1: Input: x^gt as ground truth, x̃ as predicted
2: Output: L as localization error; for each pixel (i, j) with x^gt ≠ 0:
3: X = (x^t + x^b) · (x^l + x^r)
4: X̃ = (x̃^t + x̃^b) · (x̃^l + x̃^r)
5: I_h = min(x^t, x̃^t) + min(x^b, x̃^b)
6: I_w = min(x^l, x̃^l) + min(x^r, x̃^r)
7: I = I_h · I_w
8: U = X × X̃
9: OC = I / √U
10: L = −ln(OC)
11: if not valid then L = 0
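A direct PyTorch transcription of Algorithm 1, assuming batched (t, b, l, r) distance vectors; the epsilon guard and the clamp are numerical-safety assumptions not specified in the algorithm:

```python
import torch

def oc_loss(pred, gt, eps=1e-7):
    """OC loss forward (Algorithm 1): pred, gt are (N, 4) tensors of
    (t, b, l, r) distances from a pixel to the four box sides."""
    t, b, l, r = pred.unbind(dim=1)      # predicted distances (x-tilde)
    tg, bg, lg, rg = gt.unbind(dim=1)    # ground truth distances (x)

    X = (tg + bg) * (lg + rg)            # ground truth box area
    X_tilde = (t + b) * (l + r)          # predicted box area
    i_h = torch.min(t, tg) + torch.min(b, bg)
    i_w = torch.min(l, lg) + torch.min(r, rg)
    inter = i_h * i_w                    # intersection area I
    u = X * X_tilde                      # U = X * X-tilde

    oc = inter / (torch.sqrt(u) + eps)   # Ochiai coefficient in [0, 1]
    return -torch.log(oc.clamp(min=eps)) # L = -ln(OC)
```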

3.2.2. Loss Backward

For the OC loss backpropagation, the gradient is taken with respect to the prediction; for brevity we drop the tilde and write the predicted distances as $x \in \{x^t, x^b, x^l, x^r\}$, with $X$ the predicted-box area. The partial derivative of $X$ with respect to $x$ (denoted $\nabla_x X$) is computed first:
$$\frac{\partial X}{\partial x^t}\ \left(\text{or } \frac{\partial X}{\partial x^b}\right) = x^l + x^r,$$
$$\frac{\partial X}{\partial x^l}\ \left(\text{or } \frac{\partial X}{\partial x^r}\right) = x^t + x^b.$$
Next, the partial derivative of $I$ with respect to $x$ (denoted $\nabla_x I$) is derived:
$$\frac{\partial I}{\partial x^t}\ \left(\text{or } \frac{\partial I}{\partial x^b}\right) = \begin{cases} I_w, & \text{if } x^t < x^{gt,t}\ (\text{or } x^b < x^{gt,b}), \\ 0, & \text{otherwise}, \end{cases}$$
$$\frac{\partial I}{\partial x^l}\ \left(\text{or } \frac{\partial I}{\partial x^r}\right) = \begin{cases} I_h, & \text{if } x^l < x^{gt,l}\ (\text{or } x^r < x^{gt,r}), \\ 0, & \text{otherwise}. \end{cases}$$
Finally, since $L = \frac{1}{2}\ln U - \ln I$, the gradient of the OC localization loss with respect to $x$ is:
$$\frac{\partial L}{\partial x} = \frac{\nabla_x U}{2U} - \frac{\nabla_x I}{I} = \frac{\nabla_x X}{2X} - \frac{\nabla_x I}{I}.$$
The OC loss mechanism is best understood through this formulation: $\nabla_x X$ is the penalty term associated with the predicted box and enters the gradient with a positive sign, whereas $\nabla_x I$ is the term associated with the intersection region and enters with a negative sign. Consequently, minimizing the OC loss requires maximizing the intersection region while keeping the predicted box area small. The limiting case arises when the intersection region equals both the predicted and ground truth boxes, yielding perfect bounding box alignment.
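As a sanity check, autograd on the `oc_loss` sketch above reproduces this behavior: for a prediction that fully contains the ground truth, the intersection term vanishes and only the area penalty remains, so the gradient pushes the box to shrink. The specific numbers are illustrative.

```python
import torch

# Prediction 0.6x0.6 (distances 0.3 to each side) fully contains a
# 0.4x0.4 ground truth (distances 0.2), so grad(I) = 0 and only the
# positive area penalty grad(X)/(2X) remains.
pred = torch.tensor([[0.3, 0.3, 0.3, 0.3]], requires_grad=True)
gt = torch.tensor([[0.2, 0.2, 0.2, 0.2]])
oc_loss(pred, gt).sum().backward()
print(pred.grad)  # ~0.833 everywhere: gradient descent shrinks the box
```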

3.3. Gaussian Adaptive Loss

During training, the predicted box $B = (x, y, w, h)$ in Algorithm 2 is optimized to minimize the loss with respect to the ground truth $G = (x_1, y_1, w_1, h_1)$. Specifically, the TL and BR coordinates of the predicted box are obtained through the transformation to $(x_t, y_t, x_b, y_b)$, while the ground truth is given by the coordinates $(x_t^{gt}, y_t^{gt}, x_b^{gt}, y_b^{gt})$. Predicted boxes are clustered around the ground truth, with the distance from each predicted box to the ground truth modeled as a Gaussian distribution (GA), as illustrated in Figure 4.
Algorithm 2 GAOC as bounding box loss
1: Input: predicted box B^p = (x_t^p, y_t^p, x_b^p, y_b^p); ground truth B^gt = (x_t^gt, y_t^gt, x_b^gt, y_b^gt); width and height of the input image: w, h
2: Output: L_GAOC
3: For the predicted box B^p, ensure x_t^p < x_b^p and y_t^p < y_b^p
4: w_t² = (x_t^p − x_t^gt)² + (y_t^p − y_t^gt)²
5: w_b² = (x_b^p − x_b^gt)² + (y_b^p − y_b^gt)²
6: G_t = exp(−½((x_t^p − x_t^gt)²/2 + (y_t^p − y_t^gt)²/2))
7: G_b = exp(−½((x_b^p − x_b^gt)²/2 + (y_b^p − y_b^gt)²/2))
8: Area of B^gt: A^gt = (x_b^gt − x_t^gt) · (y_b^gt − y_t^gt)
9: Area of B^p: A^p = (x_b^p − x_t^p) · (y_b^p − y_t^p)
10: Intersection I between B^p and B^gt:
11: x_1^I = max(x_t^p, x_t^gt), x_2^I = min(x_b^p, x_b^gt)
12: y_1^I = max(y_t^p, y_t^gt), y_2^I = min(y_b^p, y_b^gt)
13: I = (x_2^I − x_1^I) · (y_2^I − y_1^I) if x_2^I > x_1^I and y_2^I > y_1^I, else 0
14: OC = I / √(A^gt × A^p)
15: GA = w_b²·G_b + w_t²·G_t
16: GAOC = OC − GA
17: L_GAOC = 1 − GAOC
Consider a two-dimensional Gaussian distribution for the top-left corner coordinates of the predicted and ground truth boxes, denoted $G_t \sim N(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, where the mean vector is $\mu_t = (\mu_1, \mu_2)^{\mathsf{T}} = (x_t^{gt}, y_t^{gt})^{\mathsf{T}}$. The $x$- and $y$-offsets are treated as mutually independent, so the correlation coefficient is $\rho = 0$. Treating the $x$-axis and $y$-axis distances as equivalent, the covariance matrix $\Sigma_t$ is given by:
$$\Sigma_t = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.$$
The two-dimensional Gaussian distribution for the TL corner distance is:
$$G_t = (2\pi\sigma_1\sigma_2)^{-1}\exp\left(-\frac{1}{2}\left(\frac{(x_t^{gt} - x_t)^2}{\sigma_1^2} + \frac{(y_t^{gt} - y_t)^2}{\sigma_2^2}\right)\right),$$
$$w_t^2 = (x_t^{gt} - x_t)^2 + (y_t^{gt} - y_t)^2,$$
where $w_t^2$ is the squared Euclidean distance between the TL corners of the predicted and ground truth boxes.
Similarly, the squared Euclidean distance between the BR corners of the predicted and ground truth boxes is:
$$w_b^2 = (x_b^{gt} - x_b)^2 + (y_b^{gt} - y_b)^2,$$
with the corresponding Gaussian distribution:
$$G_b = (2\pi\sigma_3\sigma_4)^{-1}\exp\left(-\frac{1}{2}\left(\frac{(x_b^{gt} - x_b)^2}{\sigma_3^2} + \frac{(y_b^{gt} - y_b)^2}{\sigma_4^2}\right)\right).$$
Here, the values of $\sigma_3^2$ and $\sigma_4^2$ are both set to 2, yielding the GA loss:
$$\mathrm{GA} = w_t^2 G_t + w_b^2 G_b.$$
Writing $\sigma^2 = 2$ and dropping the constant normalization factor, the GA term used in the forward and backward computation expands to:
$$\mathrm{GA}(x, y) = w_t^2 G_t(x_t, y_t) + w_b^2 G_b(x_b, y_b) = w_t^2 \exp\left(-\frac{w_t^2}{2\sigma^2}\right) + w_b^2 \exp\left(-\frac{w_b^2}{2\sigma^2}\right).$$
The GAOC loss is formulated as:
$$\mathrm{GAOC} = \mathrm{OC} - \mathrm{GA}.$$
Thus, the $L_{GAOC}$ loss is expressed as:
$$L_{GAOC} = 1 - \mathrm{GAOC}.$$
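Putting Algorithm 2 and the equations above together, a minimal PyTorch sketch of $L_{GAOC}$ might look as follows. It assumes corner-format boxes, drops the Gaussian normalization constant (as in Algorithm 2), and uses $\sigma^2 = 2$; it illustrates the published formulas and is not the authors’ released implementation.

```python
import torch

def gaoc_loss(pred, gt, sigma2=2.0, eps=1e-7):
    """L_GAOC = 1 - (OC - GA) for (N, 4) boxes in (x1, y1, x2, y2) format."""
    # Squared Euclidean distances between the TL and BR corner pairs.
    wt2 = (pred[:, 0] - gt[:, 0]) ** 2 + (pred[:, 1] - gt[:, 1]) ** 2
    wb2 = (pred[:, 2] - gt[:, 2]) ** 2 + (pred[:, 3] - gt[:, 3]) ** 2

    # Gaussian corner similarities (normalization constant dropped).
    g_t = torch.exp(-wt2 / (2.0 * sigma2))
    g_b = torch.exp(-wb2 / (2.0 * sigma2))
    ga = wt2 * g_t + wb2 * g_b

    # Ochiai coefficient: intersection over the sqrt of the area product.
    iw = (torch.min(pred[:, 2], gt[:, 2]) - torch.max(pred[:, 0], gt[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], gt[:, 3]) - torch.max(pred[:, 1], gt[:, 1])).clamp(min=0)
    inter = iw * ih
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    oc = inter / (torch.sqrt(area_p * area_g) + eps)

    return 1.0 - (oc - ga)
```

Note that the GA penalty $w^2 e^{-w^2/(2\sigma^2)}$ vanishes at perfect alignment and decays for very distant corners, so its strongest corrections act at moderate displacements, which is where regression drift is most pronounced.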

4. Experiments

The objective of this study is to enhance the ability of object detection algorithms to accurately identify objects of diverse sizes and shapes. To this end, we propose the Gaussian Adaptive Ochiai loss (GAOC), a novel loss function specifically designed to optimize localization and classification in object detection. By integrating GAOC into the YOLOv5 and RT-DETR object detection frameworks and conducting extensive experiments on the PASCAL VOC and MS COCO 2017 benchmark datasets, we demonstrate that GAOC achieves superior performance in multi-scale object detection tasks.

4.1. MS COCO 2017 and PASCAL VOC

The Common Objects in Context (COCO) dataset is a large-scale benchmark developed by Microsoft in collaboration with research institutions. It contains over 330,000 images featuring diverse objects and complex backgrounds and is widely used for computer vision tasks such as object detection, semantic segmentation, and instance-level annotation. Each image is annotated with precise bounding boxes, pixel-level segmentation masks, and corresponding semantic labels. The COCO dataset defines 80 object categories, covering common classes such as humans, animals, and vehicles, while also providing scene-level annotations for background contexts. In the COCO 2017 release, the Train2017 subset (118,287 images) is used for model training, the Val2017 subset (5000 images) for validation, and the Test2017 subset (20,288 images) for performance evaluation and benchmarking.
The PASCAL Visual Object Classes (PASCAL VOC) dataset is a widely used benchmark for object detection, image classification, and semantic segmentation. In this study, the VOC2007 and VOC2012 datasets are integrated to form a combined training set of 21,503 images and a test set of 4952 images, covering a total of 20 object categories.

4.2. Experimental Setup

The experimental setup of this study is described as follows. Owing to their larger network architectures and higher parameter counts, YOLOv5X and RT-DETR were applied to complex scenarios in the COCO dataset, whereas YOLOv5L was selected for the PASCAL VOC dataset. Ablation experiments were conducted to assess the performance of RT-DETR on the PASCAL VOC dataset. Training employed stochastic gradient descent (SGD) as the optimizer, with a learning rate of 0.01, momentum of 0.937, and weight decay of 0.0005. Data augmentation techniques included random flipping, rotation, translation, mosaic stitching, and image blending. The label smoothing factor was set to 0.2, and the input image size was fixed at 640 × 640 pixels.
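For reproducibility, the optimizer portion of this setup translates directly to PyTorch; `model` is a placeholder for the YOLOv5 or RT-DETR network (the augmentation pipeline is configured separately in each framework):

```python
import torch
from torch import nn

# SGD settings from this section: lr 0.01, momentum 0.937, weight decay 0.0005.
model = nn.Linear(4, 4)  # stand-in module so the snippet runs
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.937, weight_decay=0.0005
)
```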

4.3. Experimental Analysis

As shown in Figure 5a, under identical initialization conditions, both the IoU and OC losses decrease monotonically as training iterations increase and reach a plateau after about 50 to 70 epochs, reflecting the stability of the overall optimization process. However, the two curves ultimately converge to loss levels of 0.804 and 0.788, respectively, indicating that localization performance remains limited in both cases. Of the two, the OC loss exhibits a faster descent and a lower convergence value, implying that under minor overlaps or partial scale mismatches, OC provides smoother gradients and thus suppresses the loss more effectively. These results reveal that relying solely on intersection-based metrics (IoU or OC) is insufficient to provide adequate geometric guidance in scenarios with large displacements or significant shape differences.
Based on the results in Figure 5b,c and Table 1, we systematically compared multiple BBR loss functions. As shown in Figure 5b,c, all methods exhibit a stable downward trend in their training curves, but they differ significantly in convergence speed and final residual values. In terms of curve morphology, WIoU, SIoU, and GAOC decline most rapidly between epochs 40 and 60 and quickly enter a low-loss plateau. GIoU converges the slowest and retains the highest residual error. EIoU descends faster in the mid-to-late training phase but still ends with relatively high residuals. A closer comparison between GAOC (GA + OC) and GAI (GA + IoU) shows that both follow a smooth, S-shaped convergence trajectory, but GAOC achieves a lower final value and a more stable plateau. This suggests that it provides effective gradient guidance both in the early geometric-alignment stage and in the later fine-tuning stage of localization.
From Table 1, Mean represents the overall average localization quality, whereas Min reflects the lower-bound performance on the most challenging samples; higher values indicate greater robustness. GAOC achieves the best results on both Mean (0.963) and Min (0.956), demonstrating improvements not only in overall accuracy but also in worst-case, long-tail performance. GAI (0.959/0.953) yields comparable results. WIoU achieves a higher Min (0.950) than MPDIoU (0.942), indicating stronger robustness. DIoU and CIoU perform at moderate levels. EIoU shows competitive average performance (0.958) but a very low Min (0.51), revealing high vulnerability to long-tail cases. GIoU performs the weakest across both metrics (0.872/0.571).
Mechanistically, GAOC and GAI integrate overlap measures with Gaussian corner-based geometric constraints. These constraints yield nonvanishing, directionally informative gradients even under zero overlap and weak overlap conditions. This design enables fast convergence, low residual loss, and strong long tail robustness. WIoU mitigates noisy gradient effects through hard sample adaptive weighting, thereby stabilizing its tail performance. In contrast, EIoU suffers from gradient imbalance when dealing with extreme shapes and large displacements. This results in degraded tail performance. In summary, GAOC achieves the best overall performance in this simulation study, excelling in convergence speed, final accuracy, and long tail robustness.
Table 2 and Table 3 present the experimental results on the COCO 2017 and PASCAL VOC datasets, respectively. A comparison of different BBR losses (including CIoU, DIoU, EIoU, GIoU, SIoU, WIoU, and MPDIoU) against GAOC yields the following conclusions. On the COCO 2017 dataset, GAOC outperformed all competing methods in mAP, mAP75, and mAP50. Specifically, GAOC achieved an mAP of 46.2%, a 1.6-point improvement over the best-performing competitor, WIoU (44.6%), thereby demonstrating a clear advantage in detection accuracy. GAOC also achieved an mAP50 of 64.5%, a 3.1-point increase over WIoU (61.4%) and a substantial improvement over the other methods.
On the PASCAL VOC dataset, GAOC achieved an mAP50 of 79.3%, further confirming its superior performance. The consistent improvements across both datasets highlight GAOC’s robust generalization capability: it sustains high detection accuracy in the complex, real-world scenes of COCO 2017 as well as in the standardized image settings of PASCAL VOC. Figure 6 and Figure 7 illustrate qualitative detection results, further supporting the superior performance of GAOC compared with alternative loss functions.
Table 4 presents the performance of various BBR loss functions on the MS COCO dataset using RT-DETR as the baseline, with GAOC demonstrating clear advantages across key evaluation metrics. For the mAP50 metric, GAOC achieved the highest score of 65.3%, surpassing the second-ranked CIoU (64.9%) by 0.4 points. This result indicates that GAOC delivers superior object detection accuracy under relaxed matching criteria. GAOC also maintained the leading position in mAP, achieving a score of 47.8%. Overall, in experiments on the MS COCO dataset with RT-DETR as the baseline, GAOC consistently outperformed other mainstream BBR loss functions (including CIoU, DIoU, and EIoU) in both mAP50 and mAP. These findings confirm that GAOC more effectively optimizes BBR in object detection, improves detection accuracy, and provides superior overall performance.
The design philosophy of GAOC is grounded in a deep understanding of the intrinsic characteristics of the BBR problem. By introducing the Ochiai coefficient, GAOC assigns greater weight to the intersection between predicted and ground truth boxes in the loss function, thereby improving detection performance for multi-scale objects. In addition, the incorporation of a Gaussian adaptive mechanism increases the algorithm’s robustness to variations in target position by modeling the TL/BR coordinates of bounding boxes as a two-dimensional Gaussian distribution.
Synthesizing the experimental results with the methodological analysis, we conclude that the GAOC loss function demonstrates outstanding performance in object detection tasks. The combination of its innovative design and consistent empirical outcomes validates GAOC as a novel BBR loss function with significant potential to advance the field of object detection. Future research could explore applying GAOC to diverse scenarios and tasks, as well as integrating it with other state-of-the-art (SOTA) algorithms.

4.4. Ablation Study

When comparing the OC and IoU loss functions, as shown in Table 5, OC consistently outperforms IoU in both mAP50 and mAP. By normalizing the intersection by the square root of the product of the bounding box areas, OC achieves scale invariance and enables a more accurate similarity measurement between predicted and ground truth boxes, thereby delivering superior performance in object detection tasks.
The comparison between GAI and GAOC shows that GAOC surpasses GAI by 1.2% and 0.5% in mAP50 and mAP, respectively. This result validates the rationale for integrating GA with OC. The GA mechanism reduces sensitivity to positional deviations by modeling the TL/BR coordinates of bounding boxes as a two-dimensional Gaussian distribution. The scale invariance of OC further complements this approach. In contrast, GAI’s reliance on IoU limits its optimization capacity, and even when combined with GA, it fails to match the comprehensive performance of GAOC. These findings suggest that GAOC adopts a more effective design strategy for BBR optimization, better addressing the challenges of complex scenes and multi-scale object detection.

5. Conclusions

This study proposed the GAOC loss function. Experimental results on the COCO 2017 and PASCAL VOC benchmark datasets demonstrated that GAOC delivers significant performance improvements in object detection tasks. GAOC enhances the detector’s capability for multi-scale object detection and reduces sensitivity to positional bias. It achieves this by assigning greater weight to the intersection of predicted and ground truth boxes and implementing point-to-point coordinate alignment. These experiments not only validate the effectiveness of GAOC but also highlight its strong potential for practical applications. Future work may extend GAOC to related domains, such as instance segmentation.

Author Contributions

B.H. and Q.T. contributed to methodology design, algorithm development, and manuscript writing and editing. J.S., Z.W. and Y.Y. provided financial support, laboratory resources, and experimental guidance. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shaanxi Natural Science Basic Research Program (Grant No.: S2024-JC-YB-2227 and 2024JC-YBQN-0575) and Xi’an Shiyou University, School of Electronic Engineering.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets used in this study are publicly available and ethically compliant. There are no competing interests associated with the data. The code used and analyzed during the current study is available from the corresponding or first author upon reasonable request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

References

1. Tang, Q.; Su, C.; Tian, Y.; Zhao, S.; Yang, K.; Hao, W.; Feng, X.; Xie, M. YOLO-SS: Optimizing YOLO for enhanced small object detection in remote sensing imagery. J. Supercomput. 2025, 81, 303.
2. Xie, M.; Tang, Q.; Tian, Y.; Feng, X.; Shi, H.; Hao, W. DCN-YOLO: A Small-Object Detection Paradigm for Remote Sensing Imagery Leveraging Dilated Convolutional Networks. Sensors 2025, 25, 2241.
3. Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520.
4. Jocher, G. Ultralytics YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 4 January 2026).
5. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
6. Everingham, M.; Eslami, S.A.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
7. Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
8. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; Part V, pp. 740–755.
9. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 142–158.
10. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28.
12. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; Part I, pp. 21–37.
14. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
15. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
16. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
17. Law, H.; Deng, J. CornerNet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
18. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
19. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: A simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1922–1933.
20. Rezatofighi, H.; Tsoi, N.; Gwak, J.Y.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
21. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
22. Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157.
23. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740.
24. Ma, S.; Xu, Y. MPDIoU: A loss for efficient and accurate bounding box regression. arXiv 2023, arXiv:2307.07662.
25. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv 2023, arXiv:2301.10051.
Figure 1. Regression results under various BBR loss functions indicate that GAOC achieves the best overall performance.
Figure 2. Simulation experiments with anchor points (blue) and object boxes (purple). (left) All cases. (right) Major cases.
Figure 3. Illustration of OC loss for pixel-wise predicted box.
Figure 4. The process of GA. In GA, we directly calculate the GA distance between predicted box and ground truth.
Figure 5. (a): IoU and OC loss training for 150 epochs, loss regression. (b): Regression curve of the GAOC and other BBR losses. (c): Regression curve of the GAOC (OC with GA) and GAI (IoU with GA).
Figure 6. Inference performance comparison of various BBR losses on COCO.
Figure 7. Inference performance comparison of various BBR losses on PASCAL VOC.
Table 1. Performance of each bounding box loss.

Loss     Mean    Min
WIoU     0.958   0.950
SIoU     0.956   0.944
CIoU     0.946   0.848
DIoU     0.948   0.869
GIoU     0.872   0.571
MPDIoU   0.958   0.942
EIoU     0.958   0.51
GAI      0.959   0.953
GAOC     0.963   0.956
Table 2. Performance of each bounding box loss on the COCO with YOLOv5X as the baseline.

Loss     mAP    mAP75   mAP50   mAPs   mAPm   mAPl
CIoU     43.1   46.6    60.8    26.5   47.9   56.0
DIoU     42.6   46.1    60.0    25.8   47.3   56.1
EIoU     44.1   48.1    60.7    27.4   50.2   57.2
GIoU     42.4   45.9    59.7    26.2   46.9   55.7
SIoU     42.5   46.6    60.0    26.1   47.1   55.5
WIoU     44.6   48.5    61.4    28.1   50.0   58.3
MPDIoU   44.5   48.5    61.2    27.8   50.4   57.4
GAOC     46.2   50.0    64.5    28.5   40.8   60.6
Table 3. Performance of each bounding box loss on the PASCAL VOC with YOLOv5L as the baseline.

Loss     mAP    mAP75   mAP50
CIoU     59.4   65.4    78.3
DIoU     59.5   65.6    78.1
EIoU     59.5   65.4    78.6
GIoU     59.2   64.9    78.1
SIoU     59.7   65.7    78.4
WIoU     59.6   65.3    78.4
MPDIoU   59.5   65.2    78.5
GAOC     60.2   66.2    79.3
Table 4. Performance of each bounding box loss on the MS COCO with RT-DETR as the baseline.

Loss     mAP50   mAP
CIoU     64.9    47.4
DIoU     64.3    47.0
EIoU     61.1    44.3
GIoU     64.7    47.3
SIoU     64.1    46.7
WIoU     63.6    46.4
MPDIoU   63.8    45.9
GAOC     65.3    47.8
Table 5. The ablation experiments were conducted on the PASCAL VOC dataset using RT-DETR as the baseline model to evaluate the contribution of each loss component.

Loss   mAP50   mAP
IoU    70.9    52.2
OC     71.7    52.4
GAI    72.6    53.1
GAOC   73.8    53.6
