Article

Oil and Gas Facility Detection in High-Resolution Remote Sensing Images Based on Oriented R-CNN

1 School of Resources and Geosciences, China University of Mining and Technology, Xuzhou 221116, China
2 Research Institute of Petroleum Exploration and Development, Beijing 100083, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(2), 229; https://doi.org/10.3390/rs18020229
Submission received: 7 December 2025 / Revised: 7 January 2026 / Accepted: 8 January 2026 / Published: 10 January 2026

Highlights

What are the main findings?
  • Proposed OGF Oriented R-CNN, an enhanced oriented detector tailored for oil and gas facilities, integrating the O&G Loss Function, CAHEM (Class-Aware Hard Example Mining), and FPNFEA (Feature Pyramid Network with Feature Enhancement Attention) to synergistically address severe class imbalance, extreme scale variation, and arbitrary orientation in high-resolution remote sensing images.
  • Achieved a mean average precision (mAP) of 82.9% on a challenging dataset of 3039 high-resolution images with rotated bounding box annotations across three classes (well sites: 3006; industrial and mining lands: 692; drilling: 244).
What are the implications of the main finding?
  • Outperforms seven state-of-the-art models by up to 27.6 percentage points and the baseline Oriented R-CNN by +10.5 percentage points, offering a reliable solution for real-world monitoring of oil and gas infrastructure in complex remote sensing scenarios.
  • Establishes a new benchmark dataset and a synergistic detection framework, contributing to the advancement of oriented object detection in energy resource management, environmental surveillance, and related remote sensing applications.

Abstract

Accurate detection of oil and gas (O&G) facilities in high-resolution remote sensing imagery is critical for infrastructure surveillance and sustainable resource management, yet conventional detectors struggle with severe class imbalance, extreme scale variation, and arbitrary orientation. In this work, we propose OGF Oriented R-CNN (Oil and Gas Facility Detection Oriented Region-based Convolutional Neural Network), an enhanced oriented detection model derived from Oriented R-CNN that integrates three improvements: (1) O&G Loss Function, (2) Class-Aware Hard Example Mining (CAHEM) module, and (3) Feature Pyramid Network with Feature Enhancement Attention (FPNFEA). Working in synergy, they resolve the coupled challenges more effectively than any standalone fix and do so without relying on rigid one-to-one matching between modules and individual issues. Evaluated on the O&G facility dataset comprising 3039 high-resolution images annotated with rotated bounding boxes across three classes (well sites: 3006, industrial and mining lands: 692, drilling: 244), OGF Oriented R-CNN achieves a mean average precision (mAP) of 82.9%, outperforming seven state-of-the-art (SOTA) models by margins of up to 27.6 percentage points (pp) and delivering a cumulative gain of +10.5 pp over Oriented R-CNN.

1. Introduction

1.1. Background

O&G, as cardinal pillars of the global energy system, play a crucial role in maintaining economic stability and fostering social progress. According to the International Energy Agency (IEA), global oil demand will still reach 106 million barrels per day by the end of this decade, despite the increasing use of renewable energy [1]. Against this backdrop, rational development of O&G resources and efficient management of O&G facilities (Figure 1) are essential for attaining sustainable development and tackling energy security challenges. Therefore, the precise and effective identification of O&G facilities becomes a critical imperative for the energy industry. Traditional methods for detecting such facilities rely heavily on field surveys and manual interpretation. Although these methods can achieve substantial accuracy, they suffer from drawbacks such as restricted coverage, high cost, and low efficiency, failing to fulfill the demands of large-scale O&G resource management.
Remote sensing satellite technology and artificial intelligence algorithms have steadily advanced in recent years. With their versatility, these cutting-edge technologies transcend industry barriers. They not only revolutionize detection methods in the O&G sector but also pave the way for new applications in related fields such as ship and building detection, demonstrating particularly outstanding potential. Building on this trend, Xiong et al. [2] developed an object detection algorithm based on subtask attention. They split the object detection task into multiple subtasks by constructing subtask-specific attention modules, thereby improving feature extraction for small targets and detection performance in remote sensing images. Ramachandran et al. [3] created a deep learning framework to identify O&G well pads and storage tanks in satellite imagery, employing RetinaNet with Residual Network 50 (ResNet-50) and EfficientNet-B3 in a two-stage process for precise well pad detection, and Faster R-CNN with Res2Net for robust storage tank identification. Validated across multiple basins, this approach demonstrated excellent performance and strong generalization capabilities. On a related front, Zhang et al. [4] proposed the PARE-YOLO algorithm, redesigning the neck network of YOLOv8 and incorporating a lightweight detection head with multi-scale attention fusion, which significantly improved the robustness of detecting ground objects in complex aerial scenarios. Sun et al. [5] put forward DKETFormer, an innovative approach that leveraged a Transformer backbone to encode global dependencies. Simultaneously, it employed the Cross-spatial Knowledge Extraction Module (CKEM) and Inter-layer Feature Transfer Module (IFTM) for discriminative feature extraction and transfer, far outperforming traditional CNN-based methods.
Although the studies above collectively demonstrate substantial advances in remote sensing object detection, they also reveal a persistent gap between general-purpose detection frameworks and the specialized requirements of O&G facility recognition. The integration of remote sensing imagery and deep learning showcases remarkable potential in O&G facility detection, yielding globally prominent application achievements. The extensive utilization of high-resolution satellite imagery, coupled with continuous advancements in deep learning algorithms, substantially bolsters detection accuracy and environmental adaptability [6], enabling the effective identification of complex ground targets under diverse geomorphological and climatic conditions, thereby providing critical support for O&G infrastructure monitoring. Current research on O&G facility detection predominantly uses horizontal bounding-box object detectors, yet the inherent angular characteristics of facilities such as well sites make it difficult for such approaches to precisely capture their geometric configurations, often compromising localization accuracy. Furthermore, a pronounced imbalance exists in the quantities of various O&G facilities globally [7,8], skewing model training toward prevalent categories and undermining detection performance for less abundant or smaller targets. Hampered by these constraints, detection frameworks struggle to maintain robustness, and their functionality in comprehensive surveillance and environmental assessments is curtailed. In this context, recent transformer-based oriented detectors demonstrate strong capability in modeling global dependencies, but their reliance on data-hungry self-attention and high computational complexity limits their effectiveness under the challenging conditions of O&G facility detection.
To address the aforementioned deficiencies, this paper proposes OGF Oriented R-CNN, a novel oriented detection model specifically tailored for O&G facilities in high-resolution remote sensing images.
The main contributions of this article are as follows:
  • Development of OGF Oriented R-CNN, a reliable detection model designed to accommodate the inherent class imbalance, scale disparity, and rotational variance in O&G facility detection, yielding accurate results in high-resolution remote sensing images.
  • Creation of a rotated-bounding-box annotated dataset for O&G facilities, serving as a benchmark for oriented object detection and enabling future research.

1.2. Related Work

1.2.1. Object Detection Algorithms Based on Deep Learning

Object detection in remote sensing images, by leveraging the powerful feature extraction and pattern recognition aptitudes of deep learning, markedly boosts the exactitude and efficiency of image analysis, particularly in handling high-dimensional and complex datasets. Research often distinguishes between horizontal bounding box detection, which is centered around identifying regular targets, and oriented bounding box detection, which is designed to handle the geometric complexities of oriented objects. Both propel the progress of remote sensing object detection, thus meeting a wide range of requirements.
Early investigations in this field predominantly utilized horizontal bounding box methods. Pei et al. [9] introduced SGD-YOLOv5, a refined YOLOv5 model with a depth-to-space convolution module, a global attention mechanism, and a decoupled head to improve performance on public datasets. Proposed by Li et al. [10], the two-stage Coarse-to-Fine Decoupling (CFD) R-CNN, characterized by its coarse-to-fine decoupling strategy, feature map upsampling, and high-resolution cropping, optimized small object detection efficiency while maintaining low computational overhead. Cao et al. [11] proposed Remote Sensing Detection Transformer (RS-DETR) augmented by Global Attention Mechanism (GAM) and Scale-invariant Intersection over Union (SIoU) loss, which melds a Swin Transformer with a dual-branch design, refining small object precision and dense cluster localization in high-resolution remote sensing.
Unlike horizontal bounding box methods, which may falter with oriented targets, oriented object detection enhances localization by explicitly accounting for object orientation. Han et al. [12] presented Rotation-equivariant Detector (ReDet) with a rotation-equivariant network and a Rotation-invariant RoI Align (RiRoI Align) module, achieving robust detection of arbitrarily oriented objects by extracting rotation-invariant features from equivariant representations in both spatial and orientation dimensions, particularly effective for ship detection in datasets like HRSC2016. Given CNNs’ limitations in handling orientation variations, Zheng et al. [13] developed the instance-aware Spatial-Frequency Feature Fusion detector (SFFD) for oriented object detection in remote sensing, which innovatively combined a Layer-wise Frequency-domain Analysis (L-FDA) module with CNNs, followed by an instance-aware Cross-Feature Fusion (CFF) module, validated to substantially bolster detection performance. Zhao et al. [14] developed OrientedFormer, an end-to-end transformer detector that incorporated specialized positional encoding and attention mechanisms to manage multi-directional objects in aerial imagery.
Drawing on recent anchor-free approaches to oriented object localization, Li et al. [15] proposed Feature Augmentation and Alignment for Anchor-free Oriented Object Detection (FAA-Det), which employs a Feature Augmentation Module (FAM) and an Oriented Feature Alignment (OFA) module to enrich target representations and harmonize classification and regression tasks, adeptly managing dense and multi-scale remote sensing scenarios with excellent results.

1.2.2. Multi-Source High-Resolution Satellite Imagery for Object Detection

Multi-source high-resolution satellite imagery, distinguished by sub-meter resolution and a rich array of imaging modalities, constitutes the bedrock of advanced object detection in intricate geographic environments. The BeiJing-2 satellite capitalizes on temporal acuity to monitor evolving targets, the BeiJing-3 satellite deploys stereoscopic prowess to fine-tune the resolution of subtle features in challenging topographies, and the GaoFen-2 satellite sustains exceptional imaging fidelity for precise target delineation. This combination of multi-source data sharpens detection accuracy across diverse settings, thus impacting recent methodological advances. Song et al. [16] amalgamated multi-temporal high-resolution imagery from Worldview-3 and GaoFen-2, complemented by Sentinel-2 data, to improve urban water body extraction and bolster resilience across seasonal and contextual shifts. Fang et al. [17] synthesized a dataset including high-resolution optical imagery from Worldview-2, Worldview-3, and GaoFen-2 to develop Swin-HSTPS, which excelled in discerning traffic port stations through multi-scale feature integration.

1.2.3. Oil and Gas Facility Detection Using Remote Sensing

Underpinning both energy resource management and environmental oversight, the detection of O&G facilities necessitates a sophisticated integration of advanced remote sensing technologies. Remote sensing, by providing high-resolution, multi-temporal imagery across extensive and heterogeneous landscapes, enables precise localization and continuous monitoring of O&G facilities. By virtue of spectral and textural feature extraction from remote sensing imagery, machine learning methods attained remarkable success in early O&G facility detection. Aljameel et al. [18] compared five machine learning algorithms for pipeline anomaly detection, with support vector machine achieving an accuracy of 97.43%, though limited by predefined features in complex remote sensing contexts. Despite laying the groundwork, machine learning methods, which exhibit over-reliance on feature engineering and an inability to handle high-dimensional data, have driven the adoption of deep learning.
Deep learning harnesses powerful computational algorithms to address long-standing challenges, including small-target identification and environmental variability, satisfying the escalating need for precision and flexibility in O&G facility detection. He et al. [19] refined the Mask R-CNN framework by using D-LinkNet as a backbone and implementing a semantic segmentation branch to unravel road-well links in multi-sensor imagery, thereby fortifying the reliability of oil well identification. In a related vein, Zhang et al. [20] revamped YOLOv5 through the incorporation of instance segmentation, a context augmentation module, and normalized weighted distance, facilitating oil well detection in occluded high-resolution remote sensing images and attaining significant gains in $F_1$. Additionally, Guisiano et al. [21] proved the effectiveness of their approach by fine-tuning YOLOv8, Faster R-CNN, and DETR on high-resolution Permian Basin satellite imagery to effectively map O&G infrastructure, and they used pre-trained models to enhance detection across this infrastructure. Extending this focus to synthetic aperture radar (SAR), which excels in all-weather monitoring, Ma et al. [22] devised an end-to-end model based on a Transformer for 3D oil tank detection from single SAR images. This model incorporated incidence-angle priors and a feature-description operator to improve precision in the presence of dense scattering centers. In a parallel effort, Wu et al. [23] developed YOLOX-TR, fusing a Transformer encoder and reparameterized visual geometry group-like blocks into YOLOX, to tackle dense oil tank detection and classification in large-scale SAR images, effectively mitigating overlaps and geometric distortions.

2. Methods

2.1. Oriented R-CNN

Oriented R-CNN, a two-stage deep learning framework specifically designed for oriented object detection, is built upon the well-known Faster R-CNN framework [24]. In the first stage, a rotated Region Proposal Network (RPN) deploys a lightweight, fully convolutional network to generate high-quality oriented proposals at almost no cost. In the second stage, the oriented R-CNN head leverages rotated Region of Interest (RoI) alignment to extract features from each oriented proposal, followed by classification and regression to determine object categories and optimize bounding box coordinates. This methodology, which is supported by a ResNet backbone, ensures the effective detection of arbitrarily positioned objects across a wide range of scales and orientations.
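For intuition about the five-parameter oriented-box encoding that such detectors regress, the following hypothetical helper (illustrative only, not the paper's code) converts a box $(c_x, c_y, w, h, \theta)$ into its four corner points:

```python
import math

def obb_to_corners(cx, cy, w, h, theta):
    """Corner points of an oriented bounding box given the
    five-parameter (cx, cy, w, h, theta) encoding used by
    oriented detectors; theta is in radians. Corners are
    returned in order starting from (-w/2, -h/2) in the box frame."""
    dx, dy = w / 2.0, h / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate each half-extent offset by theta, then translate to the center
    return [(cx + sx * cos_t - sy * sin_t,
             cy + sx * sin_t + sy * cos_t)
            for sx, sy in ((-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy))]
```

Rotated RoI alignment then samples backbone features inside the polygon spanned by these corners rather than inside an axis-aligned rectangle.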
In comparison with existing oriented object detection algorithms, Oriented R-CNN demonstrates superior performance in key aspects by addressing critical limitations. Single-stage detectors, such as Rotated RetinaNet, often struggle with degraded angle regression accuracy, constrained by their reliance on dense anchor-based predictions with rigid configurations [25]. Conversely, Oriented R-CNN’s two-stage architecture enables precise proposal refinement, effectively overcoming such constraints. Additionally, when benchmarked against Rotated Faster R-CNN variants, Oriented R-CNN substantially reduces misalignment errors stemming from axis-aligned feature pooling [26], thereby achieving enhanced precision in localization and orientation estimation for oriented objects. These distinctive strengths position it as an optimal solution for tasks demanding accurate orientation estimation.
The adoption of Oriented R-CNN as the baseline model for this study is strategically motivated by its exceptional compatibility with the unique challenges posed by our O&G facility dataset. Beyond the inherent orientation variability of the targets, the dataset introduces complexities through the sparse distribution of facilities and visually ambiguous backgrounds, such as sandy expanses interspersed with rocky formations. Oriented R-CNN effectively mitigates these issues by efficiently generating high-quality oriented proposals, which minimizes false positives in sparse regions, while its rotated RoI alignment mechanism ensures precise feature extraction despite background interference. Consequently, this framework excels in delivering robust detection performance, particularly tailored to the precise identification of the O&G facilities.
This study recalibrates Oriented R-CNN for O&G facility detection, where accuracy and efficiency hinge on overcoming class imbalance, scale disparity, and rotational variance, as shown in Figure 2. The proposed framework transcends conventional limitations by interweaving a bespoke loss function (O&G Loss Function), a discerning hard example mining mechanism (CAHEM), and a feature pyramid architecture augmented with attention-driven refinement (FPNFEA) (Figure 3).

2.2. O&G Loss Function

The construction of efficacious loss functions constitutes a formidable challenge in the domain of oriented object detection, driven by the intricate complexities arising from sparse target distribution, disparate scale, and varied orientation, as shown in Figure 2. Conventional methodologies, exemplified by standard IoU-based metrics, often exhibit limitations in simultaneously accommodating these multifaceted issues, thereby yielding suboptimal performance in contexts such as O&G facility detection across diverse environmental settings. To address these inadequacies, the O&G Loss Function is proposed (Figure 4), meticulously engineered to augment detection efficacy by addressing the three above-mentioned problems. The O&G Loss Function ($L_{O\&G}$) integrates a class-weighted GIoU loss ($L_{Class}$), a scale-aware regression loss ($L_{Scale}$), and an orientation-adjusted angular loss ($L_{Angle}$), formulated as Equation (1).
$$L_{O\&G} = \lambda_{Class} \cdot L_{Class} + \lambda_{Scale} \cdot L_{Scale} + \lambda_{Angle} \cdot L_{Angle} \tag{1}$$
where $\lambda_{Class}$, $\lambda_{Scale}$, and $\lambda_{Angle}$ are weighting coefficients tuned to balance the contributions of class distribution, scale variability, and orientation alignment, respectively.
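The combination in Equation (1) is a plain weighted sum; a minimal sketch, with default coefficients set to the grid-searched values later reported in Section 3.3:

```python
def og_loss(l_class, l_scale, l_angle,
            lam_class=1.0, lam_scale=0.5, lam_angle=0.8):
    """Weighted sum of the three loss components (Equation (1)).
    Defaults mirror the tuned coefficients from Section 3.3."""
    return lam_class * l_class + lam_scale * l_scale + lam_angle * l_angle
```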

2.2.1. Class-Weighted GIoU Loss Function

A striking long-tail distribution becomes apparent when examining the global deployment of O&G facilities. Rare entities like drillings are vastly outnumbered by prevalent targets such as well sites, engendering an intrinsic class imbalance that undermines detection performance [7,8]. Traditional IoU losses, by indiscriminately weighting all samples, overlook the imperative to elevate the priority of these underrepresented classes, thereby compromising recall for rare facilities. Moreover, they fail to furnish optimization gradients in cases where the predicted and target boxes are non-overlapping. To circumvent these obstacles, GIoU is enlisted, leveraging a convex hull penalty to enable gradient-driven refinement even absent overlap [27]. This study introduces a class-weighted variant of the GIoU loss function, formally defined as follows:
$$L_{Class} = \sum_i w_{c_i} \left( 1 - GIoU_i \right), \tag{2}$$
where $GIoU_i$ is expressed as follows:
$$GIoU_i = IoU_i - P_i, \tag{3}$$
where $IoU_i$ and $P_i$ are calculated as Equations (4) and (5).
$$IoU_i = \frac{|B_{p_i} \cap B_{t_i}|}{|B_{p_i} \cup B_{t_i}|}, \tag{4}$$
$$P_i = \frac{|C_i \setminus (B_{p_i} \cup B_{t_i})|}{|C_i|} \tag{5}$$
Here, $w_{c_i}$ denotes the class-specific weighting factor. $IoU_i$ measures the overlap between the predicted ($B_{p_i}$) and the ground truth ($B_{t_i}$) rotated bounding boxes. $P_i$ represents the convex hull penalty that facilitates gradient computation in non-overlapping cases. $C_i$ indicates the smallest convex region enclosing both boxes, and $|\cdot|$ signifies the area.
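For intuition, a minimal sketch of the class-weighted GIoU loss, simplified to axis-aligned boxes (rotated-box GIoU additionally requires polygon intersection, but the convex-hull penalty works identically):

```python
def giou_axis_aligned(box_p, box_t):
    """GIoU for axis-aligned boxes (x1, y1, x2, y2); an axis-aligned
    simplification of Equations (4) and (5)."""
    # Intersection area
    ix1, iy1 = max(box_p[0], box_t[0]), max(box_p[1], box_t[1])
    ix2, iy2 = min(box_p[2], box_t[2]), min(box_p[3], box_t[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_t = (box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
    union = area_p + area_t - inter
    iou = inter / union
    # Smallest enclosing box C and the convex-hull penalty
    cx1, cy1 = min(box_p[0], box_t[0]), min(box_p[1], box_t[1])
    cx2, cy2 = max(box_p[2], box_t[2]), max(box_p[3], box_t[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    penalty = (area_c - union) / area_c
    return iou - penalty

def class_weighted_giou_loss(boxes_p, boxes_t, class_ids, class_weights):
    """Sum of w_c * (1 - GIoU) over matched box pairs."""
    return sum(class_weights[c] * (1.0 - giou_axis_aligned(p, t))
               for p, t, c in zip(boxes_p, boxes_t, class_ids))
```

Note that even for two disjoint boxes the penalty term yields a negative GIoU, so the loss still produces a useful gradient.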

2.2.2. Scale-Aware Regression Loss Function

The heterogeneous morphological profiles of O&G facilities give rise to scale disparities that compromise the efficacy of standard regression approaches [28]. Traditional loss functions based on absolute coordinate deviations are insufficient for addressing the diverse variations in scale, thereby limiting their capacity for optimized calibration. This constraint motivates the introduction of a scale-aware weighting factor, as shown in Equation (6), which is specifically designed to enhance regression stability through dynamic calibration in accordance with object area.
$$L_{Scale} = \sum_i w_{s_i} w_{c_i} \cdot Reg_i, \tag{6}$$
where $w_{c_i}$ is the class weight, and $w_{s_i}$ is the scale-aware weight, computed as follows:
$$w_{s_i} = \frac{1}{A_i / \bar{A} + \epsilon}, \tag{7}$$
$$Reg_i = \sum_{j=1}^{n} |p_{i,j} - t_{i,j}| \tag{8}$$
where $A_i = w_i \times h_i$ represents the ground truth bounding box area (with width $w_i$ and height $h_i$), $\bar{A}$ denotes the mean area across all ground truth bounding boxes, and $\epsilon = 10^{-6}$ is a small constant to prevent division by zero. The regression term $Reg_i$ is defined as the sum of absolute coordinate deviations, where $n$ represents the number of parameters defining the bounding box, and $p_{i,j}$ and $t_{i,j}$ denote the predicted and ground truth values of the $j$-th parameter, respectively.
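The scale-aware weighting can be sketched as follows (illustrative only): boxes smaller than the mean area receive weights above 1, stabilizing regression for small targets.

```python
def scale_aware_weights(areas, eps=1e-6):
    """Per-box weight 1 / (A_i / A_mean + eps): small objects
    get larger weights, large objects smaller ones."""
    mean_area = sum(areas) / len(areas)
    return [1.0 / (a / mean_area + eps) for a in areas]

def scale_aware_regression_loss(preds, targets, areas, class_ws):
    """L1 regression over box parameters, weighted by scale and class."""
    ws = scale_aware_weights(areas)
    loss = 0.0
    for p, t, w_s, w_c in zip(preds, targets, ws, class_ws):
        reg = sum(abs(pj - tj) for pj, tj in zip(p, t))  # sum of |p - t|
        loss += w_s * w_c * reg
    return loss
```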

2.2.3. Orientation-Adjusted Angular Loss Function

O&G facilities, unlike objects with fixed orientations, adopt random angular configurations dictated by diverse terrains like deserts, posing a serious obstacle to orientation alignment precision in oriented object detection [29]. Standard loss functions, primarily intended for horizontal bounding boxes, struggle to capture the angular deviations of oriented bounding boxes, leading to suboptimal localization performance in complicated situations. To counter such a deficiency, this study proposes an orientation-adjusted weighting factor $w_{\theta_i}$ to adjust the regression penalty based on the angular difference between predicted and ground truth orientations. The orientation-adjusted regression loss is formulated as follows:
$$L_{Angle} = \sum_i w_{\theta_i} w_{c_i} \cdot Reg_i, \tag{9}$$
$$w_{\theta_i} = 1 + \lambda \cdot \sin(\theta_{p_i} - \theta_{t_i}) \tag{10}$$
where $w_{c_i}$ is the class weight, consistent with the scale-aware component, and $Reg_i$ is the regression term, both of which are defined in Section 2.2.2. $\theta_{p_i}$ and $\theta_{t_i}$ represent the predicted and ground truth rotation angles of the $i$-th bounding box, respectively. $\sin(\theta_{p_i} - \theta_{t_i})$ measures the angular deviation. $\lambda$ is a hyperparameter controlling the influence of orientation discrepancy.
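A minimal sketch of the orientation-adjusted weight (illustrative; lam = 0.8 is our default choice, and the sine is taken over the signed angular difference exactly as written above, so a negative deviation can also reduce the weight):

```python
import math

def orientation_weight(theta_p, theta_t, lam=0.8):
    """w_theta = 1 + lambda * sin(theta_p - theta_t), the
    orientation-adjusted modulation of the regression penalty."""
    return 1.0 + lam * math.sin(theta_p - theta_t)

def orientation_adjusted_loss(preds, targets, th_p, th_t, class_ws):
    """L1 regression weighted by class and orientation deviation."""
    loss = 0.0
    for p, t, tp_, tt_, w_c in zip(preds, targets, th_p, th_t, class_ws):
        reg = sum(abs(pj - tj) for pj, tj in zip(p, t))
        loss += orientation_weight(tp_, tt_) * w_c * reg
    return loss
```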

2.3. CAHEM

CAHEM is introduced in this study to proficiently address the issues of categorical imbalance and orientational variability in oriented object detection for O&G facilities (Figure 4). This section elaborates on this novel methodology, which employs a class-differentiated weighting architecture and an intricate difficulty-scoring framework to rectify the imbalance intrinsic to O&G datasets [30]. Through this integration, CAHEM establishes a discerning strategy that prioritizes the enhancement of underrepresented classes while adeptly tackling orientation-divergent targets.
CAHEM quantifies the complexity of each positive sample through a constructed difficulty index, which synthesizes IoU, classification fidelity, and angular divergence. The index ($S_{difficulty}$) is formulated as follows:
$$S_{difficulty} = (1 - IoU) + \lambda_{cls} \cdot (1 - P_{cls}) + \lambda_{angle} \cdot \sin(|\Delta\theta|), \tag{11}$$
where $IoU$ gauges the congruence between the predicted and ground-truth rotated bounding boxes. $P_{cls}$ represents the peak softmax probability for the assigned category. $\Delta\theta$ captures the angular deviation. $\lambda_{cls}$ and $\lambda_{angle}$ are calibration parameters. The inclusion of $\sin(|\Delta\theta|)$ ensures acute sensitivity to rotational discrepancies, a pivotal element for achieving robust detection in O&G contexts.
The curation of arduous samples is organized by a weighted top-k selection regimen, which ranks samples based on their difficulty scores adjusted by class-specific weights and selects the top k samples, where k is a fraction of total positive samples. This process is reinforced by weights derived from dataset frequency distributions, assigning lower values to more frequent classes and higher values to less frequent classes to emphasize underrepresented categories. The refined difficulty score ($S_{weighted}$) is subsequently articulated as follows:
$$S_{weighted} = S_{difficulty} \cdot w_{class} \tag{12}$$
where $w_{class}$ denotes a class-specific weight derived from dataset statistics.
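A minimal sketch of CAHEM's scoring and weighted top-k selection (illustrative; the per-sample dictionary fields are our assumption about how inputs are packaged, and the default coefficients mirror the tuned values from Section 3.3):

```python
import math

def difficulty_score(iou, p_cls, d_theta, lam_cls=0.5, lam_angle=1.0):
    """Difficulty index combining localization, classification,
    and angular terms."""
    return (1.0 - iou) + lam_cls * (1.0 - p_cls) + lam_angle * math.sin(abs(d_theta))

def select_hard_examples(samples, class_weights, k_frac=0.7):
    """Rank positives by class-weighted difficulty and keep the
    hardest k_frac fraction (weighted top-k selection)."""
    scored = [(difficulty_score(s["iou"], s["p_cls"], s["d_theta"])
               * class_weights[s["cls"]], s) for s in samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    k = max(1, int(k_frac * len(samples)))
    return [s for _, s in scored[:k]]
```

With rare classes weighted up, a moderately hard drilling sample can outrank a harder but abundant well-site sample, steering gradients toward the long tail.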
CAHEM provides a structured approach to strengthen O&G detection by strategically amplifying the influence of challenging samples. Its class-aware weighting and selection mechanism not only bridge the gap between imbalanced classes but also refine the model’s capacity to navigate the multifaceted demands of O&G terrains.

2.4. FPNFEA

The rugged terrain and intricate layout of O&G facilities engender considerable obstacles for oriented object detection, as targets like well sites exhibit wide-ranging scales and arbitrary orientations. Such variability often overwhelms conventional detection models, which struggle to balance multi-scale feature extraction with orientation awareness. This section presents FPNFEA, a method crafted to enhance multi-scale feature representations while amplifying responsiveness to rotational subtleties.
The FPNFEA framework extends the standard FPN by incorporating the FEA module [31]. It retains the core FPN topology, generating a hierarchical feature pyramid from backbone outputs $\{C_i\}_{i=2}^{5}$ through lateral convolutions (kernel size 3 × 3 to standardize channels to 256) and top-down fusion to produce feature maps $\{P_i\}_{i=2}^{6}$. To better accommodate O&G targets of varied scales, two targeted modifications are introduced. First, the lateral convolution kernel size is increased to 5 × 5, expanding the receptive field. Second, a lightweight FEA module is incorporated at each pyramid level, as shown in Figure 5.
Central to the FPNFEA architecture is the FEA module, which employs channel-wise attention to selectively accentuate pivotal features. For each feature map $P_i \in \mathbb{R}^{C \times H \times W}$, global average pooling condenses spatial dimensions to $1 \times 1$, producing a channel-wise descriptor. This descriptor is processed through two convolutional layers with a channel reduction factor of 2, followed by a ReLU activation and a sigmoid function, to compute attention weights. The process can be formalized as follows:
$$A_i = \sigma\left( \mathrm{Conv}_2\left( \mathrm{ReLU}\left( \mathrm{Conv}_1\left( \mathrm{GAP}(P_i) \right) \right) \right) \right), \tag{13}$$
where $A_i \in \mathbb{R}^{C \times 1 \times 1}$ denotes the attention weights. GAP represents global average pooling. $\mathrm{Conv}_1$ and $\mathrm{Conv}_2$ are convolutional layers with a channel reduction factor of 2. ReLU is the activation function, and $\sigma$ is the sigmoid function. The refined feature map is then computed using the attention weights as follows:
$$P_i' = P_i \odot A_i \tag{14}$$
where $P_i' \in \mathbb{R}^{C \times H \times W}$ is the enhanced feature map. $P_i \in \mathbb{R}^{C \times H \times W}$ is the original feature map. $A_i \in \mathbb{R}^{C \times 1 \times 1}$ represents the attention weights broadcast across spatial dimensions. $\odot$ denotes element-wise multiplication. Drawing inspiration from seminal attention mechanisms [32], this approach focuses on features critical for discerning rotated objects within the cluttered O&G milieu. Through the synergistic integration of multi-scale feature extraction and attention-driven refinement, the FPNFEA framework adeptly localizes and classifies a spectrum of O&G targets.
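The FEA computation above can be sketched with NumPy (illustrative only; after global average pooling the 1 × 1 convolutions reduce to matrix multiplications, and the weight matrices W1, W2 are placeholders for learned parameters):

```python
import numpy as np

def fea_module(P, W1, W2):
    """Channel attention followed by feature refinement.

    P  : feature map, shape (C, H, W)
    W1 : (C//2, C) weights of Conv1 (1x1 conv == matmul after GAP)
    W2 : (C, C//2) weights of Conv2 restoring C channels
    """
    gap = P.mean(axis=(1, 2))                 # GAP -> channel descriptor (C,)
    hidden = np.maximum(W1 @ gap, 0.0)        # Conv1 + ReLU, reduction factor 2
    logits = W2 @ hidden                      # Conv2
    A = 1.0 / (1.0 + np.exp(-logits))         # sigmoid -> attention weights
    return P * A[:, None, None]               # broadcast channel-wise multiply
```

With zero weights the sigmoid outputs 0.5 for every channel, so the module halves the feature map uniformly; learned weights instead emphasize channels that respond to rotated O&G targets.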

3. Experimental Results

3.1. Dataset

This study designated the Tarim Basin in the Xinjiang Uygur Autonomous Region, China, as the research area, a region esteemed as one of China’s principal O&G resource reservoirs (Figure 6). Its arid desert terrain, marked by sparse vegetation and negligible cloud cover, creates ideal conditions for acquiring high-resolution satellite imagery. This region’s complex topography, featuring aeolian dunes, rocky outcrops, and diffusely distributed related facilities, demands exceptional detection precision, rendering the basin an exemplary venue for identification tailored to O&G infrastructure.
To support this research, we employed high-resolution satellite images from multiple sources, including the BeiJing-2, BeiJing-3, and GaoFen-2 satellites (Figure 7). The dataset comprises 3039 images, each containing 1 to 7 O&G facilities. The images in the dataset are 1024 × 1024 pixels with 0.8 m spatial resolution per pixel. Domain specialists annotated images methodically using the open-source program roLabelImg. For each labeled image, an XML file in the PASCAL VOC format was generated, containing the image size and bounding box coordinates (center, width, height, angle), adhering to the le90 angle convention. The dataset encompassed three categories: well sites (3006 instances), industrial and mining lands (692 instances), and drillings (244 instances).
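As a minimal sketch of consuming such annotations, the following hypothetical parser reads one rotated-box record with Python's standard library (the `robndbox` element with `cx`, `cy`, `w`, `h`, `angle` fields is our assumption about roLabelImg's output format, and the sample XML below is fabricated for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal annotation in the rotated-box VOC style
SAMPLE_XML = """<annotation>
  <size><width>1024</width><height>1024</height></size>
  <object>
    <name>well_site</name>
    <robndbox>
      <cx>512.0</cx><cy>300.5</cy>
      <w>80.0</w><h>40.0</h><angle>0.35</angle>
    </robndbox>
  </object>
</annotation>"""

def parse_rotated_boxes(xml_text):
    """Return (class, cx, cy, w, h, angle) tuples from one annotation file."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        rb = obj.find("robndbox")
        boxes.append((obj.findtext("name"),
                      float(rb.findtext("cx")), float(rb.findtext("cy")),
                      float(rb.findtext("w")), float(rb.findtext("h")),
                      float(rb.findtext("angle"))))
    return boxes
```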
The dataset was split into training, validation, and test sets in a 70:15:15 ratio using a stratified approach at the image level. To preserve the original class distribution as closely as possible, we first assigned each image to a group based on its dominant class. Within each group, images were then randomly divided using stratified shuffling, maintaining the overall proportion of dominant classes across the three splits. This strategy ensures that the severe class imbalance observed in the full dataset is consistently reflected in the training, validation, and test sets. Consequently, the training and validation sets together contained all 2553 well sites, 568 industrial and mining lands, and 204 drillings, while the test set included 453 well sites, 124 industrial and mining lands, and 40 drillings. This distribution preserves the challenging long-tail characteristics across all splits, providing a rigorous evaluation of model performance, particularly on minority classes.
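The image-level stratified split described above can be sketched as follows (illustrative; the function name, seed, and the mapping from image to dominant class are our assumptions):

```python
import random
from collections import defaultdict

def stratified_split(image_labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Split images into train/val/test, preserving the proportion of
    dominant classes. image_labels maps image id -> dominant class."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for img, cls in image_labels.items():
        groups[cls].append(img)
    train, val, test = [], [], []
    for imgs in groups.values():
        rng.shuffle(imgs)                      # stratified shuffling per class
        n_train = int(ratios[0] * len(imgs))
        n_val = int(ratios[1] * len(imgs))
        train += imgs[:n_train]
        val += imgs[n_train:n_train + n_val]
        test += imgs[n_train + n_val:]
    return train, val, test
```

Splitting each dominant-class group independently guarantees that minority classes such as drillings appear in all three subsets rather than vanishing from one by chance.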

3.2. Evaluation Metrics

Six metrics, namely precision, recall, F 1 , AP 50 , mAP, and number of parameters, were selected as key metrics in this study to evaluate the performance of the deep learning models. Precision measures the proportion of correctly predicted O&G facilities among all instances predicted as positive, while recall indicates the proportion of actual O&G facilities that are correctly detected by the model. F 1 , as the harmonic mean of precision and recall, provides a balanced evaluation of the model’s ability to reduce both false positives and false negatives. AP is calculated as the area under the precision-recall curve. AP 50 denotes the average precision when the IoU threshold is fixed at 0.50, offering a practical indicator of localization accuracy under moderate overlap requirements. mAP is the mean AP across all classes and serves as the primary performance criterion. The number of parameters (Params) quantifies model complexity in millions, providing insight into computational efficiency and deployment feasibility. These metrics are calculated by the following equations:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = (2 × Precision × Recall) / (Precision + Recall)
AP = ∫₀¹ p(r) dr
The computation of the above metrics is calculated based on the confusion matrix, where true positives (TP) are correctly detected O&G facilities, false positives (FP) are background regions incorrectly classified as facilities, false negatives (FN) are missed O&G facilities, and true negatives (TN) are correctly identified background regions.
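For concreteness, the confusion-matrix metrics and AP (area under the precision-recall curve) can be computed as in the minimal sketch below. The all-point interpolation scheme shown here is an assumption; the exact interpolation used by the evaluation toolkit is not specified in the text.

```python
import numpy as np

def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation)."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Make the precision envelope monotonically decreasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

mAP is then simply the mean of `average_precision` over the three facility classes, and AP50 corresponds to computing the curve with detections matched at an IoU threshold of 0.50.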

3.3. Implementation Details

All experiments were conducted on a system with an NVIDIA GeForce RTX 4080 SUPER GPU (16 GB) and CUDA 11.8. The experimental setup adopted an image patch size of 1024 × 1024 pixels and a batch size of 2. All models were trained for 50 epochs with a learning rate of 0.005.
Hyperparameters specific to the proposed modules were selected via grid search on the validation set to maximize mAP. For the O&G Loss Function, the weighting coefficients were searched in the range [0.1, 2.0] with step 0.1, and the final values were set to λ_Class = 1.0, λ_Scale = 0.5, and λ_Angle = 0.8. In CAHEM, the difficulty scoring coefficients λ_cls and λ_angle were tuned in [0.1, 2.0], resulting in λ_cls = 0.5 and λ_angle = 1.0, while the top-k fraction was selected as 0.7 from [0.5, 0.9]. For FPNFEA, the channel reduction factor in the attention bottleneck was fixed at 2, following standard squeeze-and-excitation designs. These values yielded the highest validation mAP and were adopted for final evaluation.
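As a hedged illustration of how the tuned coefficients enter training, the sketch below combines three loss terms with the reported weights and performs a CAHEM-style class-weighted top-k hard-example selection. All function names and the exact scoring formula are assumptions based on the description above, not the authors' implementation.

```python
import numpy as np

# Coefficients reported in the paper's grid search.
LAMBDA_CLASS, LAMBDA_SCALE, LAMBDA_ANGLE = 1.0, 0.5, 0.8

def og_loss(l_class, l_scale, l_angle):
    """Weighted sum of the class, scale, and angle loss terms (assumed form)."""
    return LAMBDA_CLASS * l_class + LAMBDA_SCALE * l_scale + LAMBDA_ANGLE * l_angle

def select_hard_examples(cls_losses, angle_errors, class_weights,
                         lam_cls=0.5, lam_angle=1.0, topk_frac=0.7):
    """Score each proposal's difficulty and keep the hardest top-k fraction.

    `class_weights` boosts minority-class proposals so they are more likely
    to be retained (assumed mechanism for the class-aware weighting).
    """
    difficulty = (lam_cls * np.asarray(cls_losses)
                  + lam_angle * np.asarray(angle_errors))
    weighted = difficulty * np.asarray(class_weights)
    k = max(1, int(round(topk_frac * len(weighted))))
    return np.argsort(weighted)[::-1][:k]  # indices of the k hardest proposals
```

Under this reading, a proposal with a large classification loss or angle error from a rare class receives a high weighted difficulty score and is prioritized during mining.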

3.4. Results

This section evaluates the performance of OGF Oriented R-CNN for detecting O&G facilities, employing precision, recall, F1, AP50, mAP, and number of parameters as principal metrics. The research elucidates the contributions of the O&G Loss Function, CAHEM, and FPNFEA, particularly in resolving the distinctive challenges posed by O&G facility detection, through systematic comparisons with state-of-the-art (SOTA) models and a comprehensive ablation study.

3.4.1. Comparative Study

OGF Oriented R-CNN is benchmarked against seven SOTA models: Faster R-CNN [33], Gliding Vertex [34], H2RBox-v2 [35], Oriented R-CNN, OrientedFormer [14], RoI Transformer [26], and S2A-Net [36]. These models span leading axis-aligned and rotation-aware designs, offering a stringent comparison amid extreme class imbalance and geometric complexity.
As detailed in Table 1, OGF Oriented R-CNN attains the highest mAP of 82.9%, outperforming the baseline model (Oriented R-CNN) by 10.5 pp and other SOTA models by up to 27.6 pp, establishing clear superiority in overall detection performance. It achieves the leading AP50 for well sites as well as industrial and mining lands, and records the highest precision and F1 for industrial and mining land and drilling. Faster R-CNN leads in well site precision and drilling AP50, yet exhibits lower industrial and mining land precision and an overall mAP of 79.7%. OrientedFormer exhibits the highest recall across all classes, reflecting its tendency to generate a large number of proposals that capture nearly all instances. However, this comes at the expense of the lowest precision among all evaluated models, resulting in a significantly reduced overall mAP. H2RBox-v2 has the lowest number of parameters, indicating high computational efficiency, yet this lightweight design is accompanied by markedly lower precision across all classes, leading to an mAP of 69.8%. The remaining rotation-aware models, including Gliding Vertex, RoI Transformer, and S2A-Net, deliver mAP ranging from 55.3% to 76.1%, with consistently inferior AP50 in underrepresented classes. By contrast, OGF Oriented R-CNN sustains near-saturation recall in the dominant well site category while achieving substantial and consistent gains in precision, F1, and AP50 across industrial and mining land and drilling.

3.4.2. Ablation Study

An ablation study is conducted to clarify the contributions of the proposed components: (1) O&G Loss Function, (2) CAHEM, (3) FPNFEA. The performance of the baseline model (Oriented R-CNN) is reported in Table 1. As evidenced in Table 2, the O&G Loss Function markedly enhances precision and F1 across all categories while maintaining stable recall. The subsequent addition of CAHEM brings further improvement, particularly in the drilling class. OGF Oriented R-CNN augmented with FPNFEA achieves the best mAP, with the most significant advancement observed in the industrial and mining land class. When added individually, the O&G Loss Function yields +4.1 pp, CAHEM +3.8 pp, and FPNFEA +0.6 pp, with the combination demonstrating synergistic benefits.
Although partial models exhibit isolated superior metrics, these are offset by lower overall mAP and suboptimal results in other categories, rendering them inferior to OGF Oriented R-CNN. For instance, Oriented R-CNN with CAHEM achieves a better well site AP50, yet it falls short in overall mAP. Similarly, Oriented R-CNN with CAHEM and FPNFEA demonstrates superior well-site recall, although it is limited by an mAP of 77.3% and diminished F1 in industrial and mining land and drilling. Furthermore, precision and F1 for the drilling class are slightly lower in OGF Oriented R-CNN than in the version without FPNFEA, but these gaps are minimal. OGF Oriented R-CNN, however, delivers a much higher AP50 and an overall mAP gain of +2.8 pp, which confirms its superior localization accuracy and consistent improvement across different O&G facility types.

3.4.3. Qualitative Evaluation

Beyond quantitative metrics, a qualitative examination reveals how OGF Oriented R-CNN performs in practical scenarios. Here, we evaluate detection performance by comparing OGF Oriented R-CNN against different methods on representative O&G facility scenes and assessing its capability to generate precise localizations, handle rotational and scale variations, and mitigate redundancy, especially for minority classes.
Faster R-CNN generates multiple overlapping rotated bounding boxes for the same target, as illustrated in Figure 8b, culminating in over-detection of the minority class and, in some instances, complete failure to localize the drilling. These shortcomings directly explain its lower metrics on the minority class in Table 1. OGF Oriented R-CNN, on the other hand, detects the drilling clearly and represents each industrial and mining land with a single compact rotated bounding box (Figure 8c). Although a small degree of redundancy remains for well sites, the predictions are overall far more orderly and faithful to object structure.
Figure 9 traces the evolution of detection quality, from the baseline model to OGF Oriented R-CNN, across three hallmark O&G facility types. The baseline model detects most targets but generates numerous redundant and overlapping boxes (Figure 9b). With O&G Loss, redundancy drops sharply, yet localization remains imprecise in the well site and drilling examples, and the industrial and mining land in the second column vanishes entirely (Figure 9c). Adding CAHEM sharpens angular alignment and further culls duplicates, though a minor false positive appears in the industrial and mining land scene, indicating incomplete discrimination from background clutter (Figure 9d). OGF Oriented R-CNN largely resolves these issues, producing cleaner detection results free of substantial redundancy, omissions, or false positives (Figure 9e). Even so, slight angular inaccuracies are still observable in a small number of drilling instances under extreme rotation, which suggests that complete regression convergence remains a work in progress. Taken together, these qualitative improvements dovetail with the quantitative gains in Table 2, affirming that OGF Oriented R-CNN represents a notable step forward in managing class imbalance, scale variation, and rotational complexity for O&G facility monitoring.
The detection results in Figure 10 display how OGF Oriented R-CNN performs in scenarios representative of O&G facilities’ key challenges (class imbalance, scale disparity, and rotational variance), maintaining stable prediction quality across varying conditions. Two well sites appear with pronounced orientation differences and partial obstruction from the surrounding terrain in Figure 10d. OGF Oriented R-CNN captures each facility using rotated bounding boxes that closely match its true directional layout, indicating reliable handling of substantial angular variation. Figure 10e includes a single drilling instance, the least represented category in the dataset. Despite its scarcity and the low-contrast desert background, OGF Oriented R-CNN produces a clear, correctly oriented detection, showing its ability to recognize minority-class objects even in visually homogeneous environments. Figure 10f serves as an example of the model’s performance when targets with substantial scale differences are present within a single scene: the larger well sites are outlined with well-shaped rotated bounding boxes, whereas the smaller industrial and mining lands are marked with equally coherent boundaries.

4. Discussion

The analysis revealed that dataset characteristics fundamentally governed detector performance in oriented object detection. Surprisingly, the axis-aligned Faster R-CNN achieved higher mAP than most rotation-aware models, suggesting that rotational sophistication alone is insufficient to overcome dataset bias when well-site instances dominate. Similarly, RoI Transformer and S2A-Net exhibited lower performance on minority classes, indicating that additional rotational flexibility can exacerbate false positives when class imbalance is not explicitly addressed.
The outstanding performance of OGF Oriented R-CNN against seven SOTA models provided clear evidence of its efficacy in this challenging domain. Certain baselines exhibited localized advantages in specific metrics. Faster R-CNN leveraged its robust axis-aligned localization to achieve high precision for the dominant well-site class. However, it encountered difficulties with industrial and mining lands, resulting in reduced accuracy for minority classes. OrientedFormer employed an aggressive proposal-generation strategy that captured nearly all instances, yielding the highest recall. This heightened sensitivity yielded the lowest precision among all models, underscoring the challenge of preserving specificity in severely imbalanced scenarios. H2RBox-v2 adopted a lightweight architecture that minimizes parameters and thus offers superior computational efficiency, but this compactness compromised detection accuracy across all classes. The other rotation-aware models, including Gliding Vertex, RoI Transformer, and S2A-Net, focused primarily on rotational modeling while inadequately mitigating class imbalance, yielding inconsistent results on underrepresented categories. In contrast, OGF Oriented R-CNN attained the highest overall mAP while delivering the most uniform and substantial advancements in precision, F1, and AP50 across the underrepresented classes. This outcome demonstrated that effective detection in such scenarios required coordinated interventions across loss design, training strategy, and feature extraction, rather than isolated enhancements in rotational modeling or efficiency.
What set the proposed OGF Oriented R-CNN apart was its incorporation of the O&G Loss Function, CAHEM, and FPNFEA, which collectively served to overcome class imbalance, scale disparity, and rotational variance. Adding the O&G Loss Function accounted for the largest single-step mAP gain (+4.1 pp) by restoring gradient contribution from minority classes. Subsequent incorporation of CAHEM further enhanced performance (+3.6 pp relative to the model with the O&G Loss Function), with particularly notable gains in the drilling class, where small and arbitrarily oriented targets benefited from focused regression. Applying FPNFEA contributed an additional +2.8 pp mAP, primarily by strengthening the multi-scale context for industrial and mining land without compromising prior gains. Minor reductions in drilling precision and F1 in OGF Oriented R-CNN reflected the expected trade-off in multi-scale fusion, where richer features may introduce subtle boundary noise, yet the net mAP and AP50 gains indicated that generalization had improved overall. As these findings attested, targeted interventions against class imbalance, scale disparity, and rotational variance position OGF Oriented R-CNN as an accurate and robust solution for O&G facility detection.
Although the evaluation was conducted exclusively on the Tarim Basin dataset, robustness considerations guided the inclusion of high-resolution imagery from multiple sensors (BeiJing-2, BeiJing-3, and GaoFen-2), introducing diversity in spectral response and acquisition conditions. This multi-source design enhances the model’s resilience to sensor-specific variations and provides a more representative testbed than single-sensor datasets. As additional publicly available oriented datasets for O&G facilities in other areas become accessible, future work will assess cross-region transferability and extend the framework to broader infrastructure categories.

5. Conclusions

Reliable identification of O&G facilities from high-resolution remote sensing imagery remains a fundamental yet unresolved challenge, fueled by the coexistence of severe inter-class imbalance, pronounced scale heterogeneity, and rotational ambiguity. When confronted with these difficulties, conventional detectors, whether axis-aligned or rotation-aware, frequently suffer from degraded precision.
This paper proposed OGF Oriented R-CNN, a novel detection model that systematically integrates three complementary modules: the O&G Loss Function for adaptive class reweighting, scale-aware regression, and orientation-sensitive penalties; CAHEM for class-aware hard-example mining; and FPNFEA for attention-guided multi-scale fusion. Experimental results demonstrated that OGF Oriented R-CNN attained an mAP of 82.9%, representing a +10.5 pp improvement over the baseline model (Oriented R-CNN). The proposed model delivered substantial gains in precision and F1 for minority classes while maintaining high recall in the dominant class, outperforming seven SOTA models by up to 27.6 pp. Insights from comparative models, such as the computational efficiency of lightweight designs in H2RBox-v2 or the recall strength of transformer-based approaches in OrientedFormer, may inspire hybrid methods that combine these advantages with the domain-specific handling of O&G facilities. Its efficacy was further corroborated through qualitative assessments, showing reduced duplicate detections and enhanced angular accuracy in diverse operational scenarios. OGF Oriented R-CNN can therefore offer a reliable foundation for operational deployment in energy infrastructure surveillance and environmental monitoring. Nevertheless, modest limitations endure under extreme rotational conditions, pointing to future research avenues such as the development of more robust geometric representations and the effective integration of auxiliary information.

Author Contributions

Conceptualization, Y.Q. and Y.C.; methodology, Y.Q. and Y.C.; software, Y.Q. and Z.C.; validation, S.L., N.Z. and Z.C.; formal analysis, Y.Q.; investigation, Y.Q. and M.L.; resources, S.L., N.Z. and Y.C.; data curation, Z.C. and M.L.; writing—original draft preparation, Y.Q.; writing—review and editing, Y.Q., Y.C., S.L., N.Z., Z.C. and M.L.; visualization, Y.Q.; supervision, Y.C.; project administration, Y.C.; funding acquisition, S.L., N.Z. and Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Graduate Innovation Program of China University of Mining and Technology (Grant No. 2025WLJCRCZL004).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to commercial privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. International Energy Agency. Available online: https://www.iea.org/reports/oil-2024/executive-summary (accessed on 14 March 2025).
  2. Xiong, S.; Tan, Y.; Li, Y.; Wen, C.; Yan, P. Subtask attention based object detection in remote sensing images. Remote Sens. 2021, 13, 1925. [Google Scholar] [CrossRef]
  3. Ramachandran, N.; Irvin, J.; Omara, M.; Gautam, R.; Meisenhelder, K.; Rostami, E.; Sheng, H.; Ng, A.Y.; Jackson, R.B. Deep learning for detecting and characterizing oil and gas well pads in satellite imagery. Nat. Commun. 2024, 15, 7036. [Google Scholar] [CrossRef] [PubMed]
  4. Zhang, H.; Xiao, P.; Yao, F.; Zhang, Q.; Gong, Y. Fusion of multi-scale attention for aerial images small-target detection model based on PARE-YOLO. Sci. Rep. 2025, 15, 4753. [Google Scholar] [CrossRef] [PubMed]
  5. Sun, Y.; Zhao, H.; Zhou, J. DKETFormer: Salient object detection in optical remote sensing images based on discriminative knowledge extraction and transfer. Neurocomputing 2025, 625, 129558. [Google Scholar] [CrossRef]
  6. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [Google Scholar] [CrossRef]
  7. S&P Global. Available online: https://www.spglobal.com/commodity-insights/en/research-analytics/drilled-but-uncompleted-wells (accessed on 22 March 2025).
  8. Global Energy Monitor. Available online: https://globalenergymonitor.org/projects/global-oil-gas-extraction-tracker/ (accessed on 22 March 2025).
  9. Pei, J.; Wu, X.; Liu, X.; Gao, L.; Yu, S.; Zheng, X. SGD-YOLOv5: A Small Object Detection Model for Complex Industrial Environments. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–10. [Google Scholar]
  10. Li, S.; Zhu, Z.; Sun, H.; Ning, X.; Dai, G.; Hu, Y.; Yang, H.; Wang, Y. Towards high-accuracy and real-time two-stage small object detection on FPGA. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 8053–8066. [Google Scholar] [CrossRef]
  11. Cao, F.; Wang, R.; Li, D.; Hu, Z. RS-DETR: An Improved DETR for High-Resolution Remote Sensing Image Object Detection. In Proceedings of the 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Kuching, Malaysia, 6–10 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1377–1382. [Google Scholar]
  12. Han, J.; Ding, J.; Xue, N.; Xia, G.S. Redet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2786–2795. [Google Scholar]
  13. Zheng, S.; Wu, Z.; Xu, Y.; Wei, Z. Instance-aware spatial-frequency feature fusion detector for oriented object detection in remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5606513. [Google Scholar] [CrossRef]
  14. Zhao, J.; Ding, Z.; Zhou, Y.; Zhu, H.; Du, W.L.; Yao, R.; El Saddik, A. OrientedFormer: An end-to-end transformer-based oriented object detector in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5640816. [Google Scholar] [CrossRef]
  15. Li, Z.; Liu, W.; Xie, Z.; Kang, X.; Duan, P.; Li, S. FAA-Det: Feature Augmentation and Alignment for Anchor-Free Oriented Object Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5539411. [Google Scholar] [CrossRef]
  16. Song, S.; Liu, J.; Liu, Y.; Feng, G.; Han, H.; Yao, Y.; Du, M. Intelligent object recognition of urban water bodies based on deep learning for multi-source and multi-temporal high spatial resolution remote sensing imagery. Sensors 2020, 20, 397. [Google Scholar] [CrossRef] [PubMed]
  17. Fang, K.; Ouyang, J.; Hu, B. Swin-HSTPS: Research on target detection algorithms for multi-source high-resolution remote sensing images. Sensors 2021, 21, 8113. [Google Scholar] [CrossRef] [PubMed]
  18. Aljameel, S.S.; Alomari, D.M.; Alismail, S.; Khawaher, F.; Alkhudhair, A.A.; Aljubran, F.; Alzannan, R.M. An anomaly detection model for oil and gas pipelines using machine learning. Computation 2022, 10, 138. [Google Scholar] [CrossRef]
  19. He, H.; Xu, H.; Zhang, Y.; Gao, K.; Li, H.; Ma, L.; Li, J. Mask R-CNN based automated identification and extraction of oil well sites. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102875. [Google Scholar] [CrossRef]
  20. Zhang, Y.; Bai, L.; Wang, Z.; Fan, M.; Jurek-Loughrey, A.; Zhang, Y.; Zhang, Y.; Zhao, M.; Chen, L. Oil well detection under occlusion in remote sensing images using the improved YOLOv5 model. Remote Sens. 2023, 15, 5788. [Google Scholar] [CrossRef]
  21. Guisiano, J.E.; Moulines, É.; Lauvaux, T.; Sublime, J. Oil and gas automatic infrastructure mapping: Leveraging high-resolution satellite imagery through fine-tuning of object detection models. In Neural Information Processing, Proceedings of the International Conference on Neural Information Processing, Changsha, China, 20–23 November 2023; Springer: Singapore, 2023; pp. 442–458. [Google Scholar]
  22. Ma, C.; Zhang, Y.; Guo, J.; Hu, Y.; Geng, X.; Li, F.; Lei, B.; Ding, C. End-to-end method with transformer for 3-D detection of oil tank from single SAR image. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–19. [Google Scholar] [CrossRef]
  23. Wu, Q.; Zhang, B.; Xu, C.; Zhang, H.; Wang, C. Dense oil tank detection and classification via YOLOX-TR network in large-scale SAR images. Remote Sens. 2022, 14, 3246. [Google Scholar] [CrossRef]
  24. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
  25. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  26. Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
  27. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
  28. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar]
  29. Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. Scrdet: Towards more robust detection for small, cluttered and rotated objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 8232–8241. [Google Scholar]
  30. Shrivastava, A.; Gupta, A.; Girshick, R. Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 761–769. [Google Scholar]
  31. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  32. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  33. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  34. Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459. [Google Scholar] [CrossRef] [PubMed]
  35. Yu, Y.; Yang, X.; Li, Q.; Zhou, Y.; Da, F.; Yan, J. H2RBox-v2: Incorporating symmetry for boosting horizontal box supervised oriented object detection. Adv. Neural Inf. Process. Syst. 2023, 36, 59137–59150. [Google Scholar]
  36. Han, J.; Ding, J.; Li, J.; Xia, G.S. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5602511. [Google Scholar] [CrossRef]
Figure 1. Field photographs of representative O&G facilities. (a) Densely distributed well sites, (b) close-up of drillings.
Figure 2. Statistical characteristics of the O&G facility dataset. (a) Class distribution (well sites: 3006, industrial and mining lands: 692, drillings: 244) reveals extreme imbalance. (b) Object scale distribution evinces substantial heterogeneity. (c) Orientation distribution exhibits extensive rotational variances.
Figure 3. Diagram of OGF Oriented R-CNN. (H, W) are the height and width of the input images. (x,y) are the center coordinates of the predicted proposal. (h,w) are the height and width of the external rectangular box of the predicted oriented proposal. θ is the rotation angle of the predicted proposal, determined according to the le90 annotation convention. The plus sign denotes element-wise addition for fusing the enhanced multi-scale features from FPNFEA with the backbone features.
Figure 4. Diagram of CAHEM and O&G Loss Function Framework. S difficulty is the initial difficulty score, and S weighted is the class-weighted difficulty score using w class , the shared class weight derived from dataset frequency. The O&G Loss module integrates L Class , L Scale , and L Angle , with w class also influencing these losses. Dynamic feedback optimizes w class based on loss gradients.
Figure 5. Architecture of FPNFEA. Backbone features C2–C5 undergo 2× upsampling and element-wise addition to produce P2–P6. The FEA module first generates channel attention weights for each Pi by processing it through parallel branches (Conv 1 × 1, 3 × 3, 5 × 5, and Max Pooling 2 × 2), followed by AvgPool 1 × 1, ReLU, Sigmoid, and Conv 1 × 1 (Reduce and Restore). The Enhanced Pi is then obtained by element-wise multiplication of Pi with these weights.
Figure 6. Location of the study area.
Figure 7. Orbital information of the BeiJing-2, BeiJing-3, and GaoFen-2 satellites. SSO represents Sun-Synchronous Orbit.
Figure 8. Comparison of detection results between the two models. (a) Real images. (b) Predictions of Faster R-CNN. (c) Predictions of OGF Oriented R-CNN. Bounding boxes are colored red for well sites, yellow for industrial and mining lands, and green for drillings.
Figure 9. Detection results with stepwise improvement for O&G facilities. (a) Real images. (b) Predictions of the baseline model (Oriented R-CNN). (c) Predictions of the baseline model with O&G Loss added. (d) Predictions of the baseline model with O&G Loss and CAHEM added. (e) Predictions of OGF Oriented R-CNN. From left to right, the three columns correspond to well sites, industrial and mining lands, and drillings, respectively. Bounding boxes are colored red for well sites, yellow for industrial and mining lands, and green for drillings.
Figure 10. Detection results of the proposed OGF Oriented R-CNN on challenging scenes. (ac) Real images. (df) Predictions of OGF Oriented R-CNN. Bounding boxes are colored red for well sites, yellow for industrial and mining lands, and green for drillings.
Table 1. Evaluation metrics for different models.
| Model | Class | Precision (%) | Recall (%) | F1 (%) | AP50 (%) | mAP (%) | Params (M) |
| Faster R-CNN | well_site | 62.8 | 98.2 | 76.6 | 90.3 | 79.7 | 41.13 |
|  | IM_land | 37.4 | 79.8 | 50.9 | 61.1 |  |  |
|  | drilling | 56.1 | 92.5 | 69.8 | 87.8 |  |  |
| Gliding Vertex | well_site | 53.1 | 97.4 | 68.7 | 86.1 | 76.1 | 41.13 |
|  | IM_land | 40.5 | 74.2 | 52.4 | 56.9 |  |  |
|  | drilling | 57.8 | 92.5 | 71.2 | 85.2 |  |  |
| H2RBox-v2 | well_site | 38.0 | 87.2 | 52.9 | 73.3 | 69.8 | 31.90 |
|  | IM_land | 2.9 | 78.2 | 5.6 | 50.6 |  |  |
|  | drilling | 11.7 | 90.0 | 20.7 | 85.5 |  |  |
| Oriented R-CNN | well_site | 41.3 | 98.5 | 58.2 | 89.3 | 72.4 | 41.13 |
|  | IM_land | 22.6 | 79.0 | 35.1 | 54.8 |  |  |
|  | drilling | 29.7 | 87.5 | 44.3 | 73.1 |  |  |
| OrientedFormer | well_site | 1.3 | 99.8 | 2.6 | 88.7 | 65.5 | 44.52 |
|  | IM_land | 0.1 | 96.8 | 0.2 | 47.1 |  |  |
|  | drilling | 0.9 | 97.5 | 1.8 | 60.7 |  |  |
| RoI Transformer | well_site | 24.8 | 98.9 | 39.7 | 78.2 | 55.3 | 55.04 |
|  | IM_land | 18.3 | 79.0 | 29.7 | 35.8 |  |  |
|  | drilling | 22.1 | 90.0 | 35.5 | 51.9 |  |  |
| S2A-Net | well_site | 45.7 | 96.7 | 62.0 | 79.9 | 62.3 | 38.54 |
|  | IM_land | 41.1 | 48.4 | 44.4 | 31.9 |  |  |
|  | drilling | 38.0 | 95.0 | 54.3 | 75.1 |  |  |
| OGF Oriented R-CNN | well_site | 57.5 | 98.5 | 72.6 | 90.4 | 82.9 | 49.52 |
|  | IM_land | 45.6 | 83.9 | 59.1 | 72.4 |  |  |
|  | drilling | 63.2 | 92.3 | 75.0 | 85.8 |  |  |
Boldface highlights the optimal results for each metric among the evaluated models (highest for accuracy metrics, lowest for Params).
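As a consistency check on Table 1, the reported F1 scores follow the standard harmonic mean of precision and recall, and each model's mAP is the unweighted mean of its three per-class AP50 values. A minimal sketch (values copied from the table; the helper functions are illustrative, not taken from the paper's code):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)

def mean_ap(ap50_per_class):
    """Unweighted mean of per-class AP50 values."""
    return sum(ap50_per_class) / len(ap50_per_class)

# OGF Oriented R-CNN row of Table 1
print(round(f1_score(57.5, 98.5), 1))         # 72.6, matches the reported well_site F1
print(round(mean_ap([90.4, 72.4, 85.8]), 1))  # 82.9, matches the reported mAP
```

The same identities reproduce every F1 and mAP entry in the table, e.g. Faster R-CNN's mAP of 79.7 is the mean of 90.3, 61.1, and 87.8.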
Table 2. Evaluation metrics (%) for models with different modifications.

| Model | Class | Precision | Recall | F1 | AP50 | mAP |
|---|---|---|---|---|---|---|
| +FPNFEA | well_site | 41.8 | 98.5 | 58.7 | 89.2 | 73.0 |
|  | IM_land | 23.0 | 78.2 | 35.6 | 56.2 |  |
|  | drilling | 29.4 | 87.5 | 44.0 | 73.5 |  |
| +CAHEM | well_site | 55.1 | 98.2 | 70.6 | **90.7** | 76.2 |
|  | IM_land | 37.2 | 75.8 | 49.9 | 62.2 |  |
|  | drilling | 44.0 | 92.5 | 59.7 | 75.8 |  |
| +O&G Loss Function | well_site | 55.5 | 98.2 | 70.9 | 90.3 | 76.5 |
|  | IM_land | 38.9 | 77.4 | 51.8 | 61.2 |  |
|  | drilling | 45.0 | 90.0 | 60.0 | 78.1 |  |
| +CAHEM +FPNFEA | well_site | 53.1 | **98.7** | 69.0 | 90.5 | 77.3 |
|  | IM_land | 38.9 | 77.4 | 51.8 | 62.4 |  |
|  | drilling | 45.1 | 92.5 | 60.7 | 78.9 |  |
| +O&G Loss Function +FPNFEA | well_site | 54.7 | 98.2 | 70.2 | 90.5 | 77.5 |
|  | IM_land | 38.7 | 75.8 | 51.2 | 64.3 |  |
|  | drilling | 43.5 | 92.5 | 59.2 | 77.7 |  |
| +O&G Loss Function +CAHEM | well_site | **57.6** | 97.8 | 72.5 | 89.8 | 80.1 |
|  | IM_land | 40.6 | 73.4 | 52.3 | 64.9 |  |
|  | drilling | **64.4** | **95.0** | **76.8** | 85.7 |  |
| +O&G Loss Function +CAHEM +FPNFEA (OGF Oriented R-CNN) | well_site | 57.5 | 98.5 | **72.6** | 90.4 | **82.9** |
|  | IM_land | **45.6** | **83.9** | **59.1** | **72.4** |  |
|  | drilling | 63.2 | 92.3 | 75.0 | **85.8** |  |

Boldface highlights the optimal result for each metric among the evaluated models. The symbol “+” indicates the addition of the corresponding module to the baseline model (Oriented R-CNN).
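Reading Table 2 against the baseline Oriented R-CNN (72.4% mAP in Table 1), the contribution of each module combination can be tabulated directly. A short sketch (mAP values copied from the two tables):

```python
baseline_map = 72.4  # Oriented R-CNN, Table 1

# Ablation mAP values from Table 2
ablation_map = {
    "+FPNFEA": 73.0,
    "+CAHEM": 76.2,
    "+O&G Loss Function": 76.5,
    "+CAHEM +FPNFEA": 77.3,
    "+O&G Loss Function +FPNFEA": 77.5,
    "+O&G Loss Function +CAHEM": 80.1,
    "+O&G Loss Function +CAHEM +FPNFEA (full model)": 82.9,
}

# Gain of each configuration over the baseline, in percentage points
for name, m in ablation_map.items():
    print(f"{name}: +{m - baseline_map:.1f} points")
```

The full model's gain works out to +10.5 percentage points, consistent with the figure reported in the Highlights; combining the O&G Loss Function with CAHEM already accounts for +7.7 of those points.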
Qian, Y.; Liu, S.; Zhang, N.; Chen, Y.; Chen, Z.; Li, M. Oil and Gas Facility Detection in High-Resolution Remote Sensing Images Based on Oriented R-CNN. Remote Sens. 2026, 18, 229. https://doi.org/10.3390/rs18020229
