SwathSel: A Swath-Based Optimal Remote Sensing Image Selection Method with Visual Consistency for Large-Scale Mapping

Zhang, Bai; Xu, Zongyu; Liu, Yunhe; Ai, Wenhao; Fan, Liming; An, Yuan; Yu, Shuhai

doi:10.3390/rs18081212


by Bai Zhang, Zongyu Xu, Yunhe Liu, Wenhao Ai, Liming Fan, Yuan An and Shuhai Yu *

Chang Guang Satellite Technology Co., Ltd., Changchun 130102, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(8), 1212; https://doi.org/10.3390/rs18081212

Submission received: 9 February 2026 / Revised: 12 April 2026 / Accepted: 13 April 2026 / Published: 17 April 2026

(This article belongs to the Section Remote Sensing Image Processing)


Highlights

What are the main findings?

A novel swath-based optimal remote sensing image selection model named SwathSel is proposed. This model can select a set of images within the area of interest that offers low redundancy and high visual quality while covering the entire area.
We construct connected subsets with the same swath and similar cloud cover, using them as the fundamental units for image selection.

What are the implications of the main findings?

The proposed model utilizes swath information through a composite grouping strategy and a dynamic adjustment mechanism, overcoming the limitations of scene-by-scene selection while maintaining flexibility in selection size.
Local and global swath consistency constraints are designed based on the topology and metadata of connected subsets, effectively improving visual consistency among selected images.

Abstract

With advancements in Earth observation capabilities, the demand for large-scale mapping using remote sensing images has increased significantly. However, selecting an optimal image set for the area of interest (AOI) from a large collection of remote sensing images remains challenging. On the one hand, it is crucial to select images with minimal redundancy and low cloud cover to enhance production efficiency and the effective coverage of mapping products. On the other hand, adjacent selected images should transition naturally so that the resulting mapping products appear visually cohesive. Unfortunately, most existing remote sensing image selection algorithms focus only on the former, with little attention to visual consistency. Meanwhile, images from the same swath inherently offer advantages in both redundancy reduction and visual consistency. However, because a swath covers a large area, its cloud cover can vary considerably, and the cloud distribution within a swath can be highly complex. Managing the relationships among swaths, images, and cloud cover is also challenging. To address these issues, this paper proposes a novel image selection model, SwathSel. Candidate images are grouped through a composite grouping strategy based on swaths, cloud cover, and topological connectivity, thereby expanding the fundamental unit for image selection from individual scenes to connected image subsets. A dynamic adjustment mechanism is introduced to enhance grouping flexibility. Additionally, local and global swath consistency constraints are designed to strengthen visual consistency among images, and a subset evaluation module is used to comprehensively assess swath consistency, coverage, cloud cover, and metadata information. The final image set is obtained through a greedy strategy combined with a rapid refinement technique.
Experiments were conducted on four datasets, and four quantitative metrics were designed to evaluate the visual consistency of the results. Compared with baseline models, SwathSel achieves lower redundancy and cloud cover while delivering superior visual consistency.

Keywords:

optimal selection of remote sensing images; visual consistency; composite grouping strategy; dynamic adjustment mechanism; swath consistency constraints

1. Introduction

In recent years, with the sustained growth of the global commercial satellite industry [1,2,3,4], an increasing number of satellites have been launched into space, leading to rapid advancements in the development of various remote sensing satellite networks. For instance, the Jilin-1 satellite project successfully launched a total of 109 satellites between 2022 and 2024. The proliferation of satellite resources has substantially enhanced Earth observation capabilities, generating vast quantities of remote sensing images. These images, characterized by high spatial resolution, extensive coverage, and strong revisit capabilities, have been widely applied across multiple domains, including object detection [5], building extraction [6,7], agricultural monitoring [8], change detection [9], and disaster management [10]. As remote sensing imagery serves various fields, the rapidly expanding scale of data has also imposed higher requirements on optimal image selection.

The core idea of optimal remote sensing image selection is to identify, from the candidate images, the image set with the lowest redundancy and best visual quality that covers the area of interest (AOI). Although the candidate images may cover the AOI several times over, each specific application typically operates on a single layer of images [11,12,13]. Thus, selecting the optimal images has become increasingly important in the big data era.

Traditional methods for optimal remote sensing image selection [14,15,16] retrieve remote sensing image datasets using spatial databases. By setting external criteria such as satellite source, acquisition time, and cloud cover, these methods derive a candidate image set that meets the requirements within the AOI. However, with the increase in satellite resources, the candidate image set typically exhibits high redundancy. To reduce excessive overlap between images, traditional methods require manual selection, sequentially determining whether each image should be added to the optimal set. While this manual approach can produce images with excellent visual quality, it entails significant labor and time costs. This limitation becomes more pronounced as the AOI coverage increases. Furthermore, relying on subjective manual visual selection yields inconsistent results for the same AOI due to varying judgment criteria among different operators.

Recently, some researchers have proposed new solutions to the issue of high redundancy by drawing on theories of the set covering problem from mathematics [17,18,19]. Chu et al. [20] developed a scoring model based on image metadata (e.g., acquisition time and coverage area), selecting the top k highest-scoring images in each iteration as preliminary results and applying a genetic algorithm to refine the selection once the AOI is fully covered. Yan et al. [21] considered image coverage and constructed sets of fragments from overlapping regions to obtain the optimal set with the fewest images. Tao et al. [22] rasterized the AOI into regular grids, transforming the image optimization problem into a grid voting problem solved by a Markov random field model. Li et al. [23] first obtained an initial result using a greedy algorithm based on each image’s coverage area and then continuously updated it through a weighted gain–loss scheme and dropout mechanism to derive the optimal image set. Other related studies have developed task-specific algorithms by adapting and optimizing optimal selection methods for their respective objectives. Liu et al. [24] focused on remote sensing image acquisition for disaster emergency response. They transformed the acquisition planning optimization problem into a weighted set coverage problem and solved it using a branch-and-bound algorithm [17]. Kempeneers et al. [25] employed image quicklooks as the basis for image selection, selecting cloud-free images with minimal redundancy within the AOI. Pan et al. [26] employed an improved Swin-Transformer [27] to grade image quality by blocks, and further introduced spatiotemporal constraints to optimize the selection of images for generating a full coverage image (i.e., a mosaic image with no cloud-contaminated pixels).
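For reference, the classical set covering problem that these methods build on can be stated as the following integer program. As an illustrative assumption, the AOI is discretized into a finite universe $U$ of grid cells, $F_i \subseteq U$ is the set of cells covered by candidate image $i$ ($i = 1, \dots, n$), $w_i > 0$ is a per-image cost, and the binary variable $z_i$ indicates whether image $i$ is selected:

$$\min_{z} \sum_{i=1}^{n} w_i z_i \quad \text{s.t.} \quad \sum_{i:\, u \in F_i} z_i \geq 1 \;\; \forall u \in U, \qquad z_i \in \{0, 1\}$$

With unit costs ($w_i = 1$), the objective reduces to covering the AOI with the fewest images. The problem is NP-hard in general, which is why the methods above rely on heuristic or branch-and-bound solvers rather than exhaustive search.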

Although the aforementioned methods account for image redundancy and cloud cover, they do not explore the natural transitions between images in depth. Natural transitions between images, i.e., visual consistency, clearly affect the quality of large-scale remote sensing image mapping results. In addition, redundancy cannot be driven arbitrarily close to zero; a certain degree of overlap between images is required to ensure that the orthorectified images can still fully cover the AOI. During remote sensing satellite imaging, swath data are divided into individual standard scenes through cataloging. Since these images all originate from the same acquisition, it is reasonable to expect them to exhibit low redundancy and strong visual consistency. With the advancement of wide-swath remote sensing satellite technology, a single imaging swath can cover a larger geographic area. However, as the swath width increases, the impact of cloud cover can no longer be neglected. Even within a single swath, cloud cover may vary substantially across locations. Thus, a simplistic strategy of uniformly selecting or discarding entire swaths is inappropriate. Conversely, if the images within a swath are split into groups, the number of groups reaches its maximum when each image forms an independent group; in this case, swath selection degenerates into the trivial scenario of selecting images one by one. Consequently, ensuring visual consistency between selected images and balancing the relationships among swaths, images, and cloud cover are critical challenges for an optimal remote sensing image selection algorithm.

To address these issues, we group all candidate images into three levels based on swaths, cloud cover, and topological connectivity, expanding the processing unit for optimal selection from single images to connected subsets with visual consistency. The image redundancy within each connected subset is extremely low, the imaging conditions are almost identical, and the cloud cover is similar. Next, we propose a dynamic adjustment mechanism that allows individual images to move between cloud cover intervals. To enhance the visual consistency of the selected images, both local and global swath consistency constraints are introduced. The algorithm comprehensively incorporates swath, image coverage, cloud cover, and metadata information, and rapidly refines the preliminary selection using the boundary information of each connected subset to obtain the final results.

We also built four datasets using remote sensing images from the Jilin-1 satellite project to evaluate the performance of different algorithms. These datasets cover a variety of scenarios involving different latitudes, climatic conditions, data densities, and coverage extents. Experimental results show that our swath-based optimal remote sensing image selection algorithm significantly enhances the visual consistency of selected images and achieves state-of-the-art performance across multiple metrics. The main contributions of this study are as follows:

To the best of our knowledge, we are the first to incorporate swath information into the optimal selection of remote sensing images. We propose a framework named SwathSel, which balances swath, coverage, cloud cover, and metadata information.
We propose a composite grouping strategy and a dynamic adjustment mechanism that extend the processing unit of the optimal selection algorithm from single-scene images to connected subsets. These connected subsets partition swath data into smaller units, enabling flexible subset-size selection and improving the efficiency of swath information utilization.
To ensure visual consistency of the selected images, we apply local and global swath consistency constraints based on the topological structure of connected subsets and metadata information, respectively. We also construct four metrics to quantitatively evaluate the visual consistency performance of different methods.
We conducted experiments on four different datasets, and our SwathSel model achieved state-of-the-art results in terms of redundancy, cloud cover, and visual consistency compared to the baseline models.

2. Materials

2.1. Experimental Satellite Information

All images used in this study were acquired by satellites from three series of the Jilin-1 project: JL1KF01s, JL1KF02Bs, and JL1GF03Ds. Among them, JL1KF01B, JL1KF01C, and JL1KF02Bs provide panchromatic spatial resolution better than 0.5 m, whereas JL1KF01A and JL1GF03Ds provide panchromatic spatial resolution better than 0.75 m. The satellite parameters used in this study are listed in Table 1. The NQ05 and SA21 datasets include images from all satellites of the three series, whereas the NG48 and NKL4 datasets include only images from the JL1KF01B, JL1KF01C, and JL1KF02B series.

2.2. Data Sources

Following the specifications of the International Map of the World (IMW) standard [28], this study divides the globe into a regular grid of map sheets at a scale of 1:1,000,000. To objectively evaluate the performance of the proposed model on real-world remote sensing data, this study constructed four datasets from seven map sheets: NQ05, SA21, NG48, NK51, NK52, NL51, and NL52.

These four datasets are named NQ05, SA21, NG48, and NKL4, respectively, and cover different latitudes, climatic conditions, data densities, and coverage extents. The spatial distributions of the datasets used in the experiments are illustrated in Figure 1. Specifically, the NQ05 dataset is located in Alaska, United States, at latitudes from 64°N to 68°N. This region typically requires a larger roll angle for satellite imaging, resulting in more pronounced geometric distortion. The SA21 and NG48 datasets are located in the Southern and Northern Hemispheres, respectively. Owing to the regional climate, both regions receive abundant precipitation and are subject to persistent cloud cover, which significantly shortens the acquisition window for remote sensing images. In addition, this study selected the land extents of four 1:1,000,000 sheets (NK51, NK52, NL51, and NL52) surrounding Changchun, China, where Chang Guang Satellite Technology Co., Ltd., the developer of the Jilin-1 satellites, is headquartered, to construct the NKL4 dataset. This dataset further evaluates the performance of different models in scenarios involving large-scale coverage and high image redundancy.
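The sheet codes follow the IMW layout and can be decoded into geographic bounds. The sketch below is a minimal illustration, assuming the usual IMW convention: a hemisphere letter (N or S), a row letter from A to V counting 4° latitude bands away from the equator, and a column number from 1 to 60 counting 6° longitude zones eastward from 180°W. The function name `imw_sheet_bounds` is ours and is not part of any library or of the SwathSel code.

```python
def imw_sheet_bounds(code: str) -> tuple[int, int, int, int]:
    """Return (west, south, east, north) in degrees for an IMW 1:1,000,000 sheet code.

    Assumed layout: hemisphere letter N/S, row letter A-V (4-degree latitude
    bands counted away from the equator), column 1-60 (6-degree longitude
    zones counted eastward from 180 degrees W).
    """
    hemisphere, row, column = code[0].upper(), code[1].upper(), int(code[2:])
    band = ord(row) - ord("A")  # A -> 0, B -> 1, ...
    if hemisphere not in ("N", "S") or not 0 <= band <= 21 or not 1 <= column <= 60:
        raise ValueError(f"not a valid IMW sheet code: {code!r}")
    west = -180 + (column - 1) * 6
    east = west + 6
    if hemisphere == "N":
        south, north = 4 * band, 4 * band + 4
    else:  # southern sheets mirror the northern bands
        south, north = -(4 * band + 4), -4 * band
    return west, south, east, north


for sheet in ("NQ05", "SA21", "NG48", "NK51", "NK52", "NL51", "NL52"):
    print(sheet, imw_sheet_bounds(sheet))
```

Under this convention, NQ05 decodes to 64°N to 68°N (156°W to 150°W), consistent with the Alaskan latitude range stated above, and the four NKL4 sheets jointly span 40°N to 48°N and 120°E to 132°E.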

Table 2 presents the image retrieval settings for each dataset. Specifically, the NQ05 and SA21 datasets use all the satellites listed in Table 1, while the NG48 and NKL4 datasets use only the ultra-high resolution satellites (UHRS), which are JL1KF01B, JL1KF01C, JL1KF02B01, JL1KF02B02, JL1KF02B03, JL1KF02B04, JL1KF02B05, and JL1KF02B06. Except for the cloud cover condition in the NQ05 dataset, which ranges from 0% to 100%, all other retrieval conditions are the same.

The cloud cover distribution for different datasets is shown in Figure 2a. Since the maximum cloud cover varies across datasets, the horizontal axis represents the normalized cloud cover. Prior to image acquisition, local weather conditions are assessed to ensure optimal data quality. As a result, the probability density peaks for all datasets are within the range of 0 to 0.1. The NKL4 dataset, collected in Northeast Asia, exhibits the highest and most concentrated probability density peak due to favorable imaging conditions. Conversely, the SA21 dataset, obtained from the Amazonian plains in the Southern Hemisphere, experiences rapidly changing weather and short imaging windows, resulting in lower peak values and a flatter distribution. The acquisition time distribution for different datasets is shown in Figure 2b. The acquisition time distributions across datasets vary significantly, reflecting different imaging windows across regions. Without loss of generality, this study adopts 17 May 2025, which corresponds to the midpoint of the dataset acquisition period, as the user-preferred date.

3. Methods

The SwathSel model primarily consists of three modules: (1) the composite grouping module, which applies a composite grouping strategy to partition input images based on swaths, cloud cover, and topological connectivity. It then flexibly adjusts the partitioned groups through a dynamic adjustment mechanism; (2) the subset evaluation module, which assesses and scores the swath consistency, coverage, cloud cover and metadata information of various connected subsets; and (3) the image selection module, which first uses a greedy approach to obtain preliminary results and then applies a rapid refinement technique to efficiently remove redundant images in the set to obtain the optimal remote sensing image set.

The SwathSel framework is illustrated in Figure 3. The algorithm takes two inputs: the universal set of candidate images $I$ and the area of interest (AOI). The candidate images first enter the composite grouping module, which performs a three-level subset partitioning based on swaths, cloud cover, and topological connectivity to obtain the fundamental unit for optimal selection, i.e., the connected subset $I_E$. Each connected subset comprises one or more remote sensing images. All images within a subset share the same swath, have similar cloud cover, and are interconnected, and therefore exhibit natural visual consistency. However, artificially defined cloud cover intervals can undermine this visual consistency. To address this, the SwathSel algorithm employs a dynamic adjustment mechanism that relaxes the cloud cover constraint within the inner-ring regions of a connected subset, thus filling the “gaps”. The connected subsets before and after adjustment are both sent to the subset evaluation module, which scores each input subset based on swath consistency, area coverage, cloud cover, and metadata information. In the subsequent image selection module, the highest-scoring connected subset is added to the optimal set $O$. To avoid redundant selection, each time a new connected subset is added to $O$, the extent of the uncovered AOI and the images within the subset must be updated so that all images in the subset intersect the uncovered AOI. However, owing to differences in image width and in the azimuth between the swath direction and true north, redundant scenes may still arise when newly selected images are merged with the images already in $O$. To address this issue, the SwathSel algorithm applies a rapid refinement technique to update the optimal set of remote sensing images.
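The selection loop described above can be sketched in Python as follows. This is a simplified illustration under several assumptions: the AOI and image footprints are rasterized into grid cells, each connected subset is a mapping from image ids to the cells they cover, and the hypothetical `placeholder_score` (number of newly covered cells) stands in for the subset evaluation module of Section 3.2. The rapid refinement step is not modeled.

```python
from dataclasses import dataclass, field

Cell = tuple[int, int]  # (row, col) of a rasterized AOI grid cell


@dataclass
class Subset:
    """A connected subset I_E: image id -> set of AOI cells the image covers."""
    name: str
    images: dict[str, frozenset[Cell]] = field(default_factory=dict)

    def footprint(self) -> set[Cell]:
        covered: set[Cell] = set()
        for cells in self.images.values():
            covered |= cells
        return covered


def placeholder_score(subset: Subset, uncovered: set[Cell]) -> float:
    # Hypothetical stand-in for the subset evaluation module: newly covered cells.
    return len(subset.footprint() & uncovered)


def greedy_select(candidates, aoi, score=placeholder_score):
    """Greedily add the best-scoring subset to O until AOI' is empty."""
    uncovered = set(aoi)  # AOI'
    pool = list(candidates)
    optimal = []  # the optimal set O
    while uncovered:
        # Candidate set I': subsets that still intersect AOI'.
        pool = [s for s in pool if s.footprint() & uncovered]
        if not pool:
            break  # the candidates cannot cover the rest of the AOI
        best = max(pool, key=lambda s: score(s, uncovered))
        pool.remove(best)
        # Keep only the images of the chosen subset that intersect AOI'.
        kept = {img: cells for img, cells in best.images.items() if cells & uncovered}
        optimal.append(Subset(best.name, kept))
        for cells in kept.values():
            uncovered -= cells
    return optimal, uncovered


aoi = {(r, c) for r in range(2) for c in range(3)}
a = Subset("A", {"a1": frozenset({(0, 0), (0, 1)}), "a2": frozenset({(1, 0), (1, 1)})})
c = Subset("C", {"c1": frozenset({(0, 0)}), "c2": frozenset({(0, 2), (1, 2)})})
selected, remaining = greedy_select([a, c], aoi)
print([(s.name, sorted(s.images)) for s in selected], remaining)
```

In the example, subset C is selected second, but its image c1 is dropped because it no longer intersects the uncovered AOI, mirroring the per-iteration update of the uncovered AOI and of the images in the chosen subset.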

3.1. Composite Grouping Module

The composite grouping module first groups the input candidate images based on swaths, cloud cover, and topological connectivity, extending the fundamental selection unit from a single image to an image group, termed a connected subset and denoted as $I_E$. To overcome the limitations of grouping by cloud cover, we use a dynamic adjustment mechanism to obtain the adjusted connected subset $I_E'$. The original $I_E$ and the adjusted $I_E'$ are treated equally and are jointly input to the subset evaluation module for scoring.

3.1.1. Swath Data of Optical Remote Sensing Satellites

Before introducing the composite grouping module, we briefly outline and analyze the swath data of optical remote sensing satellites. As satellites fly along predetermined orbits during imaging, the continuous images captured by their sensors form swath data [29,30]. After standardized segmentation and cropping, the swath data can be processed into generic single-scene remote sensing image products. Therefore, remote sensing images from the same swath share extremely similar acquisition times and atmospheric conditions. They exhibit highly consistent geometric [31] and radiometric [32] properties, with low overlap and strong visual consistency.

These inherent advantages of same-swath images make them well suited to the optimal image selection objective, yet this potential has not been fully explored in existing research. On the one hand, many high-resolution remote sensing satellites have historically had narrow swath widths [33]. In such cases, the advantages of swath-based selection over single-scene-based optimization are less pronounced, and the approach has not attracted widespread attention. On the other hand, as the geographic coverage of each swath increases, atmospheric conditions (e.g., clouds and water vapor) [34] can vary substantially across locations within the same swath, potentially resulting in different image appearances [35]. Therefore, deciding which scenes within a swath to select or discard becomes challenging. In addition, a limit is typically imposed on the cloud cover of each candidate scene. As a result, the topological relationships among the images available for selection within a swath can become highly complex, further increasing the difficulty of swath-based selection.

Due to the uneven distribution of image quality in swath data, we use a composite grouping strategy and a dynamic adjustment mechanism to construct the composite grouping module. This effectively breaks down swath data into smaller parts, preserving the advantages of images with the same swath while also ensuring flexibility in image selection.

3.1.2. Composite Grouping Strategy

For large-scale mapping with high-resolution remote sensing images, a single satellite imaging pass cannot achieve full coverage of the AOI; coverage typically requires multiple satellites and imaging plans. However, images from different satellites differ in sensor characteristics, image width, and spatial resolution. Remote sensing images captured by the same satellite at different times may also vary significantly due to differences in atmospheric conditions and surface features. Therefore, we first group the images by swath. Each swath group, denoted as $G_{S_i}$, includes all images of the same swath within the universal set $I$. Since the images in $G_{S_i}$ come from a single imaging pass of the same satellite, they exhibit strong visual consistency.

A further key consideration is cloud interference, as dense cloud cover obscures ground features and renders the affected images unusable for applications such as object detection [36], semantic segmentation [37], and change detection [38]. Meanwhile, images with high cloud cover and excessive overlap also pose considerable challenges for subsequent remote sensing image processing workflows, including aerial triangulation [39] and color balancing [40]. Existing algorithms [20,22,23] typically consider the cloud cover of each image in isolation, minimizing it while ignoring the impact of different acquisition times and satellite sources on the overall visual consistency of the selected set. Although we have grouped images by swath, simply calculating the overall cloud cover of each swath still fails to yield an optimal solution in terms of cloud cover. Figure 4a shows quicklooks of a swath from the universal set I. The blank areas represent images whose cloud cover exceeds the retrieval threshold. As can be seen, the lower half of the area is covered by extensive clouds, while the upper-left area is almost cloud-free. Clearly, if only the overall cloud cover of the swath is calculated, the cloud-free regions within it will either be discarded or selected into the optimal set together with high-cloud-cover regions. Neither outcome is desirable.

To address this issue, we divide each swath group into distinct cloud cover intervals. As shown in Figure 2a, there is a clear peak within the normalized cloud cover range of 0 to 0.1, while the distribution from 0.1 to 1 remains relatively flat. This suggests that we can set a narrow low-cloud-cover interval that still captures a large number of images and then gradually relax the cloud cover constraint until the AOI is completely covered. Furthermore, lower cloud cover intervals are more sensitive, meaning that the same change in cloud cover leads to greater visual differences. Based on empirical experience, we divide each swath group into four cloud cover intervals with increasing widths, using a division ratio of 1:2:3:4. Thus, cloud cover interval 1 (CCI1) covers the normalized cloud cover range of 0 to 0.1, which matches the distribution peak. The four cloud cover interval groups $G_{S_i C_j}$ derived from a swath group $G_{S_i}$ in the NG48 dataset are illustrated in Figure 4b. The upper limit of cloud cover in the NG48 dataset is 50%, so the four cloud cover intervals are $[0\%, 5\%]$, $(5\%, 15\%]$, $(15\%, 30\%]$, and $(30\%, 50\%]$, respectively. The first interval is closed at both ends, while the others are left-open and right-closed.
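A minimal sketch of this interval scheme follows, assuming the bounds are obtained by splitting [0, maximum retrieval cloud cover] in the ratio 1:2:3:4; `interval_bounds` and `interval_index` are illustrative names. In normalized terms the upper bounds are 0.1, 0.3, 0.6, and 1.0 of the maximum, which reproduces the NG48 intervals listed above.

```python
from bisect import bisect_left


def interval_bounds(max_cloud: float, ratios=(1, 2, 3, 4)) -> list[float]:
    """Upper bounds of the cloud cover intervals, with widths proportional to `ratios`."""
    total = sum(ratios)
    bounds, cumulative = [], 0
    for r in ratios:
        cumulative += r
        bounds.append(max_cloud * cumulative / total)
    return bounds


def interval_index(cloud: float, bounds: list[float]) -> int:
    """1-based interval index for [0, b1], (b1, b2], ..., (b_{n-1}, b_n]."""
    if not 0 <= cloud <= bounds[-1]:
        raise ValueError(f"cloud cover {cloud} is outside [0, {bounds[-1]}]")
    # bisect_left finds the first bound >= cloud, so a value equal to a bound
    # lands in the interval that the bound closes (right-closed intervals).
    return bisect_left(bounds, cloud) + 1


bounds = interval_bounds(50)  # NG48: 50% retrieval upper limit
print(bounds)
print([interval_index(c, bounds) for c in (0, 5, 5.5, 15, 30, 30.1, 50)])
```

Using `bisect_left` on the sorted upper bounds yields the right-closed convention directly: a cloud cover of exactly 5% stays in interval 1, while 5.5% moves to interval 2.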

However, the images within each cloud cover interval group may form multiple disconnected subregions. As shown in Figure 4c,d, treating these disconnected subregions as a single entity produces a highly disorganized result with poor visual consistency. Therefore, after grouping each swath by cloud cover interval, we further group the images by topological connectivity, so that each connected region is evaluated and selected as an independent unit. Thus, each cloud cover interval may contain one or more connected component groups. The connected component group $G_{S_i C_j K_l}$ is the fundamental unit for image selection in this model. For ease of discussion, we refer to these groups as connected subsets, denoted as $I_E$. Connected subsets capture cloud cover at a fine granularity while maintaining visual consistency.
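Putting the three levels together, a minimal sketch of the composite grouping strategy is shown below. For illustration only, it assumes that scene footprints are axis-aligned boxes `(xmin, ymin, xmax, ymax)` and that two scenes are topologically connected when their boxes overlap or touch; a real implementation would operate on the actual footprint polygons. The `Scene` type and all function names are hypothetical. Connected components are found with a union-find structure, and the cloud cover bounds use the right-closed convention described above.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # hypothetical footprint: (xmin, ymin, xmax, ymax)


@dataclass(frozen=True)
class Scene:
    scene_id: str
    swath_id: str
    cloud: float  # cloud cover in percent
    box: Box


def boxes_touch(a: Box, b: Box) -> bool:
    # Overlapping or edge/corner-sharing boxes count as topologically connected.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def connected_components(scenes: list[Scene]) -> list[list[Scene]]:
    parent = list(range(len(scenes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(scenes)):
        for j in range(i + 1, len(scenes)):
            if boxes_touch(scenes[i].box, scenes[j].box):
                parent[find(i)] = find(j)
    components: dict[int, list[Scene]] = defaultdict(list)
    for i, scene in enumerate(scenes):
        components[find(i)].append(scene)
    return list(components.values())


def composite_grouping(
    scenes: list[Scene], bounds: list[float]
) -> dict[tuple[str, int, int], list[Scene]]:
    """Swath -> cloud cover interval -> connected component, i.e. G_{S_i C_j K_l}."""
    groups: dict[tuple[str, int], list[Scene]] = defaultdict(list)
    for scene in scenes:
        j = bisect_left(bounds, scene.cloud) + 1  # right-closed intervals, 1-based
        groups[(scene.swath_id, j)].append(scene)
    subsets = {}
    for (swath, j), members in sorted(groups.items()):
        for l_index, component in enumerate(connected_components(members), start=1):
            subsets[(swath, j, l_index)] = component
    return subsets


scenes = [
    Scene("a", "S1", 2.0, (0, 0, 1, 1)),
    Scene("b", "S1", 3.0, (0, 1, 1, 2)),
    Scene("c", "S1", 40.0, (0, 2, 1, 3)),
    Scene("d", "S1", 1.0, (0, 3, 1, 4)),
]
for key, members in composite_grouping(scenes, [5.0, 15.0, 30.0, 50.0]).items():
    print(key, [s.scene_id for s in members])
```

Here scenes a and b share an edge and form one connected subset, while scene d lies in the same cloud cover interval but is separated by the cloudier scene c, so it forms a second connected subset.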

3.1.3. Dynamic Adjustment Mechanism

In the grouping strategy described above, we divide each swath group into four intervals based on cloud cover. This facilitates the identification of low-cloud-cover subregions within each swath; however, fixed threshold-based partitioning can disrupt the integrity and continuity of image swaths and compromise the visual consistency of the selected images. As shown in Figure 5, the connected subset $I_E$ in Figure 5a belongs to cloud cover interval 1 in Figure 4c and contains an inner ring. The gaps created by such inner rings often need to be filled by images from other swaths, which degrades visual consistency.

Can these gaps be filled? Yes, by means of a dynamic adjustment mechanism. If the topology of a connected subset $I_E$ (i.e., the connected component group $G_{S_i C_j K_l}$) contains inner rings, the dynamic adjustment mechanism traverses each inner ring and determines whether the images in the cloud cover interval groups $\{G_{S_i C_x} \mid x \leq j + 1,\ x \leq 4\}$ can cover the polygonal region $R_{\mathrm{inner}}$ enclosed by the inner ring. If these images cannot completely cover $R_{\mathrm{inner}}$, the dynamic adjustment mechanism performs no processing. If they can, the dynamic adjustment mechanism adds all images that intersect $R_{\mathrm{inner}}$ to the connected subset $I_E$. In this way, images within a swath group can be exchanged across cloud cover interval groups, which enhances the adaptive selection capability of the SwathSel algorithm. Figure 5b presents quicklooks of the connected subset after dynamic adjustment, which achieves a balance between cloud cover and visual consistency. The original connected subset $I_E$ and the adjusted connected subset $I_E'$ are both input to the subset evaluation module to compete with other connected subsets.
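The following is a minimal raster sketch of the dynamic adjustment mechanism, not the authors' implementation. It assumes that a swath is represented as a grid of scene slots indexed by (along-track, across-track) position, that `interval_of` maps each available scene slot to its cloud cover interval index (slots without a usable scene are absent), and that a connected subset is a 4-connected set of slots. Inner-ring regions are found as background cells that an 8-connected flood fill from outside the subset cannot reach, and a region is filled only if every slot in it holds a scene from an interval $x \leq j + 1$; the hole cells stand in for all images that intersect $R_{\mathrm{inner}}$. Function names are hypothetical.

```python
from collections import deque
from typing import Optional

Cell = tuple[int, int]  # (along-track, across-track) scene slot in a swath
NEIGHBOURS_8 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def find_holes(subset: set[Cell]) -> list[set[Cell]]:
    """Return the regions R_inner enclosed by inner rings of a set of grid cells."""
    r0 = min(r for r, _ in subset) - 1
    r1 = max(r for r, _ in subset) + 1
    c0 = min(c for _, c in subset) - 1
    c1 = max(c for _, c in subset) + 1

    def flood(start: Cell) -> set[Cell]:
        # 8-connected flood fill over background cells inside the padded box.
        seen, queue = {start}, deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in NEIGHBOURS_8:
                nxt = (r + dr, c + dc)
                if (r0 <= nxt[0] <= r1 and c0 <= nxt[1] <= c1
                        and nxt not in subset and nxt not in seen):
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    visited = flood((r0, c0))  # the padded corner is always outside the subset
    holes = []
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            if (r, c) not in subset and (r, c) not in visited:
                hole = flood((r, c))
                visited |= hole
                holes.append(hole)
    return holes


def dynamic_adjust(
    subset: set[Cell], j: int, interval_of: dict[Cell, int]
) -> Optional[set[Cell]]:
    """Return the adjusted subset I_E', or None when no inner ring can be filled."""
    limit = min(j + 1, 4)
    adjusted = set(subset)
    for hole in find_holes(subset):
        # Fill R_inner only if every slot holds a scene from some C_x with x <= j + 1.
        if all(cell in interval_of and interval_of[cell] <= limit for cell in hole):
            adjusted |= hole
    return adjusted if adjusted != subset else None


grid = {(r, c): 1 for r in range(3) for c in range(3)}
grid[(1, 1)] = 2  # centre scene is in interval 2
ring = {cell for cell, x in grid.items() if x == 1}
print(sorted(dynamic_adjust(ring, 1, grid)))
print(dynamic_adjust(ring, 1, {**grid, (1, 1): 3}))  # interval 3 > j + 1: unchanged
```

In the example, the center slot belongs to interval 2, so the inner ring of an interval-1 subset is filled; if the center scene were in interval 3 or missing entirely, the subset would be left unchanged.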

3.2. Subset Evaluation Module

The subset evaluation module scores each connected subset $I_E$ along four dimensions: swath consistency, coverage, cloud cover, and metadata information. We denote the currently uncovered region of the AOI as $AOI'$ and define the candidate set $I' = \{I_E \mid I_E \in I,\ V(I_E) \cap V(I_{AOI'}) \neq \emptyset\}$, where $V(\cdot)$ computes the coverage vector of the input data and $I_{AOI'}$ is a virtual image that exactly covers $AOI'$. Let the optimal set $O$ be the collection of all currently selected connected subsets; each of its elements $O_E \in O$ is called an optimal subset. The data structure of $O_E$ is identical to that of the connected subset $I_E$ and the connected component group $G_{S_i C_j K_l}$.

The SwathSel algorithm normalizes the evaluation scores for all four dimensions using the following normalization function:

$$\mathrm{Norm}(x; X) = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \quad x \in X \tag{1}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in $X$, respectively. During each iteration of the selection process, $x_{\min}$ and $x_{\max}$ are recalculated as the current minimum and maximum values over the updated domain $X$.

3.2.1. Swath Consistency Evaluation

Swath consistency serves as a crucial means to ensure visual consistency. We evaluate the swath consistency scores of connected subsets

I_{E}

at both local and global scales.

1. Local Swath Consistency Evaluation

When analyzing the topological structure formed by the images in a connected subset, we find that not every $I_E$ is simply connected. A planar region is simply connected if it is connected and contains no holes (i.e., inner rings); equivalently, every closed curve in the region can be continuously contracted to a point without leaving the region. As shown in Figure 5a, inner rings can exist in $I_E$ due to objective constraints (e.g., cloud cover exceeding the retrieval threshold or other quality issues).

We use $R_{\mathrm{inner}}$ to denote the polygonal region enclosed by the inner rings in $I_E$. If $R_{\mathrm{inner}}$ is covered by images from different swaths, the local visual consistency of the optimal set $O$ is weakened. We quantify this effect with $f_{\mathrm{inner}}$, defined as follows:

$$f_{\mathrm{inner}}(I_E) = \frac{N_R}{N_E} + \frac{A(R_{\mathrm{inner}})}{A(I_E)} \tag{2}$$

where $N_R$ and $N_E$ are the number of inner rings and the number of images of $I_E$, respectively, and $A(R_{\mathrm{inner}})$ and $A(I_E)$ represent the areas of $R_{\mathrm{inner}}$ and $I_E$, respectively. The local swath consistency score $S_{\mathrm{lc}}$ is defined as follows:

$$S_{\mathrm{lc}}(I_E; I') = 1 - \mathrm{Norm}(f_{\mathrm{inner}}(I_E); I') \tag{3}$$
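As a worked illustration of Equations (1) to (3), the sketch below computes $f_{\mathrm{inner}}$ for a few hypothetical connected subsets and normalizes it over the candidate set $I'$. The per-subset statistics ($N_R$, $N_E$, $A(R_{\mathrm{inner}})$, $A(I_E)$) are made-up inputs, and returning 0 when all candidates share the same value (so that $x_{\max} = x_{\min}$) is our assumption, since the text does not cover that case.

```python
def norm(x: float, domain: list[float]) -> float:
    """Eq. (1): min-max normalization over the current domain X."""
    lo, hi = min(domain), max(domain)
    if hi == lo:
        return 0.0  # assumption: degenerate domain, not specified in the text
    return (x - lo) / (hi - lo)


def f_inner(n_rings: int, n_images: int, ring_area: float, subset_area: float) -> float:
    """Eq. (2): inner rings per image plus the enclosed-area ratio."""
    return n_rings / n_images + ring_area / subset_area


def local_consistency(stats: dict[str, tuple[int, int, float, float]]) -> dict[str, float]:
    """Eq. (3): S_lc = 1 - Norm(f_inner; I') for every subset in the candidate set I'."""
    raw = {name: f_inner(*values) for name, values in stats.items()}
    domain = list(raw.values())
    return {name: 1.0 - norm(value, domain) for name, value in raw.items()}


# Hypothetical candidates: (N_R, N_E, A(R_inner), A(I_E)), areas in arbitrary units.
candidates = {"A": (0, 6, 0.0, 6.0), "B": (1, 8, 1.0, 8.0), "C": (2, 4, 1.0, 4.0)}
print(local_consistency(candidates))
```

The simply connected subset A receives the best score of 1, whereas subset C, with the most and largest inner rings relative to its size, receives 0.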

2. Global Swath Consistency Evaluation

When a connected subset $I_E$ intersects an optimal subset $O_E$ in $O$, we consider the differences between the two subsets and evaluate swath consistency from a global perspective. We define $f_{\mathrm{Sat}}$, $f_{\mathrm{Time}}$, $f_{\mathrm{SEA}}$, and $f_{\mathrm{RA}}$ to quantify the effects of satellite source, acquisition time, solar elevation angle, and roll angle, respectively, and use weighting coefficients $\lambda_i$ to control selection preferences. Furthermore, if $I_E$ intersects multiple $O_E$ in $O$, the metadata information of the $O_E$ with the largest intersection area is used for the calculation. We then have the following equations:

$$f_{\mathrm{Sat}}(x_i, x_j) = f_1(x_i, x_j) = \begin{cases} 0, & S(x_i) = S(x_j) \\ 1 + |\mathrm{rank}(x_i) - \mathrm{rank}(x_j)|, & S(x_i) \neq S(x_j) \end{cases} \tag{4}$$

$$f_{\mathrm{Time}}(x_i, x_j) = f_2(x_i, x_j) = |T(x_i) - T(x_j)| \tag{5}$$

$$f_{\mathrm{SEA}}(x_i, x_j) = f_3(x_i, x_j) = |\mathrm{SEA}(x_i) - \mathrm{SEA}(x_j)| \tag{6}$$

$$f_{\mathrm{RA}}(x_i, x_j) = f_4(x_i, x_j) = |\mathrm{RA}(x_i) - \mathrm{RA}(x_j)| \tag{7}$$

$$S_{\mathrm{gc}}(I_E; I', O) = 1 - \sum_{i=1}^{4} \lambda_i \cdot \mathrm{Norm}(f_i(I_E, O_E); I') \tag{8}$$

$$O_E = \arg\max_{O_i \in O} A(V(O_i) \cap V(I_E)) \tag{9}$$

where $\lambda_{i}$ is the weighting coefficient for each influencing factor and is set to 0.25, $V(\cdot)$ calculates the coverage vector of the input data, and $A(\cdot)$ calculates the coverage area of the input data. $f_{Sat}(\cdot)$ calculates the satellite source difference, where $S(\cdot)$ denotes the satellite type and $\mathrm{rank}(x)$ returns the spatial resolution rank of the satellite that acquired image x. As shown in Equation (10), $\mathrm{rank}(\cdot)$ equals 1 for a spatial resolution of 0.5 m or finer and 2 for a spatial resolution between 0.5 m and 0.75 m. For example, two images from different satellites with the same resolution rank give $f_{Sat} = 1$, whereas images from satellites of different ranks give $f_{Sat} = 2$. $f_{Time}(\cdot)$ calculates the number of days between acquisition times, where $T(\cdot)$ denotes the acquisition time of each image. $f_{SEA}(\cdot)$ calculates the difference in solar elevation angle, where $SEA(\cdot)$ denotes the solar elevation angle. $f_{RA}(\cdot)$ calculates the difference in roll angle, where $RA(\cdot)$ denotes the roll angle.

Since all images in a connected subset $I_{E}$ share identical metadata information, we have $f_{i}(I_{E}, \cdot) = f_{i}(x, \cdot)$ for $i \in \{1, 2, 3, 4\}$, where x denotes any single image in $I_{E}$. The equation for $\mathrm{rank}(x)$ is as follows:

$\mathrm{rank}(x) = \begin{cases} 1, & \mathrm{Gsd}(x) \leq 0.5 \\ 2, & 0.5 < \mathrm{Gsd}(x) \leq 0.75 \end{cases}$

(10)

where $\mathrm{Gsd}(x)$ represents the ground sample distance (spatial resolution, in meters) of image x.
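The pairwise metadata differences in Equations (4)–(7) and the resolution rank in Equation (10) can be sketched in Python as follows. The `Meta` record, its field names, and the satellite names in the example are hypothetical. Acquisition time is measured in whole days, as in the description of $f_{Time}$.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Meta:
    """Hypothetical per-subset metadata; all images in one connected subset share it."""
    satellite: str
    gsd_m: float          # ground sample distance in meters
    acquired: date
    sun_elevation: float  # degrees
    roll: float           # degrees


def rank(m: Meta) -> int:
    """Eq. (10): 1 for GSD <= 0.5 m, 2 for 0.5 m < GSD <= 0.75 m."""
    if m.gsd_m <= 0.5:
        return 1
    if m.gsd_m <= 0.75:
        return 2
    raise ValueError("Eq. (10) only defines ranks up to 0.75 m GSD")


def f_sat(a: Meta, b: Meta) -> int:
    """Eq. (4): 0 for the same satellite, otherwise 1 plus the rank gap."""
    if a.satellite == b.satellite:
        return 0
    return 1 + abs(rank(a) - rank(b))


def f_time(a: Meta, b: Meta) -> int:
    """Eq. (5): number of days between acquisitions."""
    return abs((a.acquired - b.acquired).days)


def f_sea(a: Meta, b: Meta) -> float:
    """Eq. (6): absolute solar elevation angle difference."""
    return abs(a.sun_elevation - b.sun_elevation)


def f_ra(a: Meta, b: Meta) -> float:
    """Eq. (7): absolute roll angle difference."""
    return abs(a.roll - b.roll)


# Hypothetical candidate subset I_E and the optimal subset O_E it overlaps most.
i_e = Meta("SAT-A", 0.5, date(2025, 5, 10), 62.0, 3.5)
o_e = Meta("SAT-B", 0.75, date(2025, 5, 17), 65.5, -1.0)
print(f_sat(i_e, o_e), f_time(i_e, o_e), f_sea(i_e, o_e), f_ra(i_e, o_e))
```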

In summary, the swath consistency score $S_{cons}$ of connected subset $I_{E}$ is as follows:

$S_{cons}(I_{E}; I', O) = 0.5 \cdot S_{lc}(I_{E}; I') + 0.5 \cdot S_{gc}(I_{E}; I', O)$

(11)
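Equations (8) and (11) combine normalized differences into scores. The sketch below takes raw difference vectors $(f_1, f_2, f_3, f_4)$ computed for every candidate subset against its $O_E$. It normalizes each factor across the candidate set, again assuming that Norm is min-max scaling, and applies $\lambda_i = 0.25$ and the 0.5/0.5 split of Equation (11). The function names and example values are hypothetical.

```python
from typing import Dict, List, Sequence


def min_max_norm(values: Sequence[float]) -> List[float]:
    """Assumed form of Norm(.; I'): min-max scaling across the candidate set."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def global_consistency_scores(diffs: Dict[str, Sequence[float]],
                              weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> Dict[str, float]:
    """Eq. (8): S_gc = 1 - sum_i lambda_i * Norm(f_i(I_E, O_E); I')."""
    names = list(diffs)
    scores = {name: 1.0 for name in names}
    for i, lam in enumerate(weights):
        column = min_max_norm([diffs[name][i] for name in names])  # factor i over all of I'
        for name, value in zip(names, column):
            scores[name] -= lam * value
    return scores


def swath_consistency(s_lc: float, s_gc: float) -> float:
    """Eq. (11): equal-weight combination of local and global consistency."""
    return 0.5 * s_lc + 0.5 * s_gc


# Hypothetical (f_Sat, f_Time [days], f_SEA [deg], f_RA [deg]) for three candidates.
diffs = {"E1": (0, 0, 0.0, 0.0), "E2": (2, 7, 3.5, 4.5), "E3": (1, 30, 10.0, 1.0)}
s_gc = global_consistency_scores(diffs)
print(s_gc, swath_consistency(1.0, s_gc["E2"]))
```

Normalizing each factor separately keeps days and degrees from dominating one another before the equal weights are applied.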

3.2.2. Coverage and Cloud Cover Evaluation

To avoid severe fragmentation in the selection results of the SwathSel algorithm, we prefer connected subsets $I_{E}$ with broader coverage. Therefore, we evaluate the coverage of $I_{E}$ as follows:

$f_{cover}(x_{i}, x_{j}) = A(V(x_{i}) \cap V(x_{j}))$

(12)

$S_{cover}(I_{E}, I_{AOI'}; I') = \mathrm{Norm}(f_{cover}(I_{E}, I_{AOI'}); I')$

(13)

where $f_{cover}(\cdot)$ calculates the area of $x_{j}$ covered by image $x_{i}$, AOI’ represents the currently uncovered AOI region, $I_{AOI'}$ is a virtual image that exactly covers AOI’, and $S_{cover}$ represents the coverage score.
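To illustrate Equation (12), the sketch below models footprints as axis-aligned rectangles in a projected coordinate system, so the intersection area has a closed form. It also treats a subset's merged extent as a single rectangle for simplicity. Real scene footprints are arbitrary polygons and would require a polygon library. The `Box` type and the coordinates are hypothetical. The Norm in Equation (13) would then be applied across all candidate subsets.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned footprint in projected kilometers."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def f_cover(a: Box, b: Box) -> float:
    """Eq. (12) for rectangles: area of b that is covered by a."""
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(w, 0.0) * max(h, 0.0)  # zero when the boxes do not overlap


# A connected subset's extent versus the remaining uncovered AOI' (both hypothetical).
subset_extent = Box(0.0, 0.0, 12.0, 60.0)
uncovered_aoi = Box(8.0, 10.0, 40.0, 40.0)
print(f_cover(subset_extent, uncovered_aoi))
```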

Clouds can obscure ground objects of interest; therefore, the selected images are expected to have as little cloud cover as possible. We define the cloud cover score $S_{cloud}$ as follows:

$f_{cloud}(I_{E}) = \frac{\sum_{x \in I_{E}} A(x) \cdot \mathrm{Cloud}_{x}}{A(I_{E})}$

(14)

$S_{cloud}(I_{E}; I') = 1 - \mathrm{Norm}(f_{cloud}(I_{E}); I')$

(15)

where $f_{cloud}(I_{E})$ calculates the average cloud cover in $I_{E}$, and $\mathrm{Cloud}_{x}$ denotes the cloud cover of each scene x.
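Equation (14) weights each scene's cloud cover by its area. The minimal sketch below expresses cloud cover as a fraction and uses a hypothetical subset of non-overlapping scenes. In that case, the result reduces to the area-weighted mean cloud cover.

```python
from typing import Iterable, Tuple


def f_cloud(scenes: Iterable[Tuple[float, float]], subset_area: float) -> float:
    """Eq. (14): sum of A(x) * Cloud_x over the subset, divided by A(I_E).

    scenes: (scene_area, cloud_fraction) pairs; subset_area: A(I_E).
    """
    return sum(area * cloud for area, cloud in scenes) / subset_area


# Hypothetical subset of three non-overlapping 100 km^2 scenes.
scenes = [(100.0, 0.02), (100.0, 0.05), (100.0, 0.11)]
print(f_cloud(scenes, subset_area=300.0))
```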

3.2.3. Metadata Information Evaluation

The swath consistency, coverage, and cloud cover evaluations described above depend on the dynamic update process of connected subsets. This section evaluates the metadata information of $I_{E}$ itself, including acquisition time, solar elevation angle, and roll angle. For convenience, we construct a virtual connected subset $I_{user}$ whose metadata information matches the user preferences. The metadata information of $I_{E}$ is expected to be as close as possible to that of $I_{user}$. We use the metadata score $S_{metadata}$ to quantify this deviation:

$S_{metadata}(I_{E}; I') = 1 - \sum_{i=2}^{4} \tilde{\lambda}_{i} \cdot \mathrm{Norm}(f_{i}(I_{E}, I_{user}); I')$

(16)

where $\tilde{\lambda}_{i}$ is the weighting coefficient for user preference, which is uniformly set to 1/3, and $f_{i}(\cdot)$ is consistent with Equations (5)–(7). Since most optimal selection tasks do not have a significant preference for satellite sources, the metadata information evaluation module does not score satellite sources.

In summary, the total score $S_{total}$ for each connected subset $I_{E}$ is defined as follows:

$S_{total}(I_{E}; I', O) = \hat{\lambda}_{1} \cdot S_{cons}(I_{E}; I', O) + \hat{\lambda}_{2} \cdot S_{cover}(I_{E}, I_{AOI'}; I') + \hat{\lambda}_{3} \cdot S_{cloud}(I_{E}; I') + \hat{\lambda}_{4} \cdot S_{metadata}(I_{E}; I')$

(17)

where $\hat{\lambda}_{1}$, $\hat{\lambda}_{2}$, $\hat{\lambda}_{3}$, and $\hat{\lambda}_{4}$ are the weighting coefficients for swath consistency, coverage, cloud cover, and metadata information, respectively. They are all set to 0.25 in this study.
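Once the component scores are available, Equations (16) and (17) reduce to weighted sums. The sketch below assumes the normalized metadata deviations from $I_{user}$ have already been computed, each in [0, 1]. It uses the weights reported in this study ($\tilde{\lambda}_i = 1/3$, $\hat{\lambda}_j = 0.25$). The function names and inputs are hypothetical.

```python
from typing import Sequence


def metadata_score(norm_devs: Sequence[float],
                   weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Eq. (16): 1 - weighted sum of normalized time, SEA, and roll deviations from I_user."""
    return 1.0 - sum(w * d for w, d in zip(weights, norm_devs))


def total_score(s_cons: float, s_cover: float, s_cloud: float, s_metadata: float,
                weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Eq. (17): weighted sum of swath consistency, coverage, cloud, and metadata scores."""
    return sum(w * s for w, s in zip(weights, (s_cons, s_cover, s_cloud, s_metadata)))


s_meta = metadata_score((0.3, 0.6, 0.0))  # hypothetical normalized deviations
print(s_meta, total_score(0.8, 0.5, 0.9, s_meta))
```

Changing the `weights` tuples is the code-level counterpart of the user-adjustable preferences described in the text.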

3.3. Image Selection Module

3.3.1. Preliminary Selection

The image selection module employs a greedy strategy for preliminary selection. In each iteration, the highest-scoring $I_{E}$ in the candidate set $I'$ is popped and added to the optimal set O. Then AOI’, $I'$, and the images contained in each $I_{E}$, along with their total scores, are updated. For computational efficiency, we adopt the same processing method as in Section 3.2 to construct a virtual image $I_{AOI'}$ that exactly covers AOI’. Starting from $V(I_{AOI'}) = V(I_{AOI})$, the extent of AOI’ is updated after each selection as follows:

$V(I_{AOI'}) \leftarrow V(I_{AOI'}) - V(I_{E_{max}})$

(18)

where $I_{E_{max}}$ is the connected subset with the highest score, $V(\cdot)$ calculates the coverage vector of the input data, and − is the geometric difference operator. Each connected subset $I_{E}$ in the candidate set $I'$ is updated as $V(I_{AOI'})$ changes. Denoting the updated $I_{E}$ by $I_{E_{new}}$, we have:

$I_{E_{new}} = \{x \mid x \in I_{E},\ V(x) \cap V(I_{AOI'}) \neq \emptyset\}$

(19)

As AOI’ is updated, images in the connected subset that do not intersect with $I_{AOI'}$ are removed. When $I_{E_{new}}$ is empty, the entire connected subset is deleted from $I'$. In addition, all $I_{E}$ in $I'$ are re-evaluated using Equation (17).

The image selection module repeats the above iterations until $V(I_{AOI'}) = \emptyset$, i.e., the AOI is completely covered. At this point, the optimal set O becomes the result of the preliminary selection.
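The greedy loop and the pruning rule in Equation (19) can be illustrated on a toy raster. In this raster, each footprint is a set of grid cells, so set difference stands in for the polygon difference operator. The `score` callback is a hypothetical placeholder for Equation (17). Here it simply counts newly covered cells, which is not the full SwathSel score. Subset and image names are likewise hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Cell = Tuple[int, int]
Subset = Dict[str, FrozenSet[Cell]]  # image id -> footprint cells


def greedy_select(candidates: Dict[str, Subset], aoi: Set[Cell],
                  score: Callable[[Subset, Set[Cell]], float]) -> List[str]:
    """Pop the best connected subset until the AOI is covered or no candidates remain."""
    uncovered = set(aoi)                      # V(I_AOI'), initialized to the full AOI
    pool = {k: dict(v) for k, v in candidates.items()}
    selected: List[str] = []
    while uncovered and pool:
        best = max(pool, key=lambda k: score(pool[k], uncovered))
        chosen = pool.pop(best)
        selected.append(best)
        for cells in chosen.values():         # Eq. (18): remove the chosen extent
            uncovered -= cells
        for key in list(pool):                # Eq. (19): drop images outside AOI'
            pool[key] = {img: c for img, c in pool[key].items() if c & uncovered}
            if not pool[key]:                 # empty subsets leave the candidate set
                del pool[key]
    return selected


def new_coverage(subset: Subset, uncovered: Set[Cell]) -> float:
    """Hypothetical stand-in score: number of still-uncovered cells the subset would cover."""
    covered: Set[Cell] = set()
    for cells in subset.values():
        covered |= cells & uncovered
    return float(len(covered))


def strip(x0: int, x1: int, y0: int, y1: int) -> FrozenSet[Cell]:
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


aoi = set(strip(0, 6, 0, 4))
candidates = {
    "swath_A": {"a1": strip(0, 3, 0, 2), "a2": strip(0, 3, 2, 4)},
    "swath_B": {"b1": strip(3, 6, 0, 2), "b2": strip(3, 6, 2, 4)},
    "swath_C": {"c1": strip(2, 4, 0, 4)},
}
print(greedy_select(candidates, aoi, new_coverage))
```

In this toy run, the swath that straddles the other two is pruned away once they cover the AOI, mirroring how Equation (19) empties and deletes subsets.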

3.3.2. Image Refinement

Although the images in $I_{E}$ are updated in each iteration (Equation (19)), completely redundant images can still appear in O because connected subsets are selected in different orders. As shown in Figure 6, after subsets 2, 3, and 4 are selected, the earliest selected subset 1 becomes completely redundant.

To eliminate these redundant images, we need to determine whether the union of the extents of all remaining images covers the extent of the image being considered for removal. Performing this check for every image takes $O(n^{2})$ time, where n denotes the number of images. To reduce the computational overhead of image refinement, we propose a rapid refinement technique.

We observed that redundant images only appear at the boundary of $V(I_{E})$ or in its neighborhood; a brief proof is provided in Appendix A. Therefore, after the preliminary selection is complete, we check redundancy only for the boundary images of each connected subset in O. The time complexity of removing redundant images then becomes $O(mn)$, where m denotes the total number of images on the boundaries of all subsets in O. Typically, $m \ll n$, which significantly reduces the computational cost. The refined O is the optimal image set obtained by the SwathSel algorithm.
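A toy version of the rapid refinement step is sketched below using cell-based footprints. It tests only the images passed in as boundary images, mirroring the $O(mn)$ restriction. How boundary images are identified is not reproduced here, so the `boundary` argument is a hypothetical, caller-supplied input. The example footprints are also hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Cell = Tuple[int, int]


def refine(selected: Dict[str, FrozenSet[Cell]], boundary: Iterable[str]) -> List[str]:
    """Remove boundary images whose footprint is fully covered by the other kept images."""
    kept = dict(selected)
    for img in boundary:                       # only the m boundary images are tested
        if img not in kept:
            continue
        others: Set[Cell] = set()
        for other, cells in kept.items():      # union of the remaining footprints
            if other != img:
                others |= cells
        if kept[img] <= others:                # V(O) == V(O \ {x}) for this image
            del kept[img]
    return sorted(kept)


def block(x0: int, x1: int, y0: int, y1: int) -> FrozenSet[Cell]:
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


# "s1" was selected first and is later fully covered by "s2" and "s3".
selected = {"s1": block(2, 4, 0, 2), "s2": block(0, 3, 0, 2), "s3": block(3, 6, 0, 2)}
print(refine(selected, boundary=["s1", "s2", "s3"]))
```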

The pseudocode for SwathSel is shown in Algorithm 1.

Algorithm 1 SwathSel

Input: Candidate images I; the extent of the area of interest $V(I_{AOI})$

Output: Optimal image set O

1: Divide candidate images I into connected subsets $I_{E}$ using a composite grouping strategy; all $I_{E}$ form the candidate set $I'$
2: while $V(I_{AOI}) \not\subseteq V(O)$ and $I' \neq \emptyset$ do
3: Calculate the evaluation score for each $I_{E}$ in $I'$
4: Pop the highest-scoring connected subset $I_{E_{max}}$
5: $O \leftarrow O \cup I_{E_{max}}$
6: Remove $V(I_{E_{max}})$ from the extent of the currently uncovered region $V(I_{AOI'})$
7: for each image $x \in I_{E}$ in $I'$ do
8: if $V(x) \cap V(I_{AOI'}) = \emptyset$ then
9: Remove x from $I_{E}$
10: end if
11: end for
12: end while
13: for each image x on the boundary of $I_{E}$ in O do
14: if $V(O) = V(O \setminus \{x\})$ then
15: Remove x from O
16: end if
17: end for
18: Output the final result O

4. Results

4.1. Experimental Setup

4.1.1. Implementation Details

The SwathSel model has two main categories of hyperparameters: user-specified and algorithm-inherent. User-specified hyperparameters include preferences for metadata information and their corresponding weighting coefficients. For generality, we set the user-preferred image acquisition time to the middle date of the acquisition date range, i.e., 17 May 2025. We also set the user-preferred solar elevation angle to 90° and the roll angle to 0°. We assume equal preference across these three dimensions, assigning a uniform weighting coefficient of 1/3, as shown in Equation (16). Algorithm-inherent hyperparameters primarily include the division ratio of cloud cover intervals and the weighting coefficients for the various influencing factors in the subset evaluation module. We discuss the division ratio of cloud cover intervals in detail in Section 4.3. The weighting coefficients can be adjusted according to application-specific requirements; in this study, they are uniformly set to 0.25 (see Equations (8) and (17)). All experiments were conducted on a workstation with an Intel(R) Xeon(R) w5-2445 CPU (3.1 GHz) and 128 GB RAM.

4.1.2. Quantitative Evaluation Metrics

To quantitatively evaluate the performance of different models, we introduce the evaluation metrics used in this study before presenting the experimental results.

Coverage Ratio (CR): The ratio of the AOI covered by the selected optimal image set O. We merge the extents of all selected images and calculate the ratio of the covered area to the area of the virtual image $I_{AOI}$, which precisely matches the extent of the AOI. The coverage ratio (CR) is calculated as follows:

$CR = \frac{A(V(O) \cap V(I_{AOI}))}{A(I_{AOI})}$

(20)

where $A (\cdot)$ calculates the coverage area of the input data, $V (\cdot)$ calculates the coverage vector of the input data, and $V (O) \cap V (I_{A O I})$ denotes the vector range of the AOI covered by the selected images.
Redundancy Ratio (RR): The ratio of the sum of the areas of all selected images to the area of the AOI, minus 1. This metric is a key measure of image redundancy. Following Tao et al. [22] and Li et al. [23], RR is calculated as follows:

$RR = \frac{\sum_{x \in O} A(x)}{A(I_{AOI})} - 1$

(21)

where x represents a single scene image in the optimal set O.
Cloud Area Ratio (CAR): The ratio of cloud area to AOI area across all selected images. This metric reflects the upper limit of cloud cover when mosaicking the selected images, i.e., when all clouds are retained in the final mapping product. The CAR is calculated as follows:

$CAR = \frac{\sum_{x \in O} A (x) * C l o u d_{x}}{A (I_{A O I})}$

(22)

where $C l o u d_{x}$ represents the cloud cover percentage of each image x.
Root Mean Square Error of Satellite Source Continuity ( $R M S E_{S S C}$ ): The root mean square error of the satellite source differences between each image and its intersecting images in the dataset. This metric quantifies the overall visual consistency of a dataset from the perspective of satellite source continuity. The $R M S E_{S S C}$ is calculated as follows:

$R M S E_{S S C} = \sqrt{\frac{\sum_{x_{i} \in I_{R}} \sum_{x_{j} \in N (x_{i})} f_{S a t}^{2} (x_{i}, x_{j})}{\sum_{x_{i} \in I_{R}} \sum_{x_{j} \in N (x_{i})} I_{N (x_{i})} (x_{j})}}$

(23)

where $N (x_{i})$ denotes the set of images in O that intersect with $x_{i}$ , $I_{N (x_{i})}$ is the indicator function defined on $N (x_{i})$ , and $f_{S a t} (\cdot)$ calculates the satellite source difference between images $x_{i}$ and $x_{j}$ , as defined in Equation (4).
Root Mean Square Error of Acquisition Time Continuity ( $R M S E_{A T C}$ ): The root mean square error of the acquisition time differences between each image and its intersecting images in the dataset. This metric quantifies the overall visual consistency of a dataset from the perspective of acquisition time continuity. The $R M S E_{A T C}$ is calculated as follows:

$R M S E_{A T C} = \sqrt{\frac{\sum_{x_{i} \in O} \sum_{x_{j} \in N (x_{i})} f_{T i m e}^{2} (x_{i}, x_{j})}{\sum_{x_{i} \in O} \sum_{x_{j} \in N (x_{i})} I_{N (x_{i})} (x_{j})}}$

(24)

where $f_{T i m e} (\cdot)$ computes the absolute value of the time interval between images $x_{i}$ and $x_{j}$ , as defined in Equation (5).
Root Mean Square Error of Solar Elevation Angle Continuity ( $R M S E_{S E A C}$ ): The root mean square error of the solar elevation angle differences between each image and its intersecting images in the dataset. This metric quantifies the overall visual consistency of a dataset from the perspective of solar elevation angle continuity. The $R M S E_{S E A C}$ is calculated as follows:

$R M S E_{S E A C} = \sqrt{\frac{\sum_{x_{i} \in I_{R}} \sum_{x_{j} \in N (x_{i})} f_{S E A}^{2} (x_{i}, x_{j})}{\sum_{x_{i} \in I_{R}} \sum_{x_{j} \in N (x_{i})} I_{N (x_{i})} (x_{j})}}$

(25)

where $f_{S E A} (\cdot)$ calculates the absolute value of the difference in solar elevation angle between images $x_{i}$ and $x_{j}$ , as defined in Equation (6).
Root Mean Square Error of Roll Angle Continuity ( $R M S E_{R A C}$ ): The root mean square error of the roll angle differences between each image and its intersecting images in the dataset. This metric quantifies the overall visual consistency of a dataset from the perspective of roll angle continuity. The $R M S E_{R A C}$ is calculated as follows:

$R M S E_{R A C} = \sqrt{\frac{\sum_{x_{i} \in I_{R}} \sum_{x_{j} \in N (x_{i})} f_{R A}^{2} (x_{i}, x_{j})}{\sum_{x_{i} \in I_{R}} \sum_{x_{j} \in N (x_{i})} I_{N (x_{i})} (x_{j})}}$

(26)

where $f_{R A} (\cdot)$ calculates the absolute value of the difference in roll angle between images $x_{i}$ and $x_{j}$ , as defined in Equation (7).
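To make Equations (20)–(26) concrete, the sketch below evaluates CR, RR, CAR, and one continuity RMSE on a toy raster in which every grid cell has unit area. The cell-set footprints, example images, and function names are assumptions for illustration. Intersecting pairs are counted in both directions, matching the double sum over $x_i$ and $x_j \in N(x_i)$.

```python
import math
from typing import Callable, Dict, FrozenSet, Set, Tuple

Cell = Tuple[int, int]


def coverage_ratio(fp: Dict[str, FrozenSet[Cell]], aoi: Set[Cell]) -> float:
    """Eq. (20): covered AOI area over AOI area."""
    covered = set().union(*fp.values()) & aoi
    return len(covered) / len(aoi)


def redundancy_ratio(fp: Dict[str, FrozenSet[Cell]], aoi: Set[Cell]) -> float:
    """Eq. (21): summed image area over AOI area, minus 1."""
    return sum(len(c) for c in fp.values()) / len(aoi) - 1


def cloud_area_ratio(fp: Dict[str, FrozenSet[Cell]], cloud: Dict[str, float], aoi: Set[Cell]) -> float:
    """Eq. (22): summed cloud area over AOI area (upper bound if all clouds are kept)."""
    return sum(len(fp[k]) * cloud[k] for k in fp) / len(aoi)


def continuity_rmse(fp: Dict[str, FrozenSet[Cell]], diff: Callable[[str, str], float]) -> float:
    """Eqs. (23)-(26): RMSE of a pairwise difference over all ordered intersecting pairs."""
    sq_sum, pairs = 0.0, 0
    for i in fp:
        for j in fp:
            if i != j and fp[i] & fp[j]:       # x_j in N(x_i)
                sq_sum += diff(i, j) ** 2
                pairs += 1
    return math.sqrt(sq_sum / pairs) if pairs else 0.0


def block(x0: int, x1: int, y0: int, y1: int) -> FrozenSet[Cell]:
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


aoi = set(block(0, 10, 0, 4))
fp = {"x1": block(0, 6, 0, 4), "x2": block(5, 10, 0, 4)}   # the two images overlap by one column
cloud = {"x1": 0.05, "x2": 0.10}
days = {"x1": 0, "x2": 12}                                   # days from a reference date
print(coverage_ratio(fp, aoi), redundancy_ratio(fp, aoi), cloud_area_ratio(fp, cloud, aoi),
      continuity_rmse(fp, lambda a, b: abs(days[a] - days[b])))
```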

4.2. Optimized Selection Results and Analysis

We employed the model proposed by Tao et al. [22] and the DD-RSIRA model proposed by Li et al. [23] as baseline models for comparative experiments. The parameters used for both models are consistent with those reported in the literature. The following subsections present and analyze the results from four perspectives: quantitative metrics, density distribution, spatial distribution, and visual consistency.

4.2.1. Quantitative Results and Analysis

For the quantitative analysis of the selected imagery, we adopted the evaluation metrics used by Tao et al. [22] and Li et al. [23]: the number of scenes, the coverage ratio (CR), and the redundancy ratio (RR). Additionally, we employed the cloud area ratio (CAR; Equation (22)) to measure the upper limit of cloud cover in the selected images for subsequent mapping products. Furthermore, to quantitatively measure the visual consistency of the selected images, this study proposes the following metrics: root mean square error of satellite source continuity ($RMSE_{SSC}$; Equation (23)), root mean square error of acquisition time continuity ($RMSE_{ATC}$; Equation (24)), root mean square error of solar elevation angle continuity ($RMSE_{SEAC}$; Equation (25)), and root mean square error of roll angle continuity ($RMSE_{RAC}$; Equation (26)). These metrics evaluate the differences between each image and its adjacent images across multiple dimensions, thereby reflecting the visual consistency of the optimal selection results. Quantitative results of all models on all datasets are summarized in Table 3, with optimal values in bold.

In Table 3, all models achieved 100% coverage of the AOI while significantly reducing the number of scenes in the dataset. Our SwathSel model achieved state-of-the-art performance on most metrics across the four datasets. Specifically, compared with the better of the Tao et al. and DD-RSIRA results, our model improved RR by 74.07%, 69.52%, 20.24%, and 20.88% on datasets NQ05, SA21, NG48, and NKL4, respectively, and improved CAR by 61.57%, 67.99%, 83.61%, and 65.06%, respectively. In addition, the SwathSel model achieved over 30% improvement in $RMSE_{SSC}$ and $RMSE_{RAC}$. Only $RMSE_{ATC}$ on NKL4 and $RMSE_{SEAC}$ on NQ05 and NKL4 were slightly worse. These exceptions are attributable to differences in how image acquisition time is weighted across models and to the distribution of acquisition times within the datasets. The DD-RSIRA model considers only coverage, cloud cover, and acquisition time. Consequently, compared with SwathSel, DD-RSIRA is more likely to select images acquired on 17 May 2025, which matches the user preference. As shown in Figure 2b, the acquisition time distribution of the NKL4 dataset is relatively uniform, with a peak around the user-specified date of 17 May 2025. The DD-RSIRA model can use images from this interval to cover a larger portion of the AOI, thereby achieving a lower $RMSE_{ATC}$. Furthermore, the subsolar point moves between the Tropic of Cancer and the Tropic of Capricorn each year, and the maximum solar elevation angle at a location depends on the difference between its latitude and the latitude of the subsolar point. Thus, the image acquisition time indirectly affects the range of solar elevation angle values. The NQ05 dataset is located in a high-latitude region, making its solar elevation angle values more sensitive to the image acquisition date. Additionally, the acquisition time distribution of the NQ05 dataset shows a spike on 17 May 2025, which contributes to DD-RSIRA achieving a better $RMSE_{SEAC}$ than SwathSel. For specific optimization requirements, the SwathSel model allows users to specify the weights of the influencing factors (see Equations (16) and (17)), thereby enabling different optimization outcomes.
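The latitude argument can be made concrete with the standard noon solar elevation relation $h_{noon} = 90^\circ - |\varphi - \delta|$. Here, $\varphi$ is latitude and $\delta$ is the solar declination, i.e., the latitude of the subsolar point. The sketch below approximates $\delta$ with Cooper's formula, $\delta \approx 23.45^\circ \sin(360^\circ (284 + n)/365)$ for day of year n. This approximation and the example latitude are assumptions for illustration and are not part of SwathSel.

```python
import math
from datetime import date


def declination_deg(day: date) -> float:
    """Approximate solar declination (Cooper's formula), i.e., latitude of the subsolar point."""
    n = day.timetuple().tm_yday
    return 23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0))


def noon_elevation_deg(latitude: float, day: date) -> float:
    """Maximum (solar-noon) elevation: 90 deg minus the latitude gap to the subsolar point."""
    return 90.0 - abs(latitude - declination_deg(day))


d = date(2025, 5, 17)  # user-preferred date used in this study
print(f"declination {declination_deg(d):.2f} deg, noon elevation at 50 N {noon_elevation_deg(50.0, d):.2f} deg")
```

In this relation, $h_{noon}$ reaches 90° only where the latitude equals the declination. Outside the tropics, the noon elevation therefore stays below 90° throughout the year.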

In terms of time consumption, the runtime of Tao et al.’s model increases rapidly as the dataset size grows. SwathSel shows runtime comparable to DD-RSIRA across all datasets and is faster than DD-RSIRA on NG48. This behavior is attributable to using connected subsets as the basic unit for image selection. When multiple connected subsets have large coverage and high scores, the AOI is covered quickly. In contrast, when most connected subsets have small coverage, more iterations are required to achieve full AOI coverage, which increases runtime.

4.2.2. Density Distribution Results and Analysis

This subsection analyzes the density distribution of the selected results. Following Tao et al. [22] and Li et al. [23], we rasterized the AOI into grid cells with equal spacing, using a uniform grid interval of 5 km to visualize the density distributions shown in Figure 7. From top to bottom, the rows display the raw data and the selected results of Tao et al., DD-RSIRA, and SwathSel, respectively. From left to right, the columns correspond to the NQ05, SA21, NG48, and NKL4 datasets. The value in each grid cell indicates the number of remote sensing images at that location; as the number of overlapping images increases, the cell color transitions from dark blue to yellow and then to red. Within each column, all rows except the first share the same value-to-color mapping; the first row uses its own scale because the raw data exhibit much higher image redundancy.
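A density map of this kind can be approximated with a simple counting pass. First, rasterize the AOI into 5 km cells. Then, for each cell, count how many footprints contain the cell center. The sketch assumes that footprints are axis-aligned bounding boxes in a projected, meter-based coordinate system and that cell-center containment decides membership. Both are simplifications chosen for illustration and are not necessarily the rasterization used for Figure 7. The AOI and footprints are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax in projected meters


def density_grid(aoi: Box, footprints: List[Box], cell: float = 5000.0) -> List[List[int]]:
    """Count footprints covering each grid cell center (row 0 = southernmost row)."""
    xmin, ymin, xmax, ymax = aoi
    ncols = int((xmax - xmin) // cell)  # partial cells at the AOI edge are dropped
    nrows = int((ymax - ymin) // cell)
    grid = [[0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell
            grid[r][c] = sum(1 for (x0, y0, x1, y1) in footprints
                             if x0 <= cx <= x1 and y0 <= cy <= y1)
    return grid


# Hypothetical 20 km x 10 km AOI with two overlapping footprints.
aoi = (0.0, 0.0, 20000.0, 10000.0)
footprints = [(0.0, 0.0, 14000.0, 10000.0), (8000.0, 0.0, 20000.0, 10000.0)]
for row in density_grid(aoi, footprints):
    print(row)
```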

1. NQ05 dataset

In the NQ05 dataset, the raw data show higher image density in the upper-left, lower-left, and right-central regions. The density distributions produced by Tao et al. and DD-RSIRA are similar to the raw data. In contrast, SwathSel uses connected subsets as the basic unit for selection, and its density distribution differs substantially from the raw data. Moreover, because connected subsets are derived from swath data, their distribution along the swath is clearly organized and exhibits low redundancy. Therefore, SwathSel yields a lower density in both highlighted regions.

2. SA21 dataset

In the SA21 dataset, the raw data are denser in the center. The results of Tao et al. [22], DD-RSIRA, and SwathSel each depart from this pattern. As shown in Figure 7f,j, the results of Tao et al. [22] exhibit higher density on the right side, while DD-RSIRA shows higher density on the left side. The SwathSel model proposed in this study exhibits clear swath-based distribution characteristics and the lowest density.

3. NG48 dataset

In the NG48 dataset, the raw data exhibits slightly lower image density in the central region. Tao et al. [22] show higher density on the left side. Both DD-RSIRA and SwathSel exhibit trends aligned with swath boundaries. However, as indicated by the black circles in Figure 7k,o, DD-RSIRA still displays complex high-density distributions in certain regions.

4. NKL4 dataset

In the NKL4 dataset, most regions in the raw data exhibit high image density. Both Tao et al. and DD-RSIRA yield scattered density distributions, with Tao et al. showing higher density values. The SwathSel model maintains a clear swath trend with low density.

4.2.3. Spatial Distribution Results and Analysis

To intuitively compare differences in spatial coverage and image selection preferences among the models, we analyzed the spatial distribution of the selected images. As shown in Figure 8, the images selected by Tao et al. [22], DD-RSIRA, and SwathSel are represented by orange, green, and blue rectangles, respectively. The right part of Figure 8a exhibits a clear swath trend, whereas the coverage pattern of the remaining areas is complex, which is highly consistent with the density distribution in Figure 7e. The spatial distributions of the DD-RSIRA and SwathSel models across the four datasets also align with the density distributions shown in Figure 7.

Compared with the baseline models, the images selected by SwathSel exhibit a more regular and ordered spatial distribution. This relates to the composite grouping strategy used in our model. To better utilize the swath information of the images, we divided the swath data into connected groups, changing the smallest unit of image selection from a single image to a group of connected images with the same swath. The selected images show a clear distribution trend along the swath, validating the effectiveness of the swath-based composite grouping strategy.

4.2.4. Visual Consistency Analysis

This subsection analyzes the visual consistency of the selected images. We used NG48 as a representative example to analyze the distributions of satellite sources, acquisition times, solar elevation angles, and roll angles in the selection results of all models. The experimental results are shown in Figure 9. Figure 9a–c show the satellite source distributions of Tao et al., DD-RSIRA, and SwathSel, respectively. Overlapping regions are labeled with the satellite name that comes first in lexicographic order. Figure 9d–f show the distribution of the number of days between the specified date (i.e., 17 May 2025) and the acquisition times of the images selected by Tao et al., DD-RSIRA, and SwathSel, respectively. Figure 9g–i show the distribution of the difference between the specified solar elevation angle (i.e., 90°) and the solar elevation angles of the images selected by Tao et al., DD-RSIRA, and SwathSel, respectively. Figure 9j–l show the distribution of the difference between the roll angles of the images selected by the three models (i.e., Tao et al., DD-RSIRA, and SwathSel) and a specified value (i.e., 0°).

We also selected a region exhibiting substantial visual variation in NG48 to enable a more intuitive comparison of visual consistency across models. This region is indicated by a black rectangle in Figure 9 and Figure 10. The visual consistency performance of the models within the highlighted region is summarized in Table 4.

1. Satellite Source Continuity

Compared with the baseline models, the results of the SwathSel model proposed in this study demonstrate improved satellite source continuity. As shown in Figure 9a,b, Tao et al. [22] and DD-RSIRA exhibit scattered distributions in the left region of NG48, while Figure 9c shows a more continuous distribution of satellite sources for the SwathSel model. Within the black rectangular region, Tao et al., characterized by a more dispersed satellite source distribution, have the highest $RMSE_{SSC}$.

2. Acquisition Time and Solar Elevation Angle Continuity

Because the movement of the subsolar point follows the seasons, the solar elevation angle is correlated with the acquisition date. As shown in Figure 9d–i, the distributions of acquisition time and solar elevation angle are similar across models. Although the images selected by Tao et al. [22] and DD-RSIRA generally align more closely with the specified date and solar elevation angle, their distributions are more scattered, and distinct patches on the left side indicate poor data consistency. Within the black rectangular area, light yellow occupies the largest proportion in Figure 9d, indicating that most image acquisition times selected by Tao et al. are close to the user-preferred acquisition time of 17 May 2025. For DD-RSIRA, the light blue area is more extensive, with varying shades and a discontinuous distribution. In contrast, the image acquisition times selected by SwathSel are nearly uniform in color, with only slight darkening in the upper left corner. Although this distribution yields the largest deviation from the user-specified date, SwathSel's $RMSE_{ATC}$ remains significantly lower than those of the other two models because its acquisition times are nearly identical across adjacent images. Because the solar elevation angle is closely related to the imaging time, its distribution resembles that of acquisition time, and SwathSel also achieves the lowest $RMSE_{SEAC}$.

3. Roll Angle Continuity

The roll angle distributions differ markedly across methods. As shown in Figure 9j,k, both Tao et al. [22] and DD-RSIRA exhibit disorganized distributions on the left. In contrast, the images selected by SwathSel exhibit a regular roll angle distribution with a clear trend aligned with the swath orientation, as shown in Figure 9l. Within the black rectangular area, SwathSel exhibits the most concentrated roll angle distribution, resulting in the lowest $RMSE_{RAC}$.

We also present quicklooks of the images selected by each model in Figure 10. The quicklooks are consistent with Figure 9. Within the black rectangular region, Tao et al. and DD-RSIRA selected the same images in the lower right corner. Although these images are closer to the user-preferred acquisition time, they have higher cloud cover, whereas SwathSel yields a CAR of only 0.10%. Furthermore, the satellite distribution within the black rectangular region is more concentrated in SwathSel: the left side exhibits a green tone, while the right side leans toward reddish-brown. In contrast, the results from Tao et al. and DD-RSIRA show a more chaotic overall image tone, with higher $RMSE_{SSC}$ values. Both the quantitative metrics and the visual representations indicate that the images selected by SwathSel exhibit the best visual consistency.

4.3. Analysis of the Cloud Cover Intervals

To investigate the impact of cloud cover intervals on algorithm performance, we conducted experiments on NG48 with different numbers of cloud cover intervals (CCIs) and different division ratios. The experimental results are shown in Table 5. The default SwathSel configuration uses four cloud cover intervals with a division ratio of 1:2:3:4.

As the number of CCIs increases, the range of CCI1 continues to narrow, even though its share in the division ratio remains 1. The image selection process therefore imposes increasingly restrictive constraints on cloud cover and requires more iterations to complete the optimal selection. Thus, as shown in Table 5, the redundancy of the selected images and the computational cost continue to increase, while the cloud area ratio (CAR), the metric of primary concern, decreases. Finer-grained cloud cover interval divisions may increase the influence of inner rings on local swath consistency within connected subsets (Equation (2)), but they may also split existing inner rings into multiple connected subsets without inner rings. As a result, the visual consistency metrics for different numbers of cloud cover intervals are relatively similar, without a clear trend. When the number of intervals reaches five, the CAR advantage of finer-grained division is no longer significant, while the computational cost continues to increase. Therefore, the number of intervals is set to four.
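A division ratio can be turned into interval boundaries by splitting the admissible cloud cover range in proportion to the ratio. The sketch below assumes that the intervals partition $[0, C_{max}]$ contiguously in ratio order. Here, $C_{max}$ is a hypothetical upper limit (for example, the retrieval threshold), normalized to 1. It prints the width of CCI1 for the variants listed in Table 5. Open or closed interval ends are not modeled.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def cci_bounds(ratio: Sequence[int], c_max: Fraction = Fraction(1)) -> List[Tuple[Fraction, Fraction]]:
    """Split [0, c_max] into consecutive intervals whose widths follow the division ratio."""
    total = sum(ratio)
    bounds: List[Tuple[Fraction, Fraction]] = []
    lower = Fraction(0)
    for share in ratio:
        upper = lower + c_max * Fraction(share, total)
        bounds.append((lower, upper))
        lower = upper
    return bounds


# Variants from Table 5, with cloud cover expressed as a fraction of c_max.
for ratio in [(1, 2, 3, 4), (4, 3, 2, 1), (1, 3, 5, 7), (1, 1, 1, 1)]:
    first = cci_bounds(ratio)[0]
    print(":".join(map(str, ratio)), "CCI1 =", [float(b) for b in first])
```

Under this assumption, CCI1 spans 10% of the range for 1:2:3:4, 40% for 4:3:2:1, 6.25% for 1:3:5:7, and 25% for 1:1:1:1.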

When the number of CCIs is fixed and the low-cloud-cover interval is too wide, the algorithm’s sensitivity to images with low cloud cover decreases sharply, making it difficult to identify truly low-cloud-cover connected subsets. As a result, the selected images have relatively high CAR values. Furthermore, the dynamic adjustment mechanism can flexibly use images from the next CCI to fill the inner ring regions of connected subsets. However, narrower high-cloud-cover intervals may mean that filling an inner ring region requires images from the next two CCIs. Unfilled inner ring regions may then be covered by multiple different connected subsets in the next CCI, which can degrade both the redundancy and the visual consistency of the images. Consequently, the 4:3:2:1 variant exhibits poor CAR, redundancy, and visual consistency. Conversely, if the low-cloud-cover interval is too narrow, the strict cloud cover constraint improves CAR performance, but it also significantly reduces the coverage of connected subsets. Suppose a connected subset $I_{E_{CCI1}}$ selected from CCI1 contains no inner ring, and a connected subset $I_{E_{CCI2}}$ from CCI2 completely covers $I_{E_{CCI1}}$. Due to the pruning process described in Equation (19), when $I_{E_{CCI2}}$ is subsequently selected, any single-scene image x in $I_{E_{CCI2}}$ that is fully covered by $I_{E_{CCI1}}$ must be removed. This removal creates an additional inner ring in $I_{E_{CCI2}}$. Therefore, as the low-cloud-cover interval narrows, the algorithm’s time consumption increases, and its performance in redundancy and visual consistency declines. The performance of the 1:3:5:7, 1:2:3:4, and 1:1:1:1 variants in Table 5 supports this observation.

Although the 1:1:1:1 variant achieves the best redundancy, visual consistency, and runtime, the 1:2:3:4 variant achieves a substantial 32.58% reduction in CAR, whereas the 1:3:5:7 variant provides only an additional 2.25% reduction. This outcome aligns with the analysis of cloud cover distribution in Section 3.1.2: the CCI1 cloud cover range corresponds to the peak interval of the cloud cover probability density distribution, which ensures sufficient coverage by images with low cloud cover. At this point, further narrowing the interval yields limited improvements in CAR and may require many additional images from CCI2 to cover the remaining areas. Conversely, widening the interval reduces the granularity of the cloud cover division for connected subsets, resulting in a higher CAR. Therefore, the 1:2:3:4 division ratio is selected as the default for the SwathSel algorithm.

4.4. Ablation Study

We conducted ablation experiments using SwathSel-WC and SwathSel-WD to investigate the contributions of swath consistency constraints and the dynamic adjustment mechanism to the proposed SwathSel algorithm. Specifically, SwathSel-WC operates without swath consistency constraints, SwathSel-WD operates without the dynamic adjustment mechanism, while SwathSel integrates both of these components. The quantitative results of SwathSel-WC, SwathSel-WD, and SwathSel across four datasets are shown in Table 6.

Compared with SwathSel, SwathSel-WC performed worse on most visual consistency metrics across the four datasets, with only a slightly better $RMSE_{SEAC}$ on the NKL4 dataset. Meanwhile, the RR and CAR of SwathSel-WC fluctuated around those of SwathSel without a clear trend. This confirms the crucial role of the swath consistency constraints in ensuring the visual consistency of the selected images.

SwathSel-WD consistently performed worse on RR but better on CAR. Except for $RMSE_{SEAC}$ on the SA21 dataset, its visual consistency metrics declined slightly. This indicates that the dynamic adjustment mechanism sacrifices some CAR performance to achieve better RR and visual consistency. Our SwathSel model, which combines the advantages of the swath consistency constraints and the dynamic adjustment mechanism, delivers the most balanced performance among the three models.
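The per-metric comparisons in the two preceding paragraphs can be tallied directly from Table 6. The sketch below is a plain transcription of Table 6 ("WC" and "WD" stand for SwathSel-WC and SwathSel-WD, "Full" for SwathSel). For each dataset, it lists the metrics on which an ablated variant beats the full model, treating lower values as better for every metric. `variant_better` is a hypothetical helper.

```python
# Table 6 values; column order: RR, CAR, RMSE_SSC, RMSE_ATC, RMSE_SEAC, RMSE_RAC
METRICS = ["RR", "CAR", "SSC", "ATC", "SEAC", "RAC"]
TABLE6 = {
    "NQ05": {"WC": [39.90, 14.00, 0.568, 33.617, 7.251, 5.182],
             "WD": [42.36, 14.14, 0.541, 32.073, 6.251, 5.282],
             "Full": [40.69, 15.04, 0.516, 30.777, 5.904, 4.622]},
    "SA21": {"WC": [37.26, 9.90, 0.495, 27.721, 4.282, 1.765],
             "WD": [38.43, 10.10, 0.502, 26.045, 3.954, 1.765],
             "Full": [38.29, 10.12, 0.495, 25.694, 3.996, 1.737]},
    "NG48": {"WC": [30.88, 0.67, 0.362, 14.769, 5.330, 0.997],
             "WD": [27.70, 0.53, 0.330, 11.740, 4.202, 1.016],
             "Full": [27.04, 0.60, 0.306, 11.597, 4.195, 0.964]},
    "NKL4": {"WC": [28.56, 0.52, 0.356, 18.820, 3.486, 1.370],
             "WD": [29.00, 0.49, 0.356, 16.247, 4.054, 1.299],
             "Full": [27.59, 0.58, 0.341, 14.355, 3.791, 1.291]},
}


def variant_better(variant):
    """Return {dataset: [metrics where `variant` is strictly lower than Full]}."""
    out = {}
    for dataset, rows in TABLE6.items():
        out[dataset] = [m for m, v, f in zip(METRICS, rows[variant], rows["Full"])
                        if v < f]
    return out


print(variant_better("WC"))
print(variant_better("WD"))
```

The output shows SwathSel-WD winning on CAR in every dataset (plus $RMSE_{SEAC}$ on SA21), and SwathSel-WC winning only on scattered RR/CAR entries and on $RMSE_{SEAC}$ for NKL4.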

5. Discussion

The optimal selection of remote sensing images is a complex trade-off process. Different selection models emphasize different aspects, leading to substantial variations in their outcomes. Tao et al. [22] designed their model using an overlap-sensitive grid voting strategy to maximize the coverage of each image. In contrast, Li et al. [23] aimed to minimize the number of images needed for coverage while balancing cloud cover against image acquisition time. From the perspective of visual consistency, this study designs the SwathSel model around the prior knowledge that images from the same swath offer inherent advantages in both redundancy and visual consistency. To evaluate the performance differences among the models, experiments were conducted on four datasets spanning different latitude bands, cloud cover levels, data densities, and geographic scales.

The quantitative results are shown in Table 3. On the NG48 and NKL4 datasets, DD-RSIRA used fewer images than both Tao et al. and SwathSel while achieving 100% coverage. The situation is reversed on the NQ05 and SA21 datasets, where DD-RSIRA uses the most images. In all cases, however, SwathSel achieved the lowest redundancy. This is related to variations in image width. As shown in Table 1, when swath data are cataloged by scene, the widths of images from different satellite sources can vary significantly. As shown in Figure 8, compared with the results selected by DD-RSIRA on NG48 and NKL4, SwathSel selected more scenes with smaller widths that derive from the same swath; although SwathSel selected more images, its overall redundancy was lower. As shown in Figure 8a,d, on NQ05 and SA21, Tao et al. adopted a coverage-sensitive strategy that selects images with more extensive single-scene coverage. Although Tao et al. achieved lower redundancy than DD-RSIRA on these datasets, a significant gap to SwathSel remains. These results demonstrate that swath-based selection can mitigate the impact of image width variations and achieve superior redundancy performance. SwathSel also achieves the lowest CAR on all four datasets. Taking NG48 as an example, we analyze the visual consistency of the different models and present quicklooks of the images selected by each. As shown in Figures 9 and 10, SwathSel also achieves the best visual consistency on this dataset. For specific selection requirements, the SwathSel model can also assign weights to each influencing factor based on user preferences (see Equations (16) and (17) for details), yielding different results.

We also conducted ablation experiments to investigate the impact of the proposed dynamic adjustment mechanism and the swath consistency constraints. The swath consistency constraints consist of two components: the local swath consistency constraint and the global swath consistency constraint. The local swath consistency constraint aims to obtain connected subsets with fewer inner rings, as shown in Equation (3), an objective it shares with the dynamic adjustment mechanism. Both approaches maintain the integrity of swath data by reducing the number of inner rings within connected subsets, thereby improving redundancy and visual consistency. The dynamic adjustment mechanism achieves this by relaxing the cloud cover interval constraints within the same swath, whereas the local swath consistency constraint does so by selecting among different swaths. The global swath consistency constraint aims to identify the connected subset that best matches the already selected images in terms of visual consistency. If the chosen connected subset contains many inner rings, redundancy increases accordingly, even though the overall visual consistency metric improves. Thus, when the local swath consistency constraint dominates, both redundancy and visual consistency improve; when the global swath consistency constraint dominates, visual consistency improves but redundancy may worsen. This relationship explains the fluctuations of SwathSel-WC in redundancy observed in Table 6. With respect to CAR, the swath consistency constraints shift part of the evaluation weight away from cloud cover, which generally worsens CAR. However, the local swath consistency constraint also favors connected subsets without inner rings. Compared with selecting several connected subsets that have the same cloud cover ratio but overlap more, this yields a smaller total cloud-covered area and hence a lower CAR (Equation (22)). Therefore, in certain situations, the swath consistency constraints may actually improve CAR. In addition, computing the swath consistency constraints introduces considerable time overhead. In contrast, the dynamic adjustment mechanism fills the inner rings of connected subsets, preventing repeated coverage of these areas and reducing the overall time overhead of the model.
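The overlap argument about CAR can be illustrated with a toy calculation. Equation (22) is not reproduced in this section, so the sketch assumes that CAR is the total cloud-covered area of all selected images divided by the AOI area. This assumption is consistent with the Raw rows of Table 3, where CAR exceeds 100%. The areas and the 2% cloud ratio are illustrative numbers, not experimental data, and `car_percent` is a hypothetical helper.

```python
def car_percent(image_areas, cloud_ratios, aoi_area):
    """Assumed CAR: total cloud-covered area of all selected images divided by
    the AOI area, in percent. Overlapping images each contribute their clouds."""
    cloud_area = sum(a * c for a, c in zip(image_areas, cloud_ratios))
    return 100.0 * cloud_area / aoi_area


AOI = 100.0  # illustrative AOI area (arbitrary units)
# One connected subset without inner rings: 10 scenes of area 10 tile the AOI exactly
single = car_percent([10.0] * 10, [0.02] * 10, AOI)
# Several overlapping subsets with the same 2% cloud ratio: 13 scenes of area 10
overlapping = car_percent([10.0] * 13, [0.02] * 13, AOI)
print(round(single, 2), round(overlapping, 2))  # 2.0 2.6
```

At an equal cloud cover ratio, the extra 30% of image area spent on overlap raises CAR proportionally, which is the mechanism by which avoiding inner rings can lower CAR.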

In summary, the SwathSel algorithm maintains visual consistency among selected images while preserving low redundancy and cloud cover, making it highly competitive. Another noteworthy issue in remote sensing image selection is achieving cloud-free coverage of the AOI, which has also been an active research topic in recent years [26]. In areas with heavy cloud cover, multiple layers of images often need to be stacked to obtain cloud-free coverage. However, this significantly increases the redundancy of the selected images, and mosaicking numerous images within a small area often yields poor visual quality. The SwathSel model proposed in this study considers only the cloud cover ratio of each image, not the specific positions of clouds within the images; thus, it cannot be applied to optimal image selection for cloud-free coverage. How to exploit cloud location information, and how to balance redundancy, stacking relationships, and visual consistency, remain interesting and valuable challenges. This will be the direction of our future work, in which we will further explore the role of visual consistency in cloud-free coverage.

6. Conclusions

This study proposes SwathSel, a novel swath-based optimal remote sensing image selection model for large-scale mapping, which maintains low redundancy and cloud cover while ensuring visual consistency among the selected images. We first employ a composite grouping strategy to group candidate images based on swaths, cloud cover, and topological connectivity, thereby expanding the basic selection unit from a single image to a connected image subset. To address the limitations of fixed cloud cover interval groups and to improve visual consistency, we propose a dynamic adjustment mechanism that allows candidate images to flow across different cloud cover interval groups. Additionally, we introduce local and global swath consistency constraints to enhance visual consistency among the selected images. We conducted experiments on four datasets covering different latitude bands, cloud cover levels, data densities, and geographic scales, and we proposed four quantitative metrics to evaluate the visual consistency of the selected images. Compared with the better-performing baseline on each dataset, SwathSel reduces RR by 74.07%, 69.52%, 20.24%, and 20.88% and CAR by 61.57%, 67.99%, 83.61%, and 65.06% on NQ05, SA21, NG48, and NKL4, respectively, while achieving better visual consistency on most metrics. Ablation experiments further validate the effectiveness of the dynamic adjustment mechanism and the swath consistency constraints. Overall, SwathSel achieves a balanced trade-off among redundancy, cloud cover, and visual consistency.
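The reductions quoted in the conclusions are consistent with comparing SwathSel against the better (lower) of the two baselines on each dataset in Table 3. The check below reproduces them under that reading. `reduction_vs_best_baseline` is a hypothetical helper, and the values are transcribed from Table 3.

```python
# RR and CAR (%) from Table 3, ordered (Tao et al., DD-RSIRA, SwathSel)
TABLE3 = {
    "NQ05": {"RR": (156.90, 171.55, 40.69), "CAR": (39.14, 45.53, 15.04)},
    "SA21": {"RR": (125.64, 152.14, 38.29), "CAR": (40.69, 31.62, 10.12)},
    "NG48": {"RR": (89.37, 33.90, 27.04), "CAR": (13.24, 3.66, 0.60)},
    "NKL4": {"RR": (92.10, 34.87, 27.59), "CAR": (2.02, 1.66, 0.58)},
}


def reduction_vs_best_baseline(metric):
    """Relative reduction (%) of SwathSel against the better (lower) of the
    two baselines on each dataset, rounded to two decimals."""
    out = {}
    for dataset, row in TABLE3.items():
        tao, dd_rsira, swathsel = row[metric]
        best = min(tao, dd_rsira)
        out[dataset] = round((best - swathsel) / best * 100, 2)
    return out


print(reduction_vs_best_baseline("RR"))
print(reduction_vs_best_baseline("CAR"))
```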

Author Contributions

All authors made significant contributions to the manuscript. Conceptualization, B.Z. and S.Y.; methodology, B.Z., Y.A. and Z.X.; software, B.Z. and Y.A.; validation, B.Z., S.Y., Y.L. and L.F.; formal analysis, B.Z., Y.A., Y.L. and W.A.; investigation, L.F.; resources, S.Y.; data curation, B.Z. and Z.X.; writing—original draft preparation, B.Z.; writing—review and editing, B.Z., Y.A., Y.L. and W.A.; visualization, B.Z. and Y.L.; supervision, S.Y.; project administration, S.Y.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Development Program of Jilin Province, grant number 20260201053GX.

Data Availability Statement

Restrictions apply to the availability of these data.

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments, which helped improve the presentation of this paper.

Conflicts of Interest

Authors Bai Zhang, Zongyu Xu, Yunhe Liu, Wenhao Ai, Liming Fan, Yuan An and Shuhai Yu were employed by the company Chang Guang Satellite Technology Co., Ltd. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AOI	Area of Interest
CR	Coverage Ratio
RR	Redundancy Ratio
CCI	Cloud Cover Interval
CAR	Cloud Area Ratio
$R M S E_{A T C}$	Root Mean Square Error of Acquisition Time Continuity
$R M S E_{S S C}$	Root Mean Square Error of Satellite Source Continuity
$R M S E_{S E A C}$	Root Mean Square Error of Solar Elevation Angle Continuity
$R M S E_{R A C}$	Root Mean Square Error of Roll Angle Continuity
DD-RSIRA	Remote Sensing Image Retrieval Algorithm for Dense Data
SwathSel-WC	SwathSel Model without Swath Consistency Constraints
SwathSel-WD	SwathSel Model without Dynamic Adjustment Mechanism

Appendix A

Theorem A1.

When a connected subset $S_{n}$ is incorporated, any newly introduced redundant scenes can only occur among (i) the boundary scenes of the original selected set $O_{n-1}$ and (ii) their 1-hop adjacent scenes within the union set $\{O_{n-1}, S_{n}\}$.

Before proving the theorem, we state the following assumptions:

1. Prior to incorporating $S_{n}$, the original set $O_{n-1}$ contains no redundant scenes.
2. The connected subset $S_{n}$ contains no redundant scenes, since all scenes in $S_{n}$ belong to the same swath.
3. According to the satellite characteristics in Table 1, the maximum scene width is less than twice the minimum scene width, i.e., $w_{max} < 2 w_{min}$.
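Assumption 3 can be checked directly against the nadir standard scene widths in Table 1. The sketch uses only those nadir values. Off-nadir scene widths, for roll angles up to ±15° per Table 2, are not covered by this check.

```python
# Nadir standard scene widths (km), copied from Table 1
scene_width_km = {
    "JL1KF01A": 23,
    "JL1KF01B": 13,
    "JL1KF01C": 13,
    "JL1KF02B01-06": 15,
    "JL1GF03D Series": 17,
}

w_max = max(scene_width_km.values())
w_min = min(scene_width_km.values())
# Assumption 3 of Theorem A1: w_max < 2 * w_min
print(w_max, w_min, w_max < 2 * w_min)  # 23 13 True
```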

Figure A1. Schematic illustration of a redundant scene x.

Proof.

Proof by contradiction: Assume that after incorporating $S_{n}$, there exists a scene $x$ such that $x \notin \mathrm{Edge}(O_{n-1}) \cup \text{1-hop}(\mathrm{Edge}(O_{n-1}))$, and yet $x$ is redundant in the combined set $\{O_{n-1}, S_{n}\}$. We consider two cases.

Case 1: $x \in O_{n-1}$. Because $x \notin \mathrm{Edge}(O_{n-1}) \cup \text{1-hop}(\mathrm{Edge}(O_{n-1}))$, $x$ cannot become redundant through interactions only with the boundary scenes of $O_{n-1}$ (or their 1-hop neighbors). Meanwhile, by Assumption (1), $O_{n-1}$ itself contains no redundant scenes; therefore, $x$ is non-redundant within $O_{n-1}$. Hence, if $x$ becomes redundant in $\{O_{n-1}, S_{n}\}$, the redundancy must be introduced by adding $S_{n}$, i.e., the set $\{x, S_{n}\}$ must already render $x$ redundant (Figure A1). This implies that scenes from $S_{n}$ fully cover the effective contribution of $x$. However, since $x$ is neither on nor adjacent to the boundary of $O_{n-1}$, the spatial region contributed by $x$ lies in the interior of $V(O_{n-1})$. Under the bounded-width condition in Assumption (3), scenes in a newly added connected subset cannot fully subsume such an interior scene without first subsuming (or creating redundancy among) boundary scenes. This contradicts the assumption that redundancy arises at $x$ while avoiding the boundary neighborhood.

Case 2: $x \in S_{n}$. Since $S_{n}$ is incorporated because it contributes to covering the region not yet covered by $O_{n-1}$, scene $x$ must intersect the part left uncovered by $O_{n-1}$; therefore, $x$ cannot be redundant with respect to $O_{n-1}$ alone. In addition, by Assumption (2), $S_{n}$ contains no redundant scenes internally. Consequently, $x$ can be redundant in $\{O_{n-1}, S_{n}\}$ only if it is completely covered by scenes in $O_{n-1}$ that lie on the boundary of $O_{n-1}$ or in their immediate neighborhood, i.e., $x \in \mathrm{Edge}(O_{n-1}) \cup \text{1-hop}(\mathrm{Edge}(O_{n-1}))$. This contradicts the hypothesis that $x$ lies outside this set.

Since both cases lead to contradictions, the assumption is false. Therefore, when the subset $S_{n}$ is incorporated, redundant scenes can only occur among the boundary scenes of $O_{n-1}$ or their 1-hop adjacent scenes within $\{O_{n-1}, S_{n}\}$. □
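Theorem A1 suggests that, after each incorporation, a redundancy check need only examine the boundary scenes of $O_{n-1}$ and their 1-hop neighbors rather than the whole selection. The sketch below illustrates such a check under stated assumptions. It is not the pruning procedure of Equation (19). Scene footprints are rasterized into hypothetical grid cells, and a scene counts as redundant when every one of its cells is covered by at least one other scene. The `candidates` list stands in for the Edge/1-hop set, which is not computed here. `redundant_scenes` is a hypothetical helper.

```python
from collections import Counter


def redundant_scenes(scenes, candidates=None):
    """Return the sorted names of scenes whose every grid cell is also covered
    by at least one other scene in `scenes`.

    scenes: dict mapping a scene name to the set of (hypothetical) raster cells
        its footprint covers.
    candidates: optional iterable of names to test; a stand-in for the
        Edge(O_{n-1}) / 1-hop neighbourhood of Theorem A1 (not computed here).
    Each scene is tested individually; removing two mutually covering scenes
    at once could still open a gap, so removals should be re-checked one by one.
    """
    # Number of scenes covering each cell (each scene contributes a cell once)
    counts = Counter(cell for cells in scenes.values() for cell in cells)
    names = scenes.keys() if candidates is None else candidates
    return sorted(n for n in names if all(counts[c] > 1 for c in scenes[n]))


# Toy 1-D strip: O_{n-1} = {a, b, c}, newly added subset S_n = {s1, s2}
scenes = {
    "a": {0, 1, 2}, "b": {3, 4, 5}, "c": {6, 7, 8},  # previously selected
    "s1": {6, 7, 8, 9}, "s2": {10, 11, 12, 13},      # same-swath subset S_n
}
print(redundant_scenes(scenes))                    # ['c']
print(redundant_scenes(scenes, ["b", "c", "s1"]))  # ['c']
```

In the toy example, the only redundant scene is the boundary scene `c` next to the new subset, so restricting the check to a boundary neighborhood returns the same result as a full scan.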

References

1. Kulu, E. Satellite constellations—2024 survey, trends and economic sustainability. In Proceedings of the International Astronautical Congress, IAC, Milan, Italy, 14–18 October 2024; pp. 14–18.
2. Paravano, A.; Patrizi, M.; Razzano, E.; Locatelli, G.; Feliciani, F.; Trucco, P. The impact of the new space economy on sustainability: An overview. Acta Astronaut. 2024, 222, 162–173.
3. Urabe, T. Overview of satellite-based Earth observation missions in Japan. In Proceedings of the Sensors, Systems, and Next-Generation Satellites XXIX, SPIE, Madrid, Spain, 15–18 September 2025; Volume 13667, pp. 200–207.
4. Secker, J.; Biron, K.; Dessureault, D.; Lamontagne, P.; Rear, R. Automated Collection Planning for Civilian and Commercial Satellite Imagery, and Definition and Exploitation of the Collection Asset Specification Data Structure. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 9764–9797.
5. Chu, Y.; Ye, M.; Qian, Y. Fine-grained image recognition methods and their applications in remote sensing images: A review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 19640–19667.
6. Dong, X.; Cao, J.; Zhao, W. A review of research on remote sensing images shadow detection and application to building extraction. Eur. J. Remote Sens. 2024, 57, 2293163.
7. Li, Q.; Mou, L.; Sun, Y.; Hua, Y.; Shi, Y.; Zhu, X.X. A review of building extraction from remote sensing imagery: Geometrical structures and semantic attributes. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4702315.
8. Marques, P.; Pádua, L.; Sousa, J.J.; Fernandes-Silva, A. Advancements in remote sensing imagery applications for precision management in olive growing: A systematic review. Remote Sens. 2024, 16, 1324.
9. Sun, Z.; Zhong, Y.; Wang, X.; Zhang, L. Identifying cropland non-agriculturalization with high representational consistency from bi-temporal high-resolution remote sensing images: From benchmark datasets to real-world application. ISPRS J. Photogramm. Remote Sens. 2024, 212, 454–474.
10. Kumari, S.; Agarwal, S.; Agrawal, N.K.; Agarwal, A.; Garg, M.C. A Comprehensive Review of Remote Sensing Technologies for Improved Geological Disaster Management. Geol. J. 2025, 60, 223–235.
11. Shen, S.; Zhang, T.; Zhao, Y.; Wang, Z.; Qian, F. Automatic benggang recognition based on latent semantic fusion of UHR DOM and DSM features. ISPRS Ann. Photogramm. Remote Sens. Spat. Inform. Sci. 2020, 3, 331–338.
12. Chen, S.; Zhang, Y.; Nie, K.; Li, X.; Wang, W. Extracting building areas from photogrammetric DSM and DOM by automatically selecting training samples from historical DLG data. ISPRS Int. J. Geo-Inf. 2020, 9, 18.
13. Som-Ard, J.; Atzberger, C.; Izquierdo-Verdiguier, E.; Vuolo, F.; Immitzer, M. Remote sensing applications in sugarcane cultivation: A review. Remote Sens. 2021, 13, 4040.
14. Warner, T.A.; Nellis, M.D.; Foody, G.M. Remote sensing scale and data selection issues. In The Sage Handbook of Remote Sensing; Foody, G., Warner, T., Nellis, M.D., Eds.; Sage: New York, NY, USA, 2009; pp. 1–17.
15. Lefsky, M.A.; Cohen, W.B. Selection of remotely sensed data. In Remote Sensing of Forest Environments: Concepts and Case Studies; Springer: Berlin/Heidelberg, Germany, 2003; pp. 13–46.
16. Sudha, S.; Aji, S. A review on recent advances in remote sensing image retrieval techniques. J. Indian Soc. Remote Sens. 2019, 47, 2129–2139.
17. Caprara, A.; Toth, P.; Fischetti, M. Algorithms for the set covering problem. Ann. Oper. Res. 2000, 98, 353–371.
18. Álvarez-Miranda, E.; Goycoolea, M.; Ljubić, I.; Sinnl, M. The generalized reserve set covering problem with connectivity and buffer requirements. Eur. J. Oper. Res. 2021, 289, 1013–1029.
19. Ren, Z.G.; Feng, Z.R.; Ke, L.J.; Zhang, Z.J. New ideas for applying ant colony optimization to the set covering problem. Comput. Ind. Eng. 2010, 58, 774–784.
20. Chu, B.; Gao, F.; Chai, Y.; Liu, Y.; Yao, C.; Chen, J.; Wang, S.; Li, F.; Zhang, C. Large-area full-coverage remote sensing image collection filtering algorithm for individual demands. Sustainability 2021, 13, 13475.
21. Yan, X.; Liu, S.; Liu, W.; Dai, Q. An improved coverage-oriented retrieval algorithm for large-area remote sensing data. Int. J. Digit. Earth 2022, 15, 606–625.
22. Tao, P.; Xi, K.; Niu, Z.; Chen, Q.; Liao, Y.; Liu, Y.; Liu, K.; Zhang, Z. Optimal selection from extremely redundant satellite images for efficient large-scale mapping. ISPRS J. Photogramm. Remote Sens. 2022, 194, 21–38.
23. Li, X.; Liu, S.; Liu, W. Remote Sensing Image Retrieval Algorithm for Dense Data. Remote Sens. 2023, 16, 98.
24. Liu, S.; Hodgson, M.E. Satellite image collection modeling for large area hazard emergency response. ISPRS J. Photogramm. Remote Sens. 2016, 118, 13–21.
25. Kempeneers, P.; Soille, P. Optimizing Sentinel-2 image selection in a Big Data context. Big Earth Data 2017, 1, 145–158.
26. Pan, J.; Chen, L.; Shu, Q.; Zhao, Q.; Yang, J.; Jin, S. Spatiotemporal imagery selection for full coverage image generation over a large area with HFA-Net based quality grading. Geo-Spat. Inf. Sci. 2024, 27, 1524–1541.
27. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 10–17 October 2021; pp. 10012–10022.
28. Boggs, S. The International Map of the World. Mil. Eng. 1929, 21, 112–114.
29. He, X.; Yang, X.; Liu, P.; Du, J.; Fu, Z.; Cheng, M.; Xu, T. Wide-Swath and High-Resolution Continuous Multi-Strip Scanning Imaging Technology Based on Satellite-Payload Collaboration. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5618014.
30. Katayama, H.; Kato, E.; Imai, H.; Sagisaka, M. Wide swath and high resolution optical imaging satellite of Japan. In Proceedings of the Earth Observing Missions and Sensors: Development, Implementation, and Characterization IV; SPIE: New Delhi, India, 2016; Volume 9881, pp. 111–116.
31. Bannari, A.; Morin, D.; Bénié, G.; Bonn, F. A theoretical review of different mathematical models of geometric corrections applied to remote sensing images. Remote Sens. Rev. 1995, 13, 27–47.
32. Kobayashi, S.; Sanga-Ngoie, K. The integrated radiometric correction of optical remote sensing imageries. Int. J. Remote Sens. 2008, 29, 5957–5985.
33. Sun, G.; Hu, Q.; Li, W.; Luo, D.; Lu, Y. Review of Multi-Source High-Resolution Remote Sensing Satellites. J. Phys. Conf. Ser. 2025, 3109, 012039.
34. Hagihara, Y.; Okamoto, H.; Yoshida, R. Development of a combined CloudSat-CALIPSO cloud mask to show global cloud distribution. J. Geophys. Res. Atmos. 2010, 115.
35. Kopeika, N.S.; Arbel, D. Imaging through the atmosphere: An overview. Opt. Pulse Beam Propag. 1999, 3609, 78–89.
36. Gui, S.; Song, S.; Qin, R.; Tang, Y. Remote sensing object detection in the deep learning era—A review. Remote Sens. 2024, 16, 327.
37. Huang, L.; Jiang, B.; Lv, S.; Liu, Y.; Fu, Y. Deep-learning-based semantic segmentation of remote sensing images: A survey. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 8370–8396.
38. Bai, T.; Wang, L.; Yin, D.; Sun, K.; Chen, Y.; Li, W.; Li, D. Deep learning for change detection in remote sensing: A review. Geo-Spat. Inf. Sci. 2023, 26, 262–288.
39. Schenk, T. Towards automatic aerial triangulation. ISPRS J. Photogramm. Remote Sens. 1997, 52, 110–121.
40. Yu, L.; Zhang, Y.; Sun, M.; Zhou, X.; Liu, C. An auto-adapting global-to-local color balancing method for optical imagery mosaic. ISPRS J. Photogramm. Remote Sens. 2017, 132, 1–19.

Figure 1. Spatial distribution of each dataset: (a) NQ05; (b) SA21; (c) NG48; (d) NKL4. The closed red polygon represents the area of interest, and the blue polygons indicate the candidate images.

Figure 2. Schematic diagram of cloud cover distribution and acquisition time distribution for different datasets: (a) cloud cover distribution; (b) acquisition time distribution. The blue, orange, green, and red shaded areas represent the probability densities of the NQ05, SA21, NG48, and NKL4 datasets, respectively.

Figure 3. Schematic representation of our SwathSel framework. $G_{S_{i}}$, $G_{S_{i} C_{j}}$, and $G_{S_{i} C_{j} K_{l}}$ represent the swath group, the cloud interval group, and the connected component group, respectively. Blue and red arrows indicate data flow within and between modules, respectively.


Figure 4. Schematic diagram of cloud cover interval grouping: (a) quicklooks of a swath group within the NG48 dataset; (b) different cloud cover interval groups within a swath group; (c) quicklooks of cloud cover interval 1 (CCI1); (d) quicklooks of cloud cover interval 2 (CCI2). Images belonging to cloud cover intervals 1 to 4 in (b) are represented by blue, green, red, and orange, respectively.

Figure 5. Schematic diagram of the dynamic adjustment mechanism: (a) quicklooks of the original connected subset; (b) quicklooks of the connected subset processed by the dynamic adjustment mechanism. The connected subset in this schematic belongs to CCI1 in Figure 4c, and the images from CCI2 are selected through the dynamic adjustment mechanism to fill the gap regions.

Figure 6. Schematic diagram of redundant images. Subsets 1 to 4 are added to the optimal set in sequence. After subset 4 is selected, subset 1 becomes completely redundant.

Figure 7. Density distributions of experimental results. (a–d): density distributions of raw data for the NQ05, SA21, NG48, and NKL4 datasets, respectively; (e–h): density distributions of Tao et al. [22] for the NQ05, SA21, NG48, and NKL4 datasets, respectively; (i–l): density distributions of DD-RSIRA for the NQ05, SA21, NG48, and NKL4 datasets, respectively; (m–p): density distributions of SwathSel for the NQ05, SA21, NG48, and NKL4 datasets, respectively. The color scale from blue to yellow and then to red indicates the number of images in each grid cell. The red polygon represents the boundary of the AOI, and the black circles highlight differences in the images selected by different models.

Figure 8. Spatial distribution of experimental results. (a–c): spatial distributions of Tao et al. [22], DD-RSIRA, and SwathSel in the NQ05 dataset, respectively; (d–f): spatial distributions of Tao et al. [22], DD-RSIRA, and SwathSel in the SA21 dataset, respectively; (g–i): spatial distributions of Tao et al. [22], DD-RSIRA, and SwathSel in the NG48 dataset, respectively; (j–l): spatial distributions of Tao et al. [22], DD-RSIRA, and SwathSel in the NKL4 dataset, respectively.

Figure 9. Satellite source, acquisition time, solar elevation angle, and roll angle distributions of images selected by different models in the NG48 Dataset. (a–c): distributions of satellite sources for Tao et al. [22], DD-RSIRA, and SwathSel, respectively; (d–f): distributions of the number of days between the images selected by Tao et al. [22], DD-RSIRA, and SwathSel and a specified reference date, respectively; (g–i): distributions of the differences in solar elevation angle between a specified value and those selected by Tao et al. [22], DD-RSIRA, and SwathSel, respectively; (j–l): distributions of the differences in roll angle between a specified value and those selected by Tao et al. [22], DD-RSIRA, and SwathSel, respectively. The black rectangles highlight differences in the images selected by different models.

Figure 10. Quicklooks of images selected by different models in the NG48 Dataset: (a) Tao et al.; (b) DD-RSIRA; (c) SwathSel. The black rectangles highlight differences in the images selected by different models.

Table 1. Information on the satellites used.

Satellite Name	Launch Date	Spatial Resolution	Swath Width	Standard Scene Width (Nadir)
JL1KF01A	January 2020	0.75 m	136 km	23 km
JL1KF01B	July 2021	0.5 m	150 km	13 km
JL1KF01C	May 2022	0.5 m	150 km	13 km
JL1KF02B01-06	September 2024	0.5 m	150 km	15 km
JL1GF03D Series	July 2021–June 2023	0.75 m	17 km	17 km

Table 2. Image retrieval settings for each dataset.

Dataset	Acquisition Time	Roll Angle	Resolution	Cloud	Solar Elevation Angle	Satellite Source
NQ05	1 January 2025 to 30 September 2025	$-15^{\circ} \sim 15^{\circ}$	0.75 m	$0 \sim 100\%$	$0^{\circ} \sim 90^{\circ}$	ALL
SA21	1 January 2025 to 30 September 2025	$-15^{\circ} \sim 15^{\circ}$	0.75 m	$0 \sim 50\%$	$0^{\circ} \sim 90^{\circ}$	ALL
NG48	1 January 2025 to 30 September 2025	$-15^{\circ} \sim 15^{\circ}$	0.5 m	$0 \sim 50\%$	$0^{\circ} \sim 90^{\circ}$	UHRS
NKL4	1 January 2025 to 30 September 2025	$-15^{\circ} \sim 15^{\circ}$	0.5 m	$0 \sim 50\%$	$0^{\circ} \sim 90^{\circ}$	UHRS

Table 3. Comparison of the quantitative evaluation results obtained by different methods. The best values for the different metrics are highlighted in bold.

Dataset	Method	Scenes	CR (%)	RR (%)	CAR (%)	${RMSE}_{SSC}$	${RMSE}_{ATC}$	${RMSE}_{SEAC}$	${RMSE}_{RAC}$	Time Consumption (s)
NQ05	Raw	4611	100	853.72	413.47	0.908	64.298	13.261	10.585	-
	Tao et al. [22]	1084	100	156.90	39.14	1.592	31.410	5.602	8.972	132.47
	DD-RSIRA [23]	1271	100	171.55	45.53	0.778	32.494	5.530	7.954	173.7
	SwathSel	530	100	40.69	15.04	0.516	30.777	5.904	4.622	223.75
SA21	Raw	12,360	100	929.41	247.99	0.852	50.705	9.299	3.107	-
	Tao et al. [22]	2477	100	125.64	40.69	0.975	41.086	5.733	2.573	487.39
	DD-RSIRA [23]	2542	100	152.14	31.62	0.727	38.830	4.339	2.552	373.96
	SwathSel	1394	100	38.29	10.12	0.495	25.694	3.996	1.737	634.40
NG48	Raw	33,217	100	2452.79	442.40	0.904	102.241	18.538	3.458	-
	Tao et al. [22]	2253	100	89.37	13.24	0.655	21.948	5.047	2.258	1337.27
	DD-RSIRA [23]	1534	100	33.90	3.66	0.491	21.464	6.278	1.847	432.86
	SwathSel	1717	100	27.04	0.60	0.306	11.597	4.195	0.964	259.32
NKL4	Raw	133,908	100	3719.72	371.77	0.908	107.046	20.184	3.901	-
	Tao et al. [22]	6651	100	92.10	2.02	0.627	18.361	3.710	2.525	18,504.82
	DD-RSIRA [23]	4487	100	34.87	1.66	0.541	12.077	2.598	2.375	2584.83
	SwathSel	4613	100	27.59	0.58	0.341	14.355	3.791	1.291	3049.65

Table 4. Local visual consistency quantitative evaluation results of different models on NG48 dataset. The best values for the different metrics are highlighted in bold.

Method	Scenes	CAR (%)	${RMSE}_{SSC}$	${RMSE}_{ATC}$	${RMSE}_{SEAC}$	${RMSE}_{RAC}$
Tao et al. [22]	200	37.93	0.767	24.683	5.647	1.913
DD-RSIRA [23]	131	9.91	0.647	30.644	9.235	1.741
SwathSel	147	0.10	0.230	3.066	1.589	0.746

Table 5. Comparative experimental results for different numbers of cloud cover intervals and different division ratios. The best values for the different metrics are highlighted in bold.

CCI Number	Division Ratio	Scenes	CR (%)	RR (%)	CAR (%)	${RMSE}_{SSC}$	${RMSE}_{ATC}$	${RMSE}_{SEAC}$	${RMSE}_{RAC}$	Time Consumption (s)
1	1	1559	100	23.99	2.35	0.294	12.256	4.584	1.017	167.28
2	1:1	1559	100	24.32	2.09	0.297	13.215	4.843	1.023	187.36
3	1:1:1	1603	100	26.11	1.16	0.278	14.209	4.383	0.862	208.43
4	1:1:1:1	1608	100	26.41	0.89	0.291	12.249	4.122	0.912	209.21
5	1:1:1:1:1	1611	100	26.72	0.82	0.299	12.321	3.844	0.969	246.07
4	1:1:1:1	1608	100	26.41	0.89	0.291	12.249	4.122	0.912	209.21
	4:3:2:1	1479	100	26.82	2.44	0.326	16.725	5.774	0.974	237.02
	1:3:5:7	1721	100	27.43	0.58	0.309	12.021	4.400	0.965	259.72
	1:2:3:4	1717	100	27.04	0.60	0.306	11.597	4.195	0.964	259.32

Table 6. Quantitative experimental results for the SwathSel model and its variants with different components removed. The best values for the different metrics are highlighted in bold.

Dataset	Method	Scenes	CR (%)	RR (%)	CAR (%)	${RMSE}_{SSC}$	${RMSE}_{ATC}$	${RMSE}_{SEAC}$	${RMSE}_{RAC}$	Time Consumption (s)
NQ05	SwathSel-WC	500	100	39.90	14.00	0.568	33.617	7.251	5.182	169.20
	SwathSel-WD	532	100	42.36	14.14	0.541	32.073	6.251	5.282	226.78
	SwathSel	530	100	40.69	15.04	0.516	30.777	5.904	4.622	223.75
SA21	SwathSel-WC	1423	100	37.26	9.90	0.495	27.721	4.282	1.765	517.11
	SwathSel-WD	1393	100	38.43	10.10	0.502	26.045	3.954	1.765	687.89
	SwathSel	1394	100	38.29	10.12	0.495	25.694	3.996	1.737	634.40
NG48	SwathSel-WC	1643	100	30.88	0.67	0.362	14.769	5.330	0.997	220.30
	SwathSel-WD	1719	100	27.70	0.53	0.330	11.740	4.202	1.016	275.62
	SwathSel	1717	100	27.04	0.60	0.306	11.597	4.195	0.964	259.32
NKL4	SwathSel-WC	4545	100	28.56	0.52	0.356	18.820	3.486	1.370	2647.29
	SwathSel-WD	4716	100	29.00	0.49	0.356	16.247	4.054	1.299	3210.03
	SwathSel	4613	100	27.59	0.58	0.341	14.355	3.791	1.291	3049.65

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, B.; Xu, Z.; Liu, Y.; Ai, W.; Fan, L.; An, Y.; Yu, S. SwathSel: A Swath-Based Optimal Remote Sensing Image Selection Method with Visual Consistency for Large-Scale Mapping. Remote Sens. 2026, 18, 1212. https://doi.org/10.3390/rs18081212


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
