Technical Note

Remote Sensing Image Harmonization Method for Fine-Grained Ship Classification

by Jingpu Zhang 1, Ziyan Zhong 2, Xingzhuo Wei 3, Xianyun Wu 1,2,* and Yunsong Li 1

1 State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an 710071, China
2 Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, China
3 School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(12), 2192; https://doi.org/10.3390/rs16122192
Submission received: 21 May 2024 / Revised: 6 June 2024 / Accepted: 13 June 2024 / Published: 17 June 2024

Abstract
Target recognition and fine-grained ship classification in remote sensing face the challenges of high inter-class similarity and sample scarcity. To overcome these challenges, a transfer fusion-based ship image harmonization algorithm is proposed. The algorithm designs a feature transfer fusion strategy that combines region-aware instance normalization with an attention mechanism. Adversarial learning is implemented through an image harmonization generator and discriminator module to generate realistic harmonized remote sensing ship images. Furthermore, the domain encoder and domain discriminator modules extract feature representations of the foreground and background and further align the ship foreground with the remote sensing ocean background through feature discrimination. Compared with other advanced image conversion techniques, our algorithm delivers more realistic visuals, improving classification accuracy by 3% for six ship types and 2.94% for twelve types, outperforming Sim2RealNet. Finally, a mixed dataset combining augmented and harmonized samples with real data is proposed for the fine-grained classification of remote sensing ships. Evaluation experiments were conducted on eight typical fine-grained classification algorithms, and the classification accuracy for all ship categories was analyzed. The experimental results show that the proposed mixed dataset effectively alleviates the long-tail problem of real datasets, and the proposed remote sensing ship data augmentation framework outperforms state-of-the-art data augmentation methods in fine-grained ship classification tasks.


1. Introduction

Advancements in remote sensing technology have expanded its applicability, yet image uniformity is influenced by system instabilities and environmental variables. In particular, ship imaging in remote sensing is significantly affected by changes in illumination and weather conditions [1].
In recent years, multi-style and arbitrary-style transfer learning has gained research attention. Dumoulin et al. achieved multi-style learning with conditional instance normalization (CIN) [2] and improved texture networks [3], but the number of styles was limited by training costs. Ye et al. [4] combined CIN with a convolutional attention module, enabling multi-style image transfer while preserving semantic information. Meanwhile, GDWCT [5] excelled in arbitrary style transfer by regularizing group-based style features, reducing computational expense, and LDA-GAN transferred key object features to produce high-quality images [6]. Luan et al. introduced comprehensive style transfer to match texture and color, maintaining spatial consistency with a two-pass algorithm for seamless object synthesis [7]. Similarly, RainNet [8] treated style harmonization as a transfer problem, offering a region-aware adaptive instance normalization (RAIN) module. Finally, Jiang et al. [9] proposed SSH, a self-supervised harmonization framework that bypasses manual user annotation, overcomes shortcomings in image synthesis and subsequent harmonization tasks, and can harmonize any photographic composite image.
Guo et al. [10] argued that inconsistencies in reflectance and illumination between foreground objects and the background can give a composite image a jarring appearance. They therefore decomposed the synthetic image into reflectance and illumination components and introduced an autoencoder to harmonize each separately: the illumination component is harmonized through transfer, and the reflectance component through material consistency, achieving harmonization of the overall image. In the same year, Guo et al. [11] proposed the Disentangled-Harmonization Transformer (D-HT) framework, which exploits the context dependence of the Transformer to ensure structural and semantic stability while enhancing the illumination and background harmony of foreground objects, making the synthesized image more realistic. In addition to the above algorithms, recent deep learning-based image harmonization also includes pixel-to-pixel conversion [12,13], which performs dense pixel-to-pixel conversion on low-resolution images and is increasingly used for image synthesis and harmonization, heralding new trends in this field.
Drawing from previous research, this paper adopts a CycleGAN-based unsupervised model for image-to-image conversion. It introduces a local-aware progressive method for transferring attributes between domains, enhancing the synthesis of remote sensing ship images with realistic and consistent features. The model’s effectiveness is confirmed through comparative studies and detailed classification experiments on remote sensing ship imagery [14,15].

2. Methods

2.1. Simulated Remote Sensing Ship Image Construction

In this study, the generated simulated ship remote sensing images comprise two parts: an image foreground characterized by ship targets and an ocean remote sensing background. For the foreground, this study uses 3ds Max 2022 modeling software to build high-fidelity three-dimensional models of various ship types and then applies projection to obtain multi-pose imaging of these objects. For the background, the ocean scenes in the game “War Thunder” exhibit clearly visible, dynamically undulating waves and realistic rendering, and their spatial distribution characteristics resemble those of actual remote sensing images. Therefore, a ship-free ocean view from the Pacific battle scene in “War Thunder”, rendered at an altitude of 1.5–2 km with a visibility distance of 20 km, was selected as the background template. During the training of the U-Net model, network parameters are fine-tuned via backpropagation to generate accurate segmentation masks aligned with the input image. Figure 1 is a flow chart of the initial stages of image data synthesis discussed in this Section. The 3D model is projected into 2D, cropped, and collaged with the simulated remote sensing ocean background to generate the initial version of the simulated remote sensing ship image, and U-Net is used to generate an accurate segmentation mask corresponding to the input image, laying the foundation for the subsequent image transfer and harmonization tasks.
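The crop-and-collage step described above can be summarized by the following minimal sketch; the function name, array layout, and placement argument are illustrative assumptions rather than the authors' exact implementation. It pastes a projected ship crop (with an alpha channel) onto an ocean tile and returns both the composite and the binary mask that serves as the segmentation target for U-Net.

```python
import numpy as np

def composite_ship(background, ship_rgba, top_left):
    """Paste a 2D projection of a ship model onto a simulated ocean background.

    background : H x W x 3 uint8 ocean tile (e.g., cropped from the rendered scene)
    ship_rgba  : h x w x 4 uint8 projection of the 3D ship model; the alpha channel
                 marks ship pixels and doubles as the segmentation label
    top_left   : (row, col) placement of the ship crop inside the background
    Returns the composite image and its binary foreground mask.
    """
    comp = background.copy()
    r, c = top_left
    h, w = ship_rgba.shape[:2]
    alpha = (ship_rgba[..., 3:4] > 0).astype(np.uint8)          # binary ship mask (h, w, 1)
    region = comp[r:r + h, c:c + w]
    comp[r:r + h, c:c + w] = alpha * ship_rgba[..., :3] + (1 - alpha) * region
    mask = np.zeros(background.shape[:2], dtype=np.uint8)
    mask[r:r + h, c:c + w] = alpha[..., 0]                      # training target for U-Net
    return comp, mask
```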

2.2. Data Augmentation Model Based on Transfer Learning

As shown in Figure 2, the remote sensing ship target fine-grained recognition data augmentation model presented in this paper consists of three main modules: the Simulated Image Generating (SIG) module, the Foreground Feature Translation Aligning (FFTA) module, and the Background Feature Translation Aligning (BFTA) module. Both the FFTA and BFTA modules are based on the Local-Aware Progressive Image Conversion Network, LA-CycleGAN.
The LA-CycleGAN framework comprises two generators, $G_{S \to T}$ and $G_{T \to S}$, and two discriminators, $D_S$ and $D_T$. Specifically, $G_{S \to T}$ maps from the source domain $S$ to the target domain $T$, while $G_{T \to S}$ reverses this process; the discriminator $D_S$ judges whether an image originates from the source domain, while $D_T$ judges whether it belongs to the target domain. Upon model convergence, the aim is for the distribution of the migrated region in the output target image $\tilde{I}_t$ to match the target feature distribution, i.e., $\tilde{P}_{target}(\tilde{I}_t) = P_{target}(I_t)$. Nevertheless, the discriminator $D_T$ occasionally struggles to identify target-domain images accurately. A local perception strategy is therefore invoked to alleviate blurring and better capture discrete regional features and textures, improving the precision and detail of image transfer. This local-aware progressive transfer approach employs a binary mask $M$ to preserve the invariant region $R_{s\_i}$ through element-wise multiplication with the source-domain image. After the network performs the transfer feature mapping, the unaltered region is reapplied to the corresponding location in the generated image, ensuring $R_{s\_i} = R_{o\_i}$. This strategy excludes the transfer operation's effect on the invariant region $R_{s\_i}$, avoiding interference from irrelevant features and maintaining its integrity.
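The masked paste-back at the heart of the local-aware strategy reduces to a simple composition, sketched below under the assumption of PyTorch tensors; the function name is hypothetical and omits the surrounding CycleGAN training loop.

```python
import torch

def local_aware_compose(source, generated, migration_mask):
    """Local-aware composition applied after each transfer step (illustrative).

    migration_mask M is 1 on the region R_{s_t} to be transferred and 0 on the
    invariant region R_{s_i}; invariant pixels are copied back from the source
    image so the transfer cannot disturb them.
    source, generated: (N, 3, H, W) tensors; migration_mask: (N, 1, H, W) in {0, 1}.
    """
    m = migration_mask
    return m * generated + (1.0 - m) * source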
In our system, two principal components work in tandem to achieve feature transfer alignment: the background feature transfer alignment module and the foreground feature transfer alignment module. Each module is embedded with an LA-CycleGAN network, and they are configured to operate in a sequential manner. The primary objective is to synchronize the simulated image’s representational distribution with that of the authentic domain while ensuring the preservation of essential textural and structural attributes.
For the background component, we endeavored to enhance the variety of oceanic backdrop style representations by amassing over 1000 distinct remote sensing images of the ocean and harbors characterized by varied surface hues and wave patterns from multiple regions via Google Earth. During the background feature alignment phase, simulated remote sensing images define the source domain, whereas authentic oceanic backdrops form the target domain. The transitional area $R_{s\_t}$ corresponds to the expansive scenic background of the synthetic image, and the ship target $R_{s\_i}$ remains unchanged. Inputs for the background feature transfer alignment module encompass the genuine ocean background imagery, the synthetic remote sensing images of the source domain, and the image mask pertinent to the transitional area $R_{s\_t}$. It is pivotal for the background feature transfer alignment module to preserve the constancy of the ship target while aligning the background features.
The foreground feature transfer alignment module receives as its input the ship target image from the real-world domain, an intermediate image representative of the intermediate domain, and the corresponding foreground mask depicting the ship target's migration region. Within the realm of foreground alignment, the intermediate image serves as the source domain, and the real-world remote sensing ship image constitutes the target domain. Generator $G_{S \to T}$ facilitates the conversion from the intermediate to the real-world domain. During this phase, the ship target represents the migration area $R_{s\_t}$, while the unchanged area $R_{s\_i}$ corresponds to the intermediate image's background. Throughout this operation, the focus is placed exclusively on the ship target, as the ocean background is disregarded, and cycle consistency loss is computed solely for the ship target, effectively the foreground object.
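Restricting the cycle-consistency term to the ship foreground can be expressed as the masked L1 loss sketched below; this is an illustrative formulation under assumed tensor shapes, not the authors' exact code.

```python
import torch

def masked_cycle_loss(source, reconstructed, foreground_mask):
    """Cycle-consistency loss restricted to the ship foreground (illustrative).

    reconstructed = G_TS(G_ST(source)); the ocean background is masked out so
    only the migration region (the ship target) contributes to the loss.
    """
    diff = torch.abs(source - reconstructed) * foreground_mask
    return diff.sum() / foreground_mask.sum().clamp_min(1.0)
```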

2.3. Remote Sensing Ship Image Harmonization Algorithm

Figure 3 presents the overall architecture of the proposed remote sensing ship image harmonization algorithm, which employs a transfer fusion strategy. Embracing the principles of adversarial learning, the algorithm divides the network into four components: the harmonization image generator, the harmonization image discriminator, the domain encoder, and the domain discriminator. Specifically, the harmonization image generator, referred to as $G_H$, generates the harmonized image $I_H$ from the provided input. The harmonization image discriminator evaluates both the generator's output and actual remote sensing ship imagery, enabling adversarial learning through the propagation and updating of the adversarial loss. The domain encoder, using the harmonized image $I_H$, the genuine remote sensing image, and the corresponding foreground and background masks, produces four categories of features. Subsequently, the domain discriminator determines the correlation between the foreground and background features and back-propagates the corresponding adversarial loss. This yields the final harmonized composite remote sensing ship image.
Inspired by U-Net, the generator proposed in this study employs a feature transfer fusion strategy and adopts a simple, symmetric encoder-decoder structure without a feature normalization layer. The encoder module, fed with simulated remote sensing ship images and their corresponding image masks, processes them through a convolution layer followed by three fundamental blocks, each structured in a LeakyReLU-Conv-IN pattern. Besides supporting feature extraction through the convolutional layers, LeakyReLU is more resistant to saturation than ReLU, which mitigates the vanishing gradient problem and accelerates network convergence. Instance normalization (IN) [17] is used to handle internal covariate shift, enhancing the model's generalization. Owing to the symmetric structure, every fundamental block in the decoder incorporates an Attention RAIN module. This transfers statistical information from the background features to the normalized foreground features without being influenced by foreground objects, and enables the decoder to model correlations between spatially varied features, magnifying feature significance and minimizing information loss or blurring. Finally, through upsampling, the deconvolution layer restores the feature map to the original input image size, which both recovers detailed information and promotes feature fusion.
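A minimal sketch of one LeakyReLU-Conv-IN encoder block is given below; the kernel size, stride, and negative slope are assumptions for illustration and are not specified in the text.

```python
import torch.nn as nn

def encoder_block(in_ch, out_ch):
    """One fundamental encoder block in the LeakyReLU-Conv-IN pattern described above.
    Kernel size, stride, and LeakyReLU slope are assumptions for this sketch."""
    return nn.Sequential(
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),  # downsample by 2
        nn.InstanceNorm2d(out_ch),
    )
```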
The Attention RAIN module integrates RAIN with an attention mechanism to dynamically adjust ship foreground features while preserving background elements, thereby achieving complete style harmonization and holistic feature transfer and fusion within the image. In the encoder, the feature map $\mathrm{Feature}_{Com}$ of the remote sensing composite image and the mask $M$ of the ship foreground target are available. Taking the $L$th layer of the module as an example, $H_L$, $W_L$, and $C_L$ denote the height, width, and channel count of the features at layer $L$; $\mathrm{Feature}_{Com}^{L}$ denotes the composite image's feature map at layer $L$; and $M^{L}$ is the ship foreground mask resized to layer $L$. In the adaptive calibration phase for ship foreground features, the process diverges from RAIN's approach of simply multiplying the input feature $\mathrm{Feature}_{Com}^{L}$ by its foreground and background masks and normalizing with IN. Instead, the module applies partial convolution to the resized foreground feature $\mathrm{Feature}_{S}^{L}$ and background feature $\mathrm{Feature}_{B}^{L}$ and uses AdaIN to achieve precise alignment and calibration between the ship foreground features and the remotely sensed ocean background features. Moreover, to counteract the positional information loss that partial convolution can induce in the foreground and background features, a spatial attention mechanism is introduced: the composite image feature $\mathrm{Feature}_{Com}^{L}$ is processed by a 1 × 1 convolution kernel and an activation function to derive the spatial attention weight $w_{Com}^{SA}$, which is then multiplied with $\mathrm{Feature}_{Com}^{L}$ to fuse the ship foreground and remote sensing ocean background details. In the final stage, a 3 × 3 convolution kernel performs information fusion and dimensionality reduction to produce the spliced image feature $\mathrm{Feature}_{Com}^{L\prime}$. This design strengthens the interconnection between foreground and background features, aiding feature extraction and dimensionality reduction while retaining positional information.
$$\mathrm{Feature}_{Com}^{L\prime} = \mathrm{Conv}\left( w_{Com}^{SA} \odot \mathrm{Feature}_{Com}^{L} \right)$$
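The sketch below illustrates the ideas behind the Attention RAIN block under stated simplifications: the partial convolutions are abbreviated, and the region-aware AdaIN step is written directly with masked mean/variance statistics. Channel counts, epsilon, and the class name are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionRAIN(nn.Module):
    """Simplified Attention RAIN block: region-aware AdaIN-style calibration of the
    ship foreground toward the ocean background statistics, followed by a 1x1
    spatial-attention weighting and a 3x3 fusion convolution."""

    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feat, mask):
        # feat: (N, C, H, W) composite feature; mask: (N, 1, H, W), 1 on ship foreground.
        bg = 1.0 - mask
        bg_area = bg.sum((2, 3), keepdim=True).clamp_min(1.0)
        fg_area = mask.sum((2, 3), keepdim=True).clamp_min(1.0)
        # Region-wise statistics over background and foreground.
        bg_mean = (feat * bg).sum((2, 3), keepdim=True) / bg_area
        bg_var = ((feat - bg_mean) ** 2 * bg).sum((2, 3), keepdim=True) / bg_area
        fg_mean = (feat * mask).sum((2, 3), keepdim=True) / fg_area
        fg_var = ((feat - fg_mean) ** 2 * mask).sum((2, 3), keepdim=True) / fg_area
        # AdaIN-style calibration: normalize the foreground, re-scale with background stats.
        fg_norm = (feat - fg_mean) / (fg_var + self.eps).sqrt()
        calibrated = fg_norm * (bg_var + self.eps).sqrt() + bg_mean
        feat = calibrated * mask + feat * bg
        # Spatial attention (1x1 conv + sigmoid), then 3x3 fusion convolution.
        w = self.attn(feat)
        return self.fuse(w * feat)
```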
The remote sensing ship harmonization algorithm, grounded in the transfer fusion strategy, aims to utilize the generator $G_H$ to construct a harmonized image $G_H(I_{Com}, M)$, given the synthetic image $I_{Com}$ and its corresponding binary mask $M$ for the foreground. The loss function of this algorithm is tripartite:
Primarily, to account for the domain discrepancy between the input synthetic remote sensing ship image and the actual remote sensing ship image, the global fusion loss function $\mathcal{L}_{gh}$ is introduced to gradually bring the input synthetic image $I_{Com}$ closer to the real-world remote sensing image $I_R$. Subsequently, the harmonization image discriminator $D_H$ adopts an adversarial learning approach: its adversarial loss function $\mathcal{L}_{adv}$ takes genuine remote sensing ship samples and harmonized synthetic ship samples as input to judge the authenticity of the image. Concurrently, a domain discriminator $D_{dom}$ is incorporated into the image harmonization process to determine whether the foreground and background distributions are aligned. Lastly, the algorithm invokes a corresponding loss function $\mathcal{L}_{ver}$ for the domain transfer from the foreground to the background of the harmonized image sample.
$$
\begin{aligned}
\mathcal{L}(D_H, D_{dom}, I_R, I_H, M) &= \lambda_1 \mathcal{L}_{adv}(D_H, I_R, I_H) + \lambda_2 \mathcal{L}_{ver}(D_{dom}, I_R, I_H, M) \\
&= \lambda_1 \big[ \mathbb{E}_{I_R} \max(0,\, 1 - D_H(I_R)) + \mathbb{E}_{I_H} \max(0,\, 1 + D_H(I_H)) \big] \\
&\quad + \lambda_2 \big[ \mathbb{E}_{I_R} \max(0,\, 1 - D_{dom}(I_R, M)) + \mathbb{E}_{I_H} \max(0,\, 1 + D_{dom}(I_H, M)) \big]
\end{aligned}
$$
$$
\begin{aligned}
\mathcal{L}(G_H, I_{Com}, M) &= \lambda_1 \mathcal{L}_{adv}(G_H, I_{Com}, M) + \lambda_2 \mathcal{L}_{ver}(G_H, I_{Com}, M) + \lambda_3 \mathcal{L}_{gh}(G_H, I_{Com}, M) \\
&= -\lambda_1 \mathbb{E}_{I_{Com}} \big[ D_H(G_H(I_{Com}, M)) \big] - \lambda_2 \mathbb{E}_{I_{Com}} \big[ D_{dom}(G_H(I_{Com}, M), M) \big] \\
&\quad + \lambda_3 \big\lVert G_H(I_{Com}, M) - I_R \big\rVert_1
\end{aligned}
$$
Here, $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = 100$. Because no public remote sensing ship image harmonization dataset is available, this paper uses the general benchmark synthetic dataset iHarmony4 [18] to train the image harmonization model. To distinguish images at different processing stages, images produced by the remote sensing ship image harmonization algorithm based on the transfer fusion strategy proposed in this Section are designated as harmonized images. The algorithm links the two independent operational phases, the background feature transfer alignment module and the foreground feature transfer alignment module, and by adjusting features such as foreground brightness it ensures that the aligned image visually resembles real imagery as closely as possible.
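The two objectives above can be assembled as in the following sketch, which assumes hinge-style discriminator scores and the weights stated in the text; the function names and the callables `d_h`, `d_dom`, and `g_h` are illustrative placeholders.

```python
import torch

def discriminator_loss(d_h, d_dom, real, harmonized, mask, lam1=1.0, lam2=1.0):
    """Hinge-style discriminator objective matching the first equation above.
    d_h and d_dom are callables returning a score per image (illustrative)."""
    adv = torch.relu(1.0 - d_h(real)).mean() + torch.relu(1.0 + d_h(harmonized)).mean()
    ver = torch.relu(1.0 - d_dom(real, mask)).mean() + torch.relu(1.0 + d_dom(harmonized, mask)).mean()
    return lam1 * adv + lam2 * ver

def generator_loss(g_h, d_h, d_dom, comp, mask, real, lam1=1.0, lam2=1.0, lam3=100.0):
    """Generator objective: adversarial terms plus the global L1 fusion loss L_gh."""
    harmonized = g_h(comp, mask)
    adv = -d_h(harmonized).mean()
    ver = -d_dom(harmonized, mask).mean()
    gh = torch.abs(harmonized - real).mean()
    return lam1 * adv + lam2 * ver + lam3 * gh
```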

3. Results

3.1. Dataset

The FGSCR-42 dataset [19], publicly available for fine-grained ship classification in remote sensing images, comprises approximately 9320 images across 42 categories. It aggregates data from Google Earth and key remote sensing repositories such as DOTA, HRSC2016, and NWPU VHR-10, featuring diverse warships and civilian vessels. However, the long-tail distribution of FGSCR-42 [19] leads to varied model performance across its categories. For simplicity and direct comparison, this study narrows the scope to 12 ship types curated in our laboratory. Table 1 shows the real images in our experimental dataset.

3.2. Experimental Environment

All experiments in this study were performed on the Ubuntu 20.04 LTS operating system with an NVIDIA GeForce RTX 2080 GPU with 12 GB of memory. The software environment consisted of PyTorch 1.13.0 and CUDA 11.6.
In this Section, eight representative fine-grained classification algorithms were selected to substantiate the robustness of the experimental conclusions: ResNet-101 [20], ResNext [21], DenseNet-121 [22], PyramidNet [23], Wide Residual Network (WRN) [24], ShuffleNet-v2 [25], EfficientNet-v2 [26], and Swin-Transformer [27]. Uniform experimental conditions were maintained by setting identical epochs and hyperparameters for training these models. Specifically, ResNet-101 was trained for 200 epochs, while the remaining classification models were trained for 100 epochs each. Stochastic gradient descent (SGD) was employed during training with a momentum of 0.9 and a weight decay of 0.0001.
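A minimal configuration sketch for this setup is shown below; the learning rate is an assumption (not stated in the text), while momentum 0.9 and weight decay 1e-4 follow the description above.

```python
import torch
from torchvision.models import resnet101

# Optimizer configuration described above; lr is an assumed value.
model = resnet101(num_classes=12)   # 12 fine-grained ship categories
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
```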

3.3. Ablation Experiment

This Section divides the comprehensive model into a foreground feature transfer alignment module, a background feature transfer alignment module, and an image coordination module based on the transfer fusion strategy, and combines them to complete the ablation experiment. The efficacy of these modules was assessed both visually and in terms of algorithm performance.
Figure 4 presents illustrative examples from each pivotal phase. As the training cycles iterate within the foreground feature transfer alignment module, the stylistic features of the sea surface area in the simulated image progressively align with those of the actual domain. Importantly, this transformation preserves other dimensional aspects of the background, such as sea surface brightness variations, wave texture formations, and so forth. Subsequently, the target is transferred from the simulated domain to the real domain through training iterations within the background feature transfer alignment module.
The application of the image harmonization algorithm brings significant improvements to the previously discordant visual effects observed between the ship foreground and the ocean background within the respective foreground and background feature transfer alignment modules. This progress is vividly depicted in the fourth and fifth columns, from left to right, in Figure 4. The cumulative effects of each implemented stage serve to underscore the intricate features of the ship target, yielding a more cohesive and unified appearance across various regions. This enhancement in the richness of the visual portrayal of images augments the diversity of multi-dimensional information, thereby facilitating the elevation of accuracy in fine-grained ship classification.
Given that this study involves a multi-stage, gradual feature alignment process, it is vital to analyze each module's impact carefully. Consequently, the foreground feature transfer alignment module, the background feature transfer alignment module, and the transfer fusion harmonization module were deconstructed and recombined in this Section. We established a baseline in which the training data consisted solely of real images and no data augmentation techniques were applied during training. On this baseline, we compared the performance of various algorithms on the fine-grained recognition of remote sensing ships, with the results detailed in Table 2. In the table, Pyramid refers to PyramidNet [23], EffiN-v2 to EfficientNet-v2 [26], and Swin-T to Swin-Transformer [27]; in the table header, “HA” stands for the Image Harmonization Algorithm. Across the five classification algorithms (ResNet-101, ResNext, PyramidNet, EfficientNet-v2, and Swin-Transformer) and the various module combinations, the transfer fusion harmonization module improved the five models' classification performance by an average of 3.04% (average classification accuracy for six ship types) and 2.67% (average classification accuracy for twelve ship types). These results indicate that the proposed transfer fusion and harmonization module benefits fine-grained ship classification, yielding a significant performance gain over using the background feature transfer alignment module alone. The image harmonization algorithm integrates the foreground and background of the synthetic image, enabling a deep learning model to extract distinctive features and significantly enhancing both the scope and fidelity of the depicted image information. The detailed evaluation of classification performance shows that the remote sensing ship image harmonization technique based on the transfer fusion method detailed in this paper is both effective and beneficial: it markedly improves the interpretation of image content and the precision of fine-grained ship classification.

3.4. Hybrid Dataset Experiment

During the experiments, it was found that datasets such as DOTA [28], NWPU VHR-10 [29], and DSCR [30] contain comparatively few fine-grained ship types, rendering them insufficient for fine-grained classification tasks. The FGSCR-42 [19] dataset, owing to its long-tail distribution, lacks authentic images for certain specific ship categories.
In this study, harmonized synthetic images were added in batches and incrementally to the training sets of ResNet-101 and ResNext, divided into different experimental groups under consistent parameter settings and testing conditions. Figure 5 shows line graphs of the average classification accuracy for six and twelve ship types after supplementation with synthetic images at different scales, where K denotes the ratio of harmonized synthetic images to real images in the training set. The results indicate that, compared with other ratios, fine-grained ship classification models perform best when the number of supplementary synthetic or simulated images is seven times the number of real images in the training set.
Therefore, this paper combines remote sensing ship harmonized images with real-world remote sensing ship imagery at a ratio of 7:1, enhancing the amalgamated remote sensing images with a series of high-precision category labels to create a new mixed dataset comprising 12 types of remote sensing ship images. Within this selection, the images of six ship types consist of composite data, specifically including Zumwalt-class destroyers, Charles de Gaulle aircraft carriers, Atago-class destroyers, Type 45 destroyers, Kuznetsov aircraft carriers, and Freedom class combat ships. The original images are sourced from the FGSCR-42 dataset. To address the scarcity of real images for these six ship types, 1092 high-quality harmonized images were created as supplementary samples. These were then amalgamated with real-world images to establish a comprehensive hybrid dataset. Further experiments are conducted based on this meticulously formulated hybrid dataset.
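Constructing such a hybrid training set amounts to concatenating the real and harmonized splits at the stated 7:1 ratio; the sketch below is an illustrative assembly helper (function name, seeding, and subsampling strategy are assumptions), assuming both datasets share the same label space.

```python
import random
from torch.utils.data import ConcatDataset, Subset

def build_mixed_dataset(real_ds, harmonized_ds, ratio=7, seed=0):
    """Assemble the hybrid training set described above (illustrative sketch).

    Harmonized images are subsampled so that they number at most `ratio` times
    the real images of the scarce categories.
    """
    rng = random.Random(seed)
    n = min(len(harmonized_ds), ratio * len(real_ds))
    idx = rng.sample(range(len(harmonized_ds)), n)
    return ConcatDataset([real_ds, Subset(harmonized_ds, idx)])
```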
This Section evaluates the impact of this dataset on classification performance by conducting benchmark tests using seven common CNN classification networks and one Transformer network. These eight networks are ResNet-101, ResNext, DenseNet-121, PyramidNet, ShuffleNet-v2, WRN, EfficientNet-v2, and Swin-Transformer, with Table 3 listing the accuracy results of this dataset on each classification network. The results show that the mixed dataset proposed in this paper has higher accuracy in various classification algorithms.
Furthermore, to discern the detailed classification accuracy for each ship type within the dataset, this study employed the EfficientNet-v2 algorithm. Its performance in classifying all 12 types of ships is enumerated in Table 4.
As illustrated in Table 4, the classification accuracy for destroyers registers lower compared to that of aircraft carriers. Comparing the accuracy of different types of destroyers reveals that when the appearance of a destroyer is more similar to that of an aircraft carrier (such as the Atago-class destroyer), the corresponding classification accuracy tends to be higher. Moreover, ships that are larger in spatial dimensions and possess distinct shape attributes tend to achieve more favorable classification accuracy. This observation holds true across the accuracy results for various ship categories by other classification algorithms as well. The paper suggests that the smaller scale of ship targets makes their features relatively more difficult to discern; hence, it is more challenging to extract detailed feature information during the image conversion and harmonization process.

4. Discussion

To assess the effectiveness of the proposed algorithm, the image conversion network and the remote sensing ship image harmonization algorithm presented in this article are benchmarked against other deep learning-based conversion techniques. CycleGAN [16] and Sim2RealNet [1] represent global perceptual feature learning and classic neural style transfer, respectively, while CUT [31] and SemI2I [32] represent the state of the art in contrastive learning and in image conversion for remote sensing image processing, respectively. Applying these methods to identical simulated remote sensing ship images, our algorithm yields harmonized images whose results are compared against those generated by these representative models, as depicted in Figure 6.
As demonstrated in the second and fourth columns (from left to right) of Figure 6, neither Sim2RealNet [1] nor SemI2I [32] is capable of transforming the full stylistic appearance of the image. Additionally, because the ocean background occupies a large portion of the image, the deep learning model tends to favor the style feature distribution of the ocean backdrop during conversion over the visual expression of the ship target. CycleGAN [16] and CUT [31], which employ global perception feature alignment for image conversion and style transfer, achieve a positive visual alignment in the background area of the image. However, interference from features in other areas can cause the images generated through global perception to appear deformed or distorted, as illustrated in the third and fifth columns. The method proposed in this paper significantly improves the discordant visual effects between ship foregrounds and ocean backgrounds produced by the foreground and background feature transfer alignment modules, as shown in the sixth column. It highlights the detailed features of the ship targets while making the appearance across regions more harmonious. By enriching the vividness of the image presentation, more multi-dimensional information becomes available, helping to improve the accuracy of fine-grained ship classification.

5. Conclusions

Addressing ship classification challenges in remote sensing, this paper introduces a transfer learning-based data enhancement framework, which simulates ship images and employs an image migration and fusion model for cross-domain mapping. To harmonize the foreground–background discordance, a remote sensing ship image harmonization algorithm was developed. These techniques generated a rich dataset, improving fine-grained ship classification. Experimental results show our methodology outshines traditional data augmentation methods, indicating substantial classification accuracy boosts up to 14.89% across different algorithms. Moreover, through ablation studies, each component of our feature transfer fusion strategy substantially boosted performance, enhancing model accuracy by 3.04% for six ship types and 2.67% for twelve types, validating our local-aware progressive image conversion and harmonization approach.

Author Contributions

Conceptualization, Z.Z. and X.W. (Xianyun Wu); Data curation, J.Z.; Formal analysis, J.Z. and Z.Z.; Funding acquisition, X.W. (Xianyun Wu) and Y.L.; Investigation, Z.Z.; Methodology, Z.Z. and X.W. (Xianyun Wu); Project administration, X.W. (Xianyun Wu); Software, J.Z. and Z.Z.; Supervision, X.W. (Xianyun Wu); Validation, J.Z., X.W. (Xingzhuo Wei) and Y.L.; Visualization, X.W. (Xingzhuo Wei); Writing—original draft, J.Z., Z.Z. and X.W. (Xingzhuo Wei); Writing—review and editing, X.W. (Xianyun Wu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China Postdoctoral Science Foundation (2013M540735); by the National Natural Science Foundation of China under Grants 61901388, 61301291, and 61701360; by the 111 Project under Grant B08038; by the Shaanxi Provincial Science and Technology Innovation Team; by the Fundamental Research Funds for the Central Universities; and by the Youth Innovation Team of Shaanxi Universities.

Data Availability Statement

Data are available at https://github.com/Phoebe30/IHMFSC (accessed on 15 June 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xiao, Q.; Liu, B.; Li, Z.; Ni, W.; Yang, Z.; Li, L. Progressive data augmentation method for remote sensing ship image classification based on imaging simulation system and neural style transfer. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 9176–9186. [Google Scholar] [CrossRef]
  2. Dumoulin, V.; Shlens, J.; Kudlur, M. A Learned Representation for Artistic Style. arXiv 2016, arXiv:1610.07629. [Google Scholar] [CrossRef]
  3. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6924–6932. [Google Scholar]
  4. Ye, W.; Chen, Y.; Liu, Y.; Liu, C.; Zhou, H. Multi-style transfer and fusion of image’s regions based on attention mechanism and instance segmentation. Signal Process. Image Commun. A Publ. Eur. Assoc. Signal Process. 2023, 110, 116871. [Google Scholar] [CrossRef]
  5. Cho, W.; Choi, S.; Park, D.; Shin, I.; Choo, J. Image-to-Image Translation via Group-wise Deep Whitening and Coloring Transformation. arXiv 2018, arXiv:1812.09912. [Google Scholar] [CrossRef]
  6. Zhao, J.; Lee, F.; Hu, C.; Yu, H.; Chen, Q. LDA-GAN: Lightweight domain-attention GAN for unpaired image-to-image translation. Neurocomputing 2022, 506, 355–368. [Google Scholar] [CrossRef]
  7. Luan, F.; Paris, S.; Shechtman, E.; Bala, K. Deep Painterly Harmonization. Comput. Graph. Forum 2018, 37, 95–106. [Google Scholar] [CrossRef]
  8. Ling, J.; Xue, H.; Song, L.; Xie, R.; Gu, X. Region-aware Adaptive Instance Normalization for Image Harmonization. arXiv 2021, arXiv:2106.02853. [Google Scholar] [CrossRef]
  9. Jiang, Y.; Zhang, H.; Zhang, J.; Wang, Y.; Lin, Z.; Sunkavalli, K.; Chen, S.; Amirghodsi, S.; Kong, S.; Wang, Z. SSH: A Self-Supervised Framework for Image Harmonization. arXiv 2021, arXiv:2108.06805. [Google Scholar] [CrossRef]
  10. Guo, Z.; Zheng, H.; Jiang, Y.; Gu, Z.; Zheng, B. Intrinsic Image Harmonization. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 16362–16371. [Google Scholar]
  11. Guo, Z.; Guo, D.; Zheng, H.; Gu, Z.; Zheng, B.; Dong, J. Image Harmonization with Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 14850–14859. [Google Scholar]
  12. Cong, W.; Tao, X.; Niu, L.; Liang, J.; Gao, X.; Sun, Q.; Zhang, L. High-Resolution Image Harmonization via Collaborative Dual Transformations. arXiv 2021, arXiv:2109.06671. [Google Scholar] [CrossRef]
  13. Zhu, Z.; Zhang, Z.; Lin, Z.; Wu, R.; Chai, Z.; Guo, C.L. Image Harmonization by Matching Regional References. arXiv 2022, arXiv:2204.04715. [Google Scholar] [CrossRef]
  14. Zhang, Q.; Yuan, Q.; Song, M.; Yu, H.; Zhang, L. Cooperated spectral low-rankness prior and deep spatial prior for hsi unsupervised denoising. IEEE Trans. Image Process. 2022, 31, 6356–6368. [Google Scholar] [CrossRef] [PubMed]
  15. Zhang, Q.; Zheng, Y.; Yuan, Q.; Song, M.; Yu, H.; Xiao, Y. Hyperspectral image denoising: From model-driven, data-driven, to model-data-driven. IEEE Trans. Neural Netw. Learn. Syst. 2023; 1–21, in press. [Google Scholar] [CrossRef] [PubMed]
  16. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar] [CrossRef]
  17. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Instance Normalization: The Missing Ingredient for Fast Stylization. arXiv 2016, arXiv:1607.08022. [Google Scholar] [CrossRef]
  18. Cong, W.; Zhang, J.; Niu, L.; Ling, Z.; Li, W.; Zhang, L. Image Harmonization Dataset iHarmony4: HCOCO, HAdobe5k, HFlickr, and Hday2night. arXiv 2019, arXiv:1908.10526. [Google Scholar]
  19. Di, Y.; Jiang, Z.; Zhang, H. A Public Dataset for Fine-Grained Ship Classification in Optical Remote Sensing Images. Remote Sens. 2021, 13, 747. [Google Scholar] [CrossRef]
  20. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  21. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks; IEEE: New York, NY, USA, 2016. [Google Scholar] [CrossRef]
  22. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
  23. Han, D.; Kim, J.; Kim, J. Deep Pyramidal Residual Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  24. Devries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar] [CrossRef]
  25. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
  26. Tan, M.; Le, Q.V. EfficientNetV2: Smaller Models and Faster Training. arXiv 2021, arXiv:2104.00298. [Google Scholar] [CrossRef]
  27. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin-Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv 2021, arXiv:2103.14030. [Google Scholar] [CrossRef]
  28. Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-scale Dataset for Object Detection in Aerial Images. arXiv 2017, arXiv:1711.10398. [Google Scholar] [CrossRef]
  29. Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132. [Google Scholar] [CrossRef]
  30. Di, Y.; Jiang, Z.; Zhang, H.; Meng, G. A public dataset for ship classification in remote sensing images. In Proceedings of the Image and Signal Processing for Remote Sensing XXV, Strasbourg, France, 9–11 September 2019; SPIE Remote Sensing. 2019; Volume 11155, pp. 515–521. [Google Scholar]
  31. Park, T.; Efros, A.A.; Zhang, R.; Zhu, J.Y. Contrastive learning for unpaired image-to-image translation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IX 16. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 319–345. [Google Scholar]
  32. Tasar, O.; Happy, S.L.; Tarabalka, Y.; Alliez, P. SEMI2I: Semantically Consistent Image-to-Image Translation for Domain Adaptation of Remote Sensing Data. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020. [Google Scholar] [CrossRef]
Figure 1. Simulated imaging module architecture.
Figure 2. Framework of Data Augmentation Model.
Figure 3. Overall Framework of Remote Sensing Ship Image Harmonization Algorithm.
Figure 4. Results of Ablation Studies Across Different Modules.
Figure 5. Line Graph of Classification Accuracy for ResNet and ResNext After Supplementation with Synthetic Images at Various Ratios.
Figure 6. Visualization of Comparative Experiments for Image Conversion Methods.
Table 1. Experimental Dataset.
| Ship Category | Detailed Name | Inclusion of Generated Samples | Training Set Size | Test Set Size |
|---|---|---|---|---|
| Aircraft_carrier | Charles_de_Gaulle_aircraft_carrier | Yes | 34 | 34 |
| Aircraft_carrier | Kuznetsov-class_aircraft_carrier | Yes | 34 | 34 |
| Aircraft_carrier | Nimitz-class_aircraft_carrier | No | 388 | 165 |
| Aircraft_carrier | Midway-class_aircraft_carrier | No | 146 | 62 |
| Landing_ship | Whitby_island-class_dock_landing_ship | No | 195 | 83 |
| Destroyer | Arleigh_Burke-class_destroyer | No | 407 | 174 |
| Destroyer | Atago-class_destroyer | Yes | 35 | 35 |
| Destroyer | Murasame-class_destroyer | No | 407 | 174 |
| Destroyer | Type_45_destroyer | Yes | 112 | 48 |
| Destroyer | Zumwalt-class_destroyer | Yes | 25 | 25 |
| Combat_ship | Independence-class_combat_ship | No | 148 | 62 |
| Combat_ship | Freedom-class_combat_ship | Yes | 123 | 53 |
Table 2. Ablation Experiments for Fine-Grained Ship Classification Across Different Modules.
| AR | Classifier | Baseline | SIG | +BFTA | +BFTA+FFTA | +BFTA+FFTA+HA | BFTA Gain | FFTA Gain | HA Gain |
|---|---|---|---|---|---|---|---|---|---|
| $\overline{AR}_{6class}$ | ResNet | 68.60 | 76.98 | 79.01 | 82.82 | 85.97 | 2.03 | 3.81 | 3.15 |
| | ResNext | 74.64 | 79.66 | 76.34 | 82.09 | 85.55 | −3.32 | 5.75 | 3.46 |
| | Pyramid | 76.57 | 78.47 | 81.64 | 85.03 | 87.48 | 3.17 | 3.39 | 2.45 |
| | EffiN-v2 | 83.68 | 86.16 | 87.21 | 88.87 | 91.68 | 1.05 | 1.66 | 2.81 |
| | Swin-T | 87.32 | 88.14 | 87.36 | 91.49 | 94.88 | −0.78 | 4.13 | 3.39 |
| $\overline{AR}_{12class}$ | ResNet | 79.44 | 84.73 | 86.98 | 89.10 | 92.04 | 2.25 | 2.12 | 2.94 |
| | ResNext | 83.95 | 84.91 | 85.57 | 89.05 | 91.28 | 0.66 | 3.48 | 2.23 |
| | Pyramid | 84.48 | 85.33 | 87.17 | 89.26 | 92.39 | 1.84 | 2.09 | 3.13 |
| | EffiN-v2 | 89.10 | 90.56 | 90.71 | 92.30 | 94.81 | 0.15 | 1.59 | 2.51 |
| | Swin-T | 91.53 | 92.99 | 92.22 | 94.49 | 97.11 | −0.77 | 2.27 | 2.62 |
Table 3. Accuracy of Multiclass Classification Algorithms on Mixed Datasets.
| AR (%) | ResNet-101 | ResNext | DenseNet | PyramidNet | WRN | ShuffleNet-v2 | EfficientNet-v2 | Swin-T |
|---|---|---|---|---|---|---|---|---|
| $\overline{AR}_{6class}$ | 86.19 | 86.17 | 83.48 | 87.32 | 92.65 | 90.51 | 93.03 | 95.13 |
| $\overline{AR}_{12class}$ | 92.01 | 92.25 | 87.96 | 93.05 | 95.87 | 84.91 | 95.97 | 97.51 |
Table 4. Classification Accuracy for 12 Types of Ships in EfficientNet-v2.
| Ship Categories | AR (%) |
|---|---|
| Charles_de_Gaulle_aircraft_carrier | 97.12 |
| Kuznetsov-class_aircraft_carrier | 100.00 |
| Atago-class_destroyer | 93.53 |
| Type_45_destroyer | 75.06 |
| Zumwalt-class-destroyer | 96.21 |
| Freedom-class_combat_ship | 97.14 |
| Nimitz-class_aircraft_carrier | 99.87 |
| Midway-class_aircraft_carrier | 100.00 |
| Whitby_island-class_dock_landing_ship | 98.75 |
| Arleigh_Burke-class_destroyer | 97.38 |
| Murasame-class_destroyer | 97.83 |
| Independence-class_combat_ship | 98.84 |
