Remote Sensing Image Harmonization Method for Fine-Grained Ship Classification

Abstract: Target recognition and fine-grained ship classification in remote sensing face the challenges of high inter-class similarity and sample scarcity. A transfer fusion-based ship image harmonization algorithm is proposed to overcome these challenges. The algorithm designs a feature transfer fusion strategy that combines region-aware instance normalization with an attention mechanism. Adversarial learning is implemented through an image harmonization generator and discriminator module to generate realistic harmonized remote sensing ship images. Furthermore, the domain encoder and domain discriminator modules extract feature representations of the foreground and background and further align the ship foreground with the remote sensing ocean background through feature discrimination. Compared with other advanced image conversion techniques, our algorithm delivers more realistic visuals, improving classification accuracy for six ship types by 3% and for twelve types by 2.94%, outperforming Sim2RealNet. Finally, a mixed dataset containing augmented and harmonized samples alongside real data is proposed for the fine-grained classification of remote sensing ships. Evaluation experiments were conducted on eight typical fine-grained classification algorithms, and the classification accuracy for all ship categories was analyzed. The experimental results show that the proposed mixed dataset effectively alleviates the long-tail problem in real datasets, and that the proposed remote sensing ship data augmentation framework outperforms state-of-the-art data augmentation methods in fine-grained ship classification tasks.


Introduction
Advancements in remote sensing technology have expanded its applicability, yet image uniformity is influenced by system instabilities and environmental variables. In particular, ship imaging in remote sensing is significantly affected by changes in illumination and weather conditions [1].
In recent years, multi-style and arbitrary-style transfer learning has gained research attention. Dumoulin et al. achieved multi-style learning with conditional instance normalization (CIN) [2] and an improved texture network [3], but with limited styles due to training costs. Ye et al. [4] combined CIN with a convolutional attention module, enabling multi-style image transfer while preserving semantic information. Meanwhile, GDWCT [5] excelled in arbitrary style transfer by regularizing group-based style features, reducing computational expense. Also, LDA-GAN successfully transformed key object features for high-quality images [6]. Luan et al. introduced comprehensive style transfer to match texture and color, maintaining spatial consistency using a two-pass algorithm for seamless object synthesis [7]. Similarly, RainNet [8] approached style coordination as a transfer problem, offering a region-aware normalization (RAIN) module. Finally, Jiang et al. [9] proposed SSH, a self-supervised coordination framework that bypasses manual user annotation and overcomes shortcomings in image synthesis and subsequent coordination tasks, facilitating the coordination of any photographic composite image.
Guo et al. [10] segmented the synthetic image into reflection and illumination parts, introducing an autoencoder designed to coordinate these elements individually. They argue that inconsistencies in reflectivity and lighting between foreground objects and the background can lead to a jarring appearance in the composite image. The lighting part is coordinated through transfer techniques and the reflective part through material consistency, thereby achieving coordination of the overall image. In the same year, Guo et al. [11] proposed the Disentangled-Harmonization Transformer (D-HT) framework, which exploits the context dependence of the Transformer to ensure structural and semantic stability while enhancing the lighting and background harmony of foreground objects, thereby making the synthesized image more realistic. In addition to the above algorithms, recent deep learning-based image coordination technology also includes pixel-to-pixel conversion [12,13]. This method facilitates dense pixel-to-pixel conversion on low-resolution images and is increasingly used for image synthesis and coordination, heralding new trends in this field.
Drawing from previous research, this paper adopts a CycleGAN-based unsupervised model for image-to-image conversion. It introduces a local-aware progressive method for transferring attributes between domains, enhancing the synthesis of remote sensing ship images with realistic and consistent features. The model's effectiveness is confirmed through comparative studies and detailed classification experiments on remote sensing ship imagery [14,15].

Simulated Remote Sensing Ship Image Construction
In this study, the generated simulated ship remote sensing image comprises two parts: the image foreground characterized by ship targets and the ocean remote sensing background. For the foreground, this study uses 3ds Max 2022 modeling software to build high-fidelity three-dimensional models of various ship types, and then uses projection to achieve multi-pose imaging of these objects. For the background, the ocean scenes in the "War Thunder" game show dynamic, clearly visible wave undulations and realistic rendering effects. These scenes resemble the spatial distribution characteristics of actual remote sensing images. Therefore, the study selected an ocean image from the Pacific battle scene in "War Thunder", captured at an altitude of 1.5-2 km with a visibility distance of 20 km and no ships present, as the background template. During the training of the U-Net model, network parameters are fine-tuned via backpropagation to generate accurate segmentation masks aligned with the input image. Figure 1 is a flow chart of the initial stages of image data synthesis discussed in this Section. The 3D model is projected into 2D, then cropped and collaged with the simulated remote sensing ocean background to generate the initial version of the simulated remote sensing ship image, and U-Net [16] is used to generate an accurate segmentation mask corresponding to the input image, laying the foundation for the subsequent image transfer and harmonization tasks.
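The cut-and-collage step described above can be sketched as a simple mask-based composite (a minimal NumPy illustration; the array shapes and value ranges are assumptions, not the paper's exact pipeline):

```python
import numpy as np

def composite(foreground, background, mask):
    """Paste a 2D ship projection onto an ocean background.

    foreground, background: H x W x 3 float arrays in [0, 1]
    mask: H x W binary array, 1 where the ship is visible
    """
    m = mask[..., None].astype(foreground.dtype)  # broadcast over channels
    return foreground * m + background * (1.0 - m)

# toy example: a white "ship" patch pasted onto a black "ocean"
bg = np.zeros((4, 4, 3))
fg = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[:2, :2] = 1  # ship occupies the top-left 2x2 region
out = composite(fg, bg, mask)  # ship pixels become 1.0, ocean stays 0.0
```

In the paper, the binary mask comes from the projected 3D model (and later from the U-Net), rather than being hand-crafted as here.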

Data Augmentation Model Based on Transfer Learning
As shown in Figure 2, the LA-CycleGAN framework comprises two generators (G_S→T, G_T→S) and two discriminators (D_S, D_T). Specifically, G_S→T maps from the source to the target domain (S → T), while G_T→S reverses this process (T → S); the discriminator D_S discerns whether an image originates from the source domain, while D_T assesses its belonging to the target domain. Upon model convergence, the aim is for the feature distribution of the migrated region in the output target image Ĩ_t to align with that of the real target domain, i.e., P_target(Ĩ_t) = P_target(I_t). Nevertheless, discriminator D_T occasionally encounters difficulties in accurately identifying target domain images. A local perception strategy is thus invoked to alleviate blurring and improve the capture of discrete regional features and textures, enhancing the exactness and detailed representation capability of the image transfer. This local-aware progressive transfer approach employs a binary mask M to preserve the invariant region R_s_i through element-wise multiplication with the source domain image. After the network's feature mapping for the migration, the unaltered area is reapplied to the corresponding location in the generated image, ensuring R_s_i = R_o_i. This method excludes the transfer operation's effects from the invariant region R_s_i, circumventing irrelevant feature interference and maintaining the integrity of R_s_i.
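The local-aware restoration of the invariant region can be sketched as follows (a minimal PyTorch sketch under stated assumptions: the generator signature and the mask convention, 1 inside R_s_i, are illustrative):

```python
import torch

def local_aware_transfer(generator, x, invariant_mask):
    """Apply a CycleGAN-style generator but keep the invariant region R_s_i
    untouched: generated pixels are kept only inside the transfer region.

    x: (N, C, H, W) source-domain batch
    invariant_mask: (N, 1, H, W), 1 inside R_s_i (region to preserve)
    """
    y = generator(x)
    # reapply the unaltered area to its original location, so R_s_i == R_o_i
    return x * invariant_mask + y * (1.0 - invariant_mask)

# sketch: an identity "generator" shifted by +1 makes the effect visible
g = lambda t: t + 1.0
x = torch.zeros(1, 3, 8, 8)
m = torch.zeros(1, 1, 8, 8)
m[..., :4, :] = 1  # top half of the image is the invariant region
out = local_aware_transfer(g, x, m)  # top half stays 0, bottom half is 1
```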
In our system, two principal components work in tandem to achieve feature transfer alignment: the background feature transfer alignment module and the foreground feature transfer alignment module. Each module embeds an LA-CycleGAN network, and they operate sequentially. The primary objective is to synchronize the simulated image's representational distribution with that of the authentic domain while preserving essential textural and structural attributes.
For the background component, we enhanced the variety of oceanic backdrop styles by amassing over 1000 distinct remote sensing images of oceans and harbors, characterized by varied surface hues and wave patterns, from multiple regions via Google Earth. During the background feature alignment phase, simulated remote sensing images define the source domain, whereas authentic oceanic backdrops form the target domain. The transitional area R_s_t corresponds to the expansive scenic background of the synthetic image, and the ship target R_s_i remains unchanged. Inputs to the background feature transfer alignment module comprise the genuine ocean background imagery, the synthetic remote sensing images of the source domain, and the image mask of the transitional area R_s_t. It is pivotal for the background feature transfer alignment module to preserve the constancy of the ship target while aligning the background features.
The foreground feature transfer alignment module receives as input the ship target image from the real-world domain, an intermediate image representing the intermediate domain, and the corresponding foreground mask depicting the ship target's migration region. For foreground alignment, the intermediate image serves as the source domain, and the real-world remote sensing ship image constitutes the target domain. Generator G_S→T facilitates the conversion from the intermediate to the real-world domain. During this phase, the ship target represents the migration area R_s_t, while the unchanged area R_s_i corresponds to the intermediate image's background. Throughout this operation, the focus is placed exclusively on the ship target: the ocean background is disregarded, and the cycle consistency loss is computed solely for the ship target, i.e., the foreground object.
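The foreground-only cycle consistency described above might be computed as a masked reconstruction error (a sketch; the L1 form and the normalization are assumptions, not the paper's exact formulation):

```python
import torch

def masked_cycle_loss(x, x_cyc, fg_mask):
    """Cycle-consistency restricted to the ship foreground: the ocean
    background is ignored, so only the migration region contributes.

    x, x_cyc: (N, C, H, W) original and cycle-reconstructed images
    fg_mask: (N, 1, H, W), 1 on the ship foreground
    """
    diff = torch.abs(x - x_cyc) * fg_mask          # zero out the background
    # average over foreground pixels and channels
    return diff.sum() / fg_mask.sum().clamp(min=1.0) / x.shape[1]

x = torch.ones(1, 3, 4, 4)
x_cyc = torch.zeros(1, 3, 4, 4)
m = torch.zeros(1, 1, 4, 4)
m[..., :2, :] = 1                                  # top half is the ship
loss = masked_cycle_loss(x, x_cyc, m)              # per-pixel error of 1.0
```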

Remote Sensing Ship Image Harmonization Algorithm
Figure 3 presents the overall architecture of our proposed remote sensing ship image harmonization algorithm, which employs a transfer fusion strategy. Embracing the principles of adversarial learning, the algorithm divides the network into four components: the harmonization image generator, the harmonization image discriminator, the domain encoder, and the domain discriminator. Specifically, the harmonization image generator, referred to as G_H, is entrusted with generating the harmonized image I_H from the provided input. The role of the harmonization image discriminator is to evaluate both the generator's output and actual remote sensing ship imagery, enabling adversarial learning through the propagation and updating of the adversarial loss. The domain encoder, utilizing the produced harmonized image I_H, the genuine remote sensing image, and the relevant masks for both the foreground and background, generates four distinct categories of features. Subsequently, the domain discriminator identifies the correlation between the foreground and background features and back-propagates the corresponding adversarial loss. This results in the ultimate harmonization of the composite remote sensing ship image.
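A single adversarial update for G_H and D_H could look like the following (a minimal sketch; the module signatures, the binary cross-entropy loss form, and the image-plus-mask input are assumptions, not the authors' exact implementation):

```python
import torch
import torch.nn.functional as F

def train_step(G_H, D_H, opt_g, opt_d, comp, mask, real):
    bce = F.binary_cross_entropy_with_logits
    # discriminator step: real images vs. detached harmonized images
    fake = G_H(comp, mask).detach()
    real_logits, fake_logits = D_H(real), D_H(fake)
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator step: try to fool the discriminator
    fake_logits = D_H(G_H(comp, mask))
    g_loss = bce(fake_logits, torch.ones_like(fake_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# toy stand-ins for G_H (image + mask in, image out) and D_H (patch logits)
class TinyG(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(4, 3, 1)  # 3 image + 1 mask channels
    def forward(self, x, m):
        return self.conv(torch.cat([x, m], dim=1))

G, D = TinyG(), torch.nn.Conv2d(3, 1, 1)
opt_g = torch.optim.SGD(G.parameters(), lr=0.01)
opt_d = torch.optim.SGD(D.parameters(), lr=0.01)
comp = torch.randn(1, 3, 8, 8)
mask = torch.ones(1, 1, 8, 8)
real = torch.randn(1, 3, 8, 8)
d_loss, g_loss = train_step(G, D, opt_g, opt_d, comp, mask, real)
```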
Inspired by the concept of U-Net, the generator framework proposed in this study employs a feature transfer fusion strategy and adopts a simple, symmetric encoder-decoder structure devoid of a feature normalization layer. The encoder module, when fed with remote sensing ship simulation pictures along with their corresponding image masks, processes them through a convolution layer followed by three fundamental blocks, each structured in a LeakyReLU-Conv-IN pattern. LeakyReLU, besides being instrumental in feature extraction through convolutional layers, exhibits superior resistance to saturation compared with ReLU. This helps mitigate the vanishing gradient problem and accelerates network convergence. The method utilizes instance normalization (IN) [17] to handle internal covariate shift, thereby enhancing the model's generalization capability. Owing to the symmetric structure, every fundamental block in the decoder section incorporates an Attention RAIN module. This facilitates the transfer of statistical data from background features to normalized foreground features, uninfluenced by foreground objects. It empowers the decoder to understand correlations between spatially varied features, thereby magnifying feature significance and minimizing informational loss or blurring. Lastly, through upsampling operations, the deconvolution layer restores the dimensionality of the feature map to match the original input image size, which not only recovers detailed information but also promotes feature fusion.
The Attention RAIN module synergistically integrates RAIN with an attention mechanism to dynamically adjust ship foreground features while preserving background elements, thereby achieving complete style harmonization and facilitating holistic feature transfer and fusion within the image. In the encoder segment, the feature map Feature_Com of the remote sensing composite image can be derived along with the mask M of the ship's foreground target. Taking the Lth layer within the module as an instance, H_L, W_L, and C_L denote the height, width, and channel count of features at layer L, respectively, with Feature_Com^L denoting the composite image's feature map at layer L and M^L representing the resized ship foreground mask at layer L. Initially, in the adaptive calibration phase for ship foreground features, the process diverges from RAIN's approach of simply multiplying the input feature Feature_Com^L by its foreground and background masks followed by normalization via IN. Instead, this module applies partial convolution to the resized foreground feature Feature_S^L and background feature Feature_B^L, using AdaIN to accomplish precise alignment and calibration between ship foreground features and remotely sensed ocean background features. Moreover, to counteract the potential loss of positional information induced by the partial convolution of foreground and background features, a spatial attention mechanism is introduced. The technique processes the composite image feature Feature_Com^L through a 1 × 1 convolution kernel and an activation function to derive the spatial attention weight w_SA. Subsequently, w_SA is multiplied with Feature_Com^L to complete the fusion of ship foreground and remote sensing ocean background details.
In the final stage, a 3 × 3 convolution kernel is used for information fusion and dimensionality reduction to generate the spliced image feature Feature_Com^L. This methodology significantly enhances the interconnectedness between foreground and background features, serving to augment feature extraction and dimensionality reduction while retaining positional data.
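Putting these pieces together, the Attention RAIN block might be sketched as below (assumptions: masked-statistics AdaIN stands in for the paper's partial convolutions, the attention uses a sigmoid-activated 1 × 1 convolution, and fusion uses a 3 × 3 convolution):

```python
import torch
import torch.nn as nn

class AttentionRAIN(nn.Module):
    """Sketch of the Attention RAIN block: AdaIN transfers background
    statistics onto the normalized foreground features, and a spatial
    attention map re-weights the result before a 3x3 fusion convolution."""

    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)   # w_SA
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def _stats(self, feat, mask):
        # masked per-channel mean/std over the spatial dimensions
        area = mask.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
        mean = (feat * mask).sum(dim=(2, 3), keepdim=True) / area
        var = ((feat - mean) ** 2 * mask).sum(dim=(2, 3), keepdim=True) / area
        return mean, (var + self.eps).sqrt()

    def forward(self, feat, mask):
        # mask: 1 on the ship foreground, 0 on the ocean background
        fg_mean, fg_std = self._stats(feat, mask)
        bg_mean, bg_std = self._stats(feat, 1.0 - mask)
        # AdaIN: re-style the normalized foreground with background stats
        fg_aligned = (feat - fg_mean) / fg_std * bg_std + bg_mean
        mixed = feat * (1.0 - mask) + fg_aligned * mask
        w = torch.sigmoid(self.attn(mixed))                 # spatial attention
        return self.fuse(mixed * w)

block = AttentionRAIN(16)
x = torch.randn(2, 16, 32, 32)
m = torch.zeros(2, 1, 32, 32)
m[..., 8:24, 8:24] = 1                                      # central "ship"
y = block(x, m)                                             # same shape as x
```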
The remote sensing ship harmonization algorithm, grounded in the transfer fusion strategy, aims to utilize the generator G_H to construct a harmonized image G_H(I_Com, M), given the synthetic image I_Com and its corresponding binary foreground mask M. The loss function of this algorithm is tripartite. First, to acknowledge the domain discrepancy between the input synthetic ship image and the actual remote sensing ship image, the global fusion loss function L_gh is introduced to gradually approximate the input synthetic image I_Com to the real-world remote sensing image I_R. Second, the harmonization image discriminator D_H adopts an adversarial learning approach; its adversarial loss function, expressed as L_adv, takes genuine remote sensing ship samples along with harmonized synthetic ship samples to ascertain the authenticity of the image. Concurrently, a domain discriminator D_dom is incorporated within the image harmonization operations to determine whether the foreground and background distributions are aligned. Lastly, the algorithm invokes a corresponding loss function L_ver for the domain transfer procedure transitioning from the foreground to the background of the harmonized image sample.
The overall objective combines these terms as a weighted sum, L_total = λ_1 L_gh + λ_2 L_adv + λ_3 L_ver (2), where λ_1 = λ_2 = 1 and λ_3 = 100. Due to the unavailability of a public remote sensing ship image harmonization dataset, this chapter harnesses the general benchmark synthetic dataset iHarmony4 [18] to train the image harmonization model. To differentiate images at varying processing stages, images produced by the remote sensing ship image harmonization algorithm based on the transfer fusion strategy proposed in this Section are designated as harmonized images. The algorithm creates a harmonious linkage between two independent operational phases: the background feature transfer alignment module and the foreground feature transfer alignment module. Adjusting features such as foreground brightness ensures that the entire aligned image visually resembles actual images as closely as possible.
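As a weighted sum, the combined objective in Equation (2) can be written out directly (a sketch; the pairing of each λ_i with a specific loss term follows the order in which the losses are introduced in the text and is an assumption):

```python
def total_loss(l_gh, l_adv, l_ver, lam1=1.0, lam2=1.0, lam3=100.0):
    """Weighted combination of the global fusion loss L_gh, the adversarial
    loss L_adv, and the domain-transfer loss L_ver (Equation (2))."""
    return lam1 * l_gh + lam2 * l_adv + lam3 * l_ver

# example with plain floats (in training these would be scalar tensors)
value = total_loss(2.0, 3.0, 0.5)  # 1*2 + 1*3 + 100*0.5 = 55.0
```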

Dataset
The FGSCR-42 dataset [19], publicly available for fine-grained ship classification in remote sensing images, comprises approximately 9320 images across 42 categories. It aggregates data from Google Earth and key remote sensing repositories such as DOTA, HRSC2016, and NWPU VHR-10, featuring diverse warships and civilian vessels. However, the long-tail distribution of FGSCR-42 leads to varied model performance across its categories. For simplicity and direct comparison, this study narrows the scope to 12 ship types selected by our laboratory. Table 1 shows the actual images in our experimental dataset.

Experimental Environment
All experiments in this study were performed on the Ubuntu 20.04 LTS operating system with an NVIDIA GeForce RTX 2080 GPU with 12 GB of memory. The software environment comprised PyTorch 1.13.0 and CUDA 11.6.

Ablation Experiment
This Section divides the comprehensive model into a foreground feature transfer alignment module, a background feature transfer alignment module, and an image harmonization module based on the transfer fusion strategy, and combines them to complete the ablation experiment. The efficacy of these modules was assessed both visually and in terms of algorithm performance.
Figure 4 presents illustrative examples from each pivotal phase. As the training cycles iterate within the foreground feature transfer alignment module, the stylistic features of the sea surface area in the simulated image progressively align with those of the actual domain. Importantly, this transformation preserves other dimensional aspects of the background, such as sea surface brightness variations and wave texture formations. Subsequently, the target is transferred from the simulated domain to the real domain through training iterations within the background feature transfer alignment module. The application of the image harmonization algorithm brings significant improvements to the previously discordant visual effects observed between the ship foreground and the ocean background within the respective foreground and background feature transfer alignment modules. This progress is vividly depicted in the fourth and fifth columns, from left to right, in Figure 4. The cumulative effects of each implemented stage serve to underscore the intricate features of the ship target, yielding a more cohesive and unified appearance across regions. This enhancement in the richness of the visual portrayal of images augments the diversity of multi-dimensional information, thereby facilitating improved accuracy in fine-grained ship classification.
Given that this study consists of a multi-stage, gradual feature alignment process, it is vital to scrupulously analyze each module's impact. Consequently, the foreground feature transfer alignment module, background feature transfer alignment module, and transfer fusion harmonization module were deconstructed and recombined in this Section. We established a baseline in which the training data consisted solely of real images and no data augmentation techniques were applied during training. Based on this baseline, we conducted experimental comparisons of the performance of various algorithms in the task of fine-grained recognition of remote sensing ships, with the results detailed in Table 2. In the table, Pyramid refers to PyramidNet [23], EffiN-v2 refers to EfficientNet-v2 [26], and Swin-T refers to Swin-Transformer [27]; "HA" in the table header stands for the image harmonization algorithm. Across the five classification algorithms evaluated (ResNet-101, ResNext, PyramidNet, EfficientNet-v2, and Swin-Transformer) under various module combinations, the transfer fusion harmonization module improved the five models' classification performance by an average of 3.04% (average classification accuracy for six types of ships) and 2.67% (average classification accuracy for twelve types of ships). These results indicate that the transfer fusion and harmonization module proposed in this paper positively impacts the task of fine-grained ship classification, leading to a significant performance enhancement over that achieved by the background feature transfer alignment module alone. The image harmonization algorithm adeptly integrates the foreground and background within the synthetic image, paving the way for the extraction of distinctive features using a deep learning model. This significantly enhances both the scope
and fidelity of how image information is depicted. The detailed evaluation of the classification results shows that the remote sensing ship image harmonization technology based on the transfer fusion method detailed in this chapter is both influential and beneficial. It markedly improves the interpretation of image content as well as the precision of fine-grained ship classification.

Hybrid Dataset Experiment
During the experimental process, it was discovered that datasets such as DOTA [28], NWPU VHR-10 [29], and DSCR [30] contain comparatively rare ship types, rendering them insufficient for fine-grained classification tasks. The FGSCR-42 [19] dataset, due to its long-tail distribution, lacks authentic images for certain specific ship categories.
In this study, harmonized synthesized images were added in batches and incrementally to the training sets of ResNet-101 and ResNext, divided into different experimental groups under consistent parameter settings and testing conditions. Figure 5 shows the line graph of the average classification accuracy for six and twelve types of ships after supplementing with synthesized images of different scales, where K represents the ratio of harmonized synthesized images to real images in the training set. The results indicate that, compared with other ratios, fine-grained ship classification models perform best when the number of supplementary synthesized or simulated images is seven times that of the real images in the training set. Therefore, this paper combines remote sensing ship harmonized images with real-world remote sensing ship imagery at a ratio of 7:1, enhancing the amalgamated remote sensing images with high-precision category labels to create a new mixed dataset comprising 12 types of remote sensing ship images. Within this selection, the images of six ship types include composite data, specifically Zumwalt-class destroyers, Charles de Gaulle aircraft carriers, Atago-class destroyers, Type 45 destroyers, Kuznetsov aircraft carriers, and Freedom-class combat ships. The original images are sourced from the FGSCR-42 dataset. To address the scarcity of real images for these six ship types, 1092 high-quality harmonized images were created as
supplementary samples.These were then amalgamated with real-world images to establish a comprehensive hybrid dataset.Further experiments are conducted based on this meticulously formulated hybrid dataset.
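The 7:1 mixing procedure described above can be sketched as follows (the (path, label) data format and the sampling scheme are illustrative assumptions):

```python
import random

def build_mixed_dataset(real, synthetic, k=7, seed=0):
    """Supplement real samples with harmonized synthetic ones at a K:1
    synthetic-to-real ratio, then shuffle the combined set."""
    rng = random.Random(seed)
    n_syn = min(len(synthetic), k * len(real))
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed

# hypothetical file lists for one ship category
real = [(f"real_{i}.png", "Zumwalt") for i in range(10)]
synthetic = [(f"harmonized_{i}.png", "Zumwalt") for i in range(100)]
mixed = build_mixed_dataset(real, synthetic, k=7)  # 10 real + 70 synthetic
```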
This Section evaluates the impact of this dataset on classification performance by conducting benchmark tests using seven common CNN classification networks and one Transformer network. These eight networks are ResNet-101, ResNext, DenseNet-121, PyramidNet, ShuffleNet-v2, WRN, EfficientNet-v2, and Swin-Transformer, with Table 3 listing the accuracy results of this dataset on each classification network. The results show that the mixed dataset proposed in this paper achieves higher accuracy across the various classification algorithms. Furthermore, to discern the detailed classification accuracy for each ship type within the dataset, this study employed the EfficientNet-v2 algorithm; its performance in classifying all 12 types of ships is enumerated in Table 4. As illustrated in Table 4, the classification accuracy for destroyers is lower than that for aircraft carriers. Comparing the accuracy of different types of destroyers reveals that when the appearance of a destroyer is more similar to that of an aircraft carrier (such as the Atago-class destroyer), the corresponding classification accuracy tends to be higher. Moreover, ships that are larger in spatial dimensions and possess distinct shape attributes tend to achieve more favorable classification accuracy. This observation holds true across the accuracy results for various ship categories produced by other classification algorithms as well. The paper suggests that the smaller scale of ship targets makes their features relatively more difficult to discern; hence, it is more challenging to extract detailed feature information during the image conversion and harmonization process.

Discussion
The image conversion network and the remote sensing ship image harmonization algorithm advocated in this article are benchmarked against other deep learning-based conversion techniques to assess their effectiveness. CycleGAN [16] and Sim2RealNet [1] are representative of global perceptual feature learning and classic neural style transfer, respectively, while CUT [31] and SemI2I [32] represent the state of the art in contrastive learning and in image conversion for remote sensing image processing. Applying each method to identical remote sensing ship simulation images, the harmonized results of our algorithm are juxtaposed against those of these representative models in Figure 6. As the second and fourth columns (from left to right) of Figure 6 demonstrate, neither Sim2RealNet [1] nor SemI2I [32] is capable of transforming the full stylistic appearance of the image. Additionally, because the ocean background occupies a significant portion of the image, the deep learning model favors the style feature distribution of the ocean backdrop during conversion over the visual features of the ship target. CycleGAN [16] and CUT [31], which employ global perceptual feature alignment for image conversion and style transfer, achieve a positive visual alignment in the background area of the image. However, interference from features in other areas can cause images generated through global perception to appear deformed or distorted, as the third and fifth columns illustrate. The method proposed in this paper significantly improves the discordant visual effects between ship foregrounds and ocean backgrounds through the foreground feature transfer alignment module and the background feature transfer alignment module, as shown in the sixth column. This method highlights the detailed features of the ship targets while making the appearance across regions more harmonious. By enriching the vividness of the image presentation, the availability of multi-dimensional information is increased, thereby helping to enhance the accuracy of fine-grained ship classification.

Conclusions
Addressing ship classification challenges in remote sensing, this paper introduces a transfer learning-based data enhancement framework, which simulates ship images and employs an image migration and fusion model for cross-domain mapping. To harmonize the foreground-background discordance, a remote sensing ship image harmonization algorithm was developed. These techniques generated a rich dataset, improving fine-grained ship classification. Experimental results show that our methodology outperforms traditional data augmentation methods, delivering classification accuracy gains of up to 14.89% across different algorithms. Moreover, the ablation studies verified the contribution of each component of the framework.
As shown in Figure 2, the remote sensing ship target fine-grained recognition data augmentation model presented in this paper consists of three main modules: the Simulated Image Generating (SIG) module, the Foreground Feature Translation Aligning (FFTA) module, and the Background Feature Translation Aligning (BFTA) module. Both the FFTA and BFTA modules are based on the Local-Aware Progressive Image Conversion Network (LA-CycleGAN).
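The three-module structure can be sketched as a simple sequential composition. The module names (SIG, FFTA, BFTA) come from the paper, but the call order, function signatures, and inputs shown here are assumptions for illustration only:

```python
def augment_pipeline(cad_model, background, sig, ffta, bfta):
    """Hypothetical composition of the framework's three modules:
    SIG renders a simulated ship image from a ship model and an ocean
    background, then FFTA and BFTA (both LA-CycleGAN based) translate
    and align the foreground and background features, respectively."""
    simulated = sig(cad_model, background)     # Simulated Image Generating
    aligned_fg = ffta(simulated)               # Foreground Feature Translation Aligning
    harmonized = bfta(aligned_fg, background)  # Background Feature Translation Aligning
    return harmonized

# toy stand-ins that tag the data they process, to show the call order
sig = lambda m, b: f"SIG({m}|{b})"
ffta = lambda x: f"FFTA({x})"
bfta = lambda x, b: f"BFTA({x}|{b})"
result = augment_pipeline("ship_model", "ocean_patch", sig, ffta, bfta)
print(result)  # BFTA(FFTA(SIG(ship_model|ocean_patch))|ocean_patch)
```

Passing the modules as parameters keeps the sketch agnostic about their internals, which in the actual framework are learned networks rather than simple functions.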

Figure 2. Framework of Data Augmentation Model.

Figure 3. Overall Framework of Remote Sensing Ship Image Harmonization Algorithm.

Figure 4. Results of Ablation Studies Across Different Modules.

Figure 5. Line Graph of Classification Accuracy for ResNet and ResNext After Supplementation with Synthetic Images at Various Ratios.

Figure 6. Visualization of Comparative Experiments for Image Conversion Methods.

Table 2. Ablation Experiments for Fine-Grained Ship Classification Across Different Modules.

Table 3. Accuracy of Multiclass Classification Algorithms on Mixed Datasets.

Table 4. Classification Accuracy for 12 Types of Ships in EfficientNet-v2.