Adversarial Multiscale Feature Learning Framework for Overlapping Chromosome Segmentation

Chromosome karyotype analysis is of great clinical importance in the diagnosis and treatment of diseases. Since manual analysis is highly time- and effort-consuming, computer-assisted automatic chromosome karyotype analysis based on images is routinely used to improve the efficiency and accuracy of the analysis. However, the strip-shaped chromosomes easily overlap each other when imaged, significantly affecting the accuracy of the subsequent analysis and hindering the development of chromosome analysis instruments. In this paper, we present an adversarial, multiscale feature learning framework to improve the accuracy and adaptability of overlapping chromosome segmentation. We first adopt the nested U-shaped network with dense skip connections as the generator to explore the optimal representation of the chromosome images by exploiting multiscale features. Then we use the conditional generative adversarial network (cGAN) to generate images similar to the original ones; the training stability of the network is enhanced by applying the least-square GAN objective. Finally, we replace the common cross-entropy loss with the advanced Lovász-Softmax loss to improve the model's optimization and accelerate the model's convergence. Compared with established algorithms, our framework achieves superior performance on public datasets across eight evaluation criteria, showing its great potential in overlapping chromosome segmentation.


Introduction
Human chromosome karyotype analysis is of great diagnostic and prognostic value in diseases. It is usually performed in clinical diagnosis, cancer cytogenetics, and the detection of genetic abnormalities such as Edwards syndrome and Down syndrome [1,2]. Chromosomal abnormalities, such as extra or missing chromosomes or structural defects of specific chromosomes, can be directly linked to corresponding diseases; hence, chromosome karyotype analysis based on images plays a critical role in routine disease diagnosis and treatment [3]. Figure 1 illustrates the process of chromosome karyotype analysis, which consists of two main steps: segmentation and classification. The performance of the segmentation directly influences the accuracy of the subsequent classification. Although single-chromosome classification can achieve high accuracy [4], the segmentation of overlapping chromosomes, as shown in the red circles in Figure 1a,b, is more challenging due to the ambiguity in the overlapping regions, which can greatly influence the accuracy of the chromosome karyotype analysis. Hence, in this study we mainly focus on removing the restrictions of overlapping chromosome segmentation and designing a highly efficient and accurate overlapping-chromosome segmentation method to enhance the overall performance of the chromosome karyotype analysis.
Since manual segmentation is both time- and effort-consuming and its accuracy depends heavily on the experience of the analyst, many algorithms have been proposed over the past few decades to automatically segment chromosomes from images [5]. Based on their principles, these segmentation methods can be roughly classified into two categories: heuristic methods and learning-based methods. Heuristic methods utilize manually tagged features, such as contour, pixels, and geometric features, to perform segmentation [6][7][8]. Specifically, Ritter et al. utilized shape analysis and classification for chromosome segmentation and adopted global context and variant analysis methods to solve complex and ambiguous cases [9]; however, their method consists of two phases and is somewhat cumbersome. Madian et al. used a contour analysis method and constructed reasonable hypotheses for segmentation and separation [10]. Saiyod et al. proposed an edge detection method that consisted of the flood fill, erosion, and Canny methods [11]; however, it handled only touching chromosomes, not overlapping ones. Some researchers use thresholding strategies for chromosome segmentation [12][13][14], adopting a local adaptive histogram equalization technique to obtain the appropriate threshold and further enhance chromosome segments by reducing the chances of pixel misclassification. However, these methods are sensitive to the noise, contrast, and resolution of the image. Gawande et al. applied a fuzzy C-means clustering algorithm and a watershed algorithm for chromosome segmentation, but their approach also did not effectively separate overlapping chromosomes [15]. Sharma et al. adopted a crowdsourcing-based approach for segmentation, but it required considerable effort and time to separate chromosomes manually [16]. Lin et al. proposed a geometric feature to separate chromosomes [17]; however, it could not automate chromosome segmentation well due to the irregular shape of chromosomes. These methods can achieve impressive segmentation results when manual features are properly tagged; however, they are very sensitive to the shape and overlapping regions of the chromosomes. Moreover, since they do not consider the untagged features, their performance and applicability are limited, and it is difficult to apply them to large datasets.

Learning-based methods, in contrast, apply machine learning techniques to mine latent information from the images to perform chromosome segmentation and medical image analysis [18,19]. Representative examples include Pardo et al., who applied the fully convolutional network (FCN) method to karyotype analysis [20,21]; however, their data did not contain overlapping chromosomes, so the method could hardly meet the needs of clinical practice. Other researchers used a U-shaped network (UNet) for overlapping chromosome segmentation [22,23]. Chen et al. proposed a shape learning method to segment both non-overlapped and overlapped regions [24]. Altinsoy et al. proposed a raw G-band chromosome image segmentation method using a convolutional network [25], but it did not work for overlapping chromosomes. Once trained, these methods can conduct chromosome segmentation independently. However, limited by the architecture of the network, current learning-based methods utilize the features of only a few layers and do not take advantage of multiscale features to adapt to different chromosome scales. Hence, they do not perform well in overlapping chromosome segmentation. Recently, Chen et al. proposed a multiscale adversarial network [26] for fine-grained image categorization and achieved good classification performance, which provides a new way of thinking for our work.
In this paper, considering the various scales and overlapping regions of chromosome images, we demonstrate an adversarial multiscale feature learning (AMFL) framework that employs a nested U-shaped convolutional neural network (NestedUNet) [27], a conditional generative adversarial network (cGAN) [28], and Lovász-Softmax [29] for overlapping chromosome segmentation. Specifically, NestedUNet consists of UNets [30] of varying depths and has dense skip connections, making it capable of synthesizing multiscale feature maps for segmentation. Hence, our AMFL framework utilizes NestedUNet to explore the optimal representation of chromosome images by exploiting multiscale features and fused features. Moreover, we consider overlapping chromosome image segmentation as an image-to-image task, in which the source overlapping chromosome images are translated to a confidence map that indicates the category information in the source images. We therefore use cGAN, which has been successful in computer vision tasks such as image deblurring [31] and image fusion [32], to push the output distribution close to the ground truth. Finally, to optimize the performance of the discriminatively trained overlapping chromosome segmentation, we apply Lovász-Softmax, which is based on the convex Lovász extension of the submodular loss, as the segmentation loss to achieve superior chromosome segmentation performance and higher index scores compared to the traditional cross-entropy (CE) loss. Additionally, we utilize the least-square GAN objective [33] to replace the original GAN loss in the overlapping chromosome segmentation task to stabilize the training and avoid mode collapse. To verify the feasibility of our method, we carry out extensive experiments to compare the performance of our AMFL framework with others. Results show the superiority of our AMFL framework in terms of visual perception analysis and quantitative score comparison.
The major contributions of this paper are summarized as follows:
1. We present an adversarial multiscale feature learning framework to improve the accuracy and adaptability of overlapping chromosome segmentation.
2. We use the conditional generative adversarial network to penalize the difference between the generated decision map and the source image, pushing the generator to produce a higher-confidence decision map for the segmentation task.
3. Instead of using single-scale features to represent the chromosome images, we carefully design a nested U-shaped network with dense skip connections as the generator to capture multiscale features and explore a better representation of the chromosome images.
4. We replace the common cross-entropy loss with the advanced Lovász-Softmax loss to improve the model's optimization and accelerate the model's convergence.
5. We carry out extensive experiments and analyze different objective functions that provide baselines for chromosome segmentation. Results show the superiority of the present AMFL method and the loss function adopted in this work in terms of visual perception analysis and quantitative score comparison.

Network Architecture
In this paper, we consider overlapping chromosome image segmentation as an image-to-image task, in which the source overlapping chromosome images are translated to a confidence map that indicates the category information in the source images. Compared with the existing methods, we introduce adversarial learning to penalize the difference between the generated decision map and the source image, pushing the generator to produce a higher-confidence decision map for the segmentation task. Moreover, the generator in our framework is deeper than that of CNN-based methods, possessing higher representational capacity. Specifically, as depicted in Figure 2, similar to the original GAN [28], our AMFL framework consists of two modules: a generator and a discriminator. The generator is responsible for exploiting multiscale features for segmentation by producing "fake" chromosome images, while the discriminator serves to distinguish the "fake" images from the "real" ones by adversarial learning. Once the discriminator is "fooled" by the generator, the network is ready to segment chromosomes with high accuracy.

(1) Generator: As shown in Figure 2a, we adopted the advanced NestedUNet as the generator G, which consists of an encoder and a decoder. It takes a source chromosome image as the input and outputs a multiclass one-hot map. Specifically, each node in the graph represents a nested convolution block; the downward arrows, upward arrows, and dotted arrows indicate 2 × 2 max-pooling, 2 × 2 up-sampling, and skip connections, respectively. The skip connections merge the encoding and decoding features in the channel dimension by tensor concatenation, enabling dense feature propagation. To better describe the network connectivity, we denote by $x^{i,j}$ the output of the node $X^{i,j}$. It can be formulated as in Equation (1), where the function N(·) denotes a nested convolution block operation; D(·) and U(·) denote a down-sampling layer and an up-sampling layer, respectively; and [·] denotes the concatenation layer. Intuitively, the nodes at level j = 0 receive only one input from the previous layer of the encoder, whereas the nodes at level j > 0 receive j + 1 inputs: the outputs of all the previous j nodes in the same skip connection and the up-sampled output from the lower skip connection. Therefore, a dense skip connection is constructed and multiscale features are integrated to provide a better feature representation for the segmentation of overlapping chromosome regions with different scales.

Meanwhile, in order to better describe the network parameters, the number of filters is defined as f = {64, 128, 256, 512, 1024}, and the numbers of input channels, middle channels, and output channels of a nested convolution block are defined as in Equation (2), where $I_{ij}$, $M_{ij}$, and $O_{ij}$ are the input channels, middle channels, and output channels of the ijth node, respectively. Note that the middle channels are the output channels of the first convolution layer and also the input channels of the second convolution layer in the nested convolution block. To describe the network structure in detail, we denote the convolution layer, batch norm layer, and rectified linear unit [34] as Conv, BN, and ReLU, respectively. The nested Conv block consists of two Conv-BN-ReLU layers with a filter size of 3 × 3, padding of 1, and stride of 1, which keeps the size of the feature map consistent after each convolution operation. The last layer is a convolution with a 1 × 1 kernel, and the feature map of the last node is mapped into a confidence map using the Softmax operation to produce a one-hot map.
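The connectivity described above can be made concrete with a small sketch. It only enumerates which tensors feed each node $X^{i,j}$ according to the prose (one encoder input for j = 0, otherwise the j same-level outputs plus one up-sampled output); the function name `node_inputs`, the tuple labels, and the choice of five levels (taken from the five entries of f) are illustrative assumptions, not the authors' code.

```python
def node_inputs(i, j):
    """List the tensors that feed node X^{i,j} (hypothetical helper).

    i: depth along the encoder (0 is full resolution).
    j: position along the skip connection (0 is the encoder backbone).
    """
    if j == 0:
        # Backbone node: the source image, or the down-sampled output above it.
        return [("image",)] if i == 0 else [("down", i - 1, 0)]
    # Dense skip connection: all previous outputs at the same depth ...
    same_level = [("skip", i, k) for k in range(j)]
    # ... concatenated with the up-sampled output from one level below.
    return same_level + [("up", i + 1, j - 1)]


DEPTH = 5  # assumption: one level per entry of f = {64, 128, 256, 512, 1024}

if __name__ == "__main__":
    for i in range(DEPTH):
        for j in range(DEPTH - i):  # nodes exist only where i + j < DEPTH
            print(f"X^({i},{j}) <- {node_inputs(i, j)}")
```

Under this reading, a node at level j > 0 always concatenates j + 1 tensors before its nested convolution block is applied.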
(2) Discriminator: As shown in Figure 2b, inspired by the PatchGAN in [28], we used a simplified fully convolutional neural network [21,32] as the discriminator D, which pushes the output distribution closer to the ground truth, making the generator produce high-confidence segmentation maps. This discriminator tries to distinguish whether each K × K patch in an image is real or fake; it is applied convolutionally across the image, and all responses are averaged to provide the ultimate output of D. Specifically, it consists of five convolution layers with a filter size of 4 × 4 and {64, 128, 256, 512, 1024} channels. The first four convolution layers use padding of 2 and stride of 2, and the last two layers use padding of 2 and stride of 1. Each convolution layer except the last is followed by a Leaky-ReLU with a negative slope of 0.2. A sigmoid function follows the last layer and produces a binary output for discriminating "real" or "fake" images. Finally, it is worth noting that the input of the discriminator is a multichannel image created by concatenating the source image and the segmented image (produced by the generator) in the channel dimension, aiming to provide prior information for learning more discriminative features. The generator and discriminator were alternately trained using the objective function represented as follows.

Entropy 2022, 24, 522
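To see how the stated kernel, padding, and stride settings affect feature-map sizes, the standard convolution output-size formula can be applied to single layers. This is a minimal sketch: `conv_out_size` is a hypothetical helper, the 128-pixel input matches the padded image size used in this paper, and the 17-pixel input is an arbitrary illustrative value rather than a size taken from the network.

```python
def conv_out_size(size, kernel, padding, stride):
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


# Generator nested block: 3 x 3 kernel, padding 1, stride 1 keeps the size.
preserved = conv_out_size(128, kernel=3, padding=1, stride=1)

# Discriminator layer: 4 x 4 kernel, padding 2, stride 2 roughly halves it.
halved = conv_out_size(128, kernel=4, padding=2, stride=2)

# Discriminator layer: 4 x 4 kernel, padding 2, stride 1 grows the map by one.
grown = conv_out_size(17, kernel=4, padding=2, stride=1)

if __name__ == "__main__":
    print(preserved, halved, grown)
```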

Objective Function
(1) Lovász-Softmax: This loss optimizes the Jaccard index in a continuous optimization framework [29]. Specifically, it can substantially improve the accuracy of semantic segmentation by optimizing the correct loss during training. Therefore, we chose Lovász-Softmax as the loss of the generator, which can be defined in simplified form as in Equation (3), where $m(c)$ is a vector of pixel errors for class $c \in C$, used to construct the loss surrogate to $\Delta_{J_c}$; it is defined in Equation (4). In Equation (4), $y$ is the ground truth, and $f_i(c)$ is the predicted score of the model, mapped to a probability through a Softmax unit. $\Delta_{J_c}$ is the set function encoding a submodular Jaccard loss for class $c$, defined on a set of mispredictions. In particular, $\overline{\Delta_{J_c}}$ is the surrogate for the minimization of $\Delta_{J_c}$ with first-order continuous optimization, and the elementary operations involved in the calculation of $\overline{\Delta_{J_c}}$ (sort) are differentiable.
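As an illustration of how such a loss can be evaluated, the sketch below follows the general recipe of [29] on flat Python lists: the pixel error for class c is taken as 1 − f_i(c) where the ground truth is c and f_i(c) elsewhere, errors are sorted in decreasing order, and they are weighted by the discrete gradient of the Jaccard loss. The function names, the list-based inputs, and the choice to skip classes absent from the ground truth are assumptions made for this example; it is not the authors' implementation and it omits automatic differentiation.

```python
def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss.

    gt_sorted: 0/1 foreground indicators ordered by decreasing pixel error.
    """
    total = sum(gt_sorted)
    grad, prev, fg_seen, bg_seen = [], 0.0, 0, 0
    for g in gt_sorted:
        fg_seen += g
        bg_seen += 1 - g
        intersection = total - fg_seen
        union = total + bg_seen
        jaccard = 1.0 - intersection / union
        grad.append(jaccard - prev)  # increment of the Jaccard loss
        prev = jaccard
    return grad


def lovasz_softmax(probs, labels, num_classes):
    """probs[i][c]: softmax probability of class c at pixel i; labels[i]: true class."""
    losses = []
    for c in range(num_classes):
        fg = [1 if y == c else 0 for y in labels]
        if sum(fg) == 0:
            continue  # assumption: classes absent from the ground truth are skipped
        errors = [abs(g - p[c]) for g, p in zip(fg, probs)]  # pixel errors m_i(c)
        order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
        grad = lovasz_grad([fg[i] for i in order])
        losses.append(sum(errors[i] * w for i, w in zip(order, grad)))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    labels = [0, 1]
    print(lovasz_softmax([[1.0, 0.0], [0.0, 1.0]], labels, 2))  # perfect prediction
    print(lovasz_softmax([[0.5, 0.5], [0.5, 0.5]], labels, 2))  # uninformative prediction
```

In a training setting the same computation would be expressed with tensor operations so that gradients flow through the sorted errors.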
(2) GAN loss: We also needed to choose an appropriate adversarial loss function for training our AMFL framework. It is well known that the regular GAN [35] loss is often difficult to converge and can suffer from mode collapse. We therefore adopted the least-squares generative adversarial network (LSGAN) [33] objective as the loss function in our work, which is more stable and, based on previous experimental experience [29], can achieve better segmentation results. It is defined by Equation (5). The adversarial learning process is also optimized through the LSGAN objective, which is formulated in Equation (6). Furthermore, in order to make the segmentation map as close as possible to the ground truth, we adopted the Lovász-Softmax loss for supervised segmentation. The objective function for AMFL can therefore be defined by Equation (7), where λ controls the relative importance of the two objective functions. Empirically, we set λ to 10 in our work.
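The following sketch shows one common form of the least-squares objectives from [33], using target values of 1 for real pairs and 0 for generated pairs, and then combines the generator's adversarial term with a segmentation loss. Two points are assumptions rather than statements from this paper: the specific target values, and that λ multiplies the segmentation term (the text only says that λ balances the two objectives). All function names are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def lsgan_d_loss(d_real, d_fake):
    """Discriminator loss: scores on real pairs go to 1, on generated pairs to 0."""
    real_term = mean((s - 1.0) ** 2 for s in d_real)
    fake_term = mean(s ** 2 for s in d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def lsgan_g_loss(d_fake):
    """Generator adversarial loss: generated pairs should be scored as real (1)."""
    return 0.5 * mean((s - 1.0) ** 2 for s in d_fake)


def generator_objective(d_fake, seg_loss, lam=10.0):
    # Assumption: lambda weights the supervised segmentation term.
    return lsgan_g_loss(d_fake) + lam * seg_loss


if __name__ == "__main__":
    print(lsgan_d_loss([0.9, 0.8], [0.2, 0.1]))
    print(generator_objective([0.2, 0.1], seg_loss=0.3))
```

Here `d_real` and `d_fake` stand for the discriminator's outputs on (source image, ground truth) and (source image, generated segmentation) pairs, respectively.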

Evaluation Metrics
To quantitatively evaluate the performance of our method, we selected eight evaluation metrics: pixel accuracy (Acc) [36], Dice coefficient (Dice) [37], intersection over union (IoU) [36], precision, recall, false-negative rate (FNR), false-positive rate (FPR) [36], and Hausdorff distance (Hausdorff) [36], which are briefly introduced below. For convenience, we use O to denote the output segmentation image and G to denote the ground truth. All metrics range over the interval [0, 1] except Hausdorff.
(1) Acc: It indicates the pixel accuracy of the predicted results in the segmentation. In other words, it represents the proportion of pixels in an image that are correctly predicted. The Acc is calculated using Equation (8), where $P_{ij}$ is the number of pixels that are classified as class j but actually belong to class i, and c is the number of classes.
(2) Dice: This metric represents the similarity of the predicted image O to the ground truth G. The Dice is calculated using Equation (9), where |G| and |O| represent the numbers of elements in the arrays.
(3) IoU: This metric represents the ratio of the intersection to the union of the predicted image O and the ground truth G. The IoU is calculated using Equation (10).
(4) Precision: It indicates how reliable the prediction is. This metric is calculated using Equation (11), where $TP_c$ represents the true positives, that is, the pixels correctly predicted to belong to class c, while $FP_c$ represents the false positives, that is, the pixels that are predicted as class c but do not actually belong to class c.
(5) Recall: It indicates how sensitive the prediction is and is therefore also called sensitivity. It is calculated using Equation (12), where $FN_c$ represents the false negatives, that is, the pixels that are predicted as not class c but actually belong to class c.
(6) FNR: It is also called the under-segmentation rate and measures the proportion of the positive class that is predicted to be negative. It is defined in Equation (13).
(7) FPR: It is also called the over-segmentation rate and measures the proportion of the negative class that is predicted to be positive. This metric is calculated using Equation (14), where $TN_c$ represents the true negatives, that is, the pixels that are correctly predicted not to belong to class c.
(8) Hausdorff: It represents the shape similarity between the predicted images O and the ground truths G. It is calculated using Equation (15), where d(·) represents the Euclidean distance between the pixel points x and y. The smaller the Hausdorff distance, the greater the similarity between the predicted segmentation maps and the ground truth.
Note that, for each metric, a higher value indicates better performance, except for FNR, FPR, and Hausdorff, where a lower score indicates a better segmentation result.
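The eight metrics can be sketched with their textbook definitions, computed from a confusion matrix whose entry P[i][j] counts pixels of true class i predicted as class j. This is a minimal illustration on flat label lists: the function names are hypothetical, per-class Dice is written in its equivalent form 2TP / (2TP + FP + FN), empty denominators are mapped to 0 by assumption, and the Hausdorff distance is the symmetric variant over explicit point sets.

```python
import math


def confusion_matrix(truth, pred, num_classes):
    """P[i][j]: number of pixels of true class i predicted as class j."""
    P = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        P[t][p] += 1
    return P


def pixel_accuracy(P):
    return sum(P[i][i] for i in range(len(P))) / sum(map(sum, P))


def class_metrics(P, c):
    total = sum(map(sum, P))
    tp = P[c][c]
    fn = sum(P[c]) - tp                  # class c pixels predicted as another class
    fp = sum(row[c] for row in P) - tp   # other pixels predicted as class c
    tn = total - tp - fn - fp

    def ratio(a, b):
        return a / b if b else 0.0       # assumption: empty denominators give 0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "fnr": ratio(fn, tp + fn),
        "fpr": ratio(fp, fp + tn),
    }


def hausdorff(points_o, points_g):
    """Symmetric Hausdorff distance between two non-empty sets of (row, col) points."""
    def directed(a, b):
        return max(min(math.dist(x, y) for y in b) for x in a)
    return max(directed(points_o, points_g), directed(points_g, points_o))


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 3]
    pred = [0, 1, 1, 1, 2, 2]
    P = confusion_matrix(truth, pred, num_classes=4)
    print(pixel_accuracy(P))
    print(class_metrics(P, 1))
    print(hausdorff([(0, 0), (0, 1)], [(0, 0), (3, 4)]))
```

Averaging `class_metrics` over the classes present in an image, and then over images, is one plausible way to obtain dataset-level scores; the paper does not specify the averaging scheme in this section.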

Baselines and Implementation
We validate the effectiveness of our method by comparing it with 10 recent state-of-the-art algorithms: efficient neural network (ENet) [38], bilateral segmentation network version 1 (BiSeNetV1) [39], BiSeNetV2 [40], DeepLabV3+ [41], faster fully convolutional network (FastFCN) [42], U-shaped network (UNet) [30], recurrent residual UNet (R2UNet) [43], attention UNet (AttUNet) [44], recurrent residual attention UNet (R2AttUNet) [45], and nested UNet (NestedUNet) [27]. Among them, ENet, BiSeNetV1, and BiSeNetV2 are small-scale models that usually have smaller network scales and higher inference speed, while the others are large-scale models that usually have more complex network structures and can learn more latent semantic features. The above methods were used as baselines to evaluate the performance of our method comprehensively. Furthermore, in order to verify our selection of the loss function, four commonly used loss functions were tested. Since none of the above methods had been applied to the chromosome segmentation task, we implemented all of them ourselves with the same hyperparameters to ensure a fair comparison.

Selection of the Objective Function
In order to show the superiority of using Lovász-Softmax as the loss for overlapping chromosome segmentation, we plot in Figure 3 the average metrics for prevalent losses on all testing sets with all the methods. It can be seen that our method with Lovász-Softmax outperformed all baseline methods in all the metrics. Moreover, all baseline methods trained with Lovász-Softmax also tend to score higher than with the other losses, indicating that Lovász-Softmax is an effective choice for overlapping chromosome segmentation.

Selection of the Generator
In Table 1, we evaluate the performance of our method with different generator networks in terms of all the quantitative indicators. The framework with NestedUNet as the generator network performs better than the other configurations, indicating that modeling with multiscale features is effective.

Preliminary Preparation

Data Preparation and Preprocessing
Due to the difficulty in obtaining clinical data, we use Pommier's overlapping chromosome datasets [46,47] to demonstrate the effectiveness of the present method. The dataset contains 13,434 images of overlapping chromosomes with a resolution of 94 × 93. For each image, there is a corresponding ground truth, in which each pixel represents an object class. In the segmentation map, class labels of 0, 1, 2, and 3 denote the background (shown as black), non-overlapping regions of the first chromosome (shown as red), non-overlapping regions of the second chromosome (shown as green), and overlapping regions of the chromosomes (shown as blue), respectively. To match the images with our network, we padded the images to 128 × 128. The padding values of the input images and ground truths were set to 255 and 0, respectively, to be consistent with the background of the original images. We divided the datasets into two subsets: 80% for training (a total of 10,747 images) and the remaining 20% for testing (a total of 2686 images). In the training set, the numbers of pixels for the four classes were 167,373,977, 284,038, 5,138,621, and 1,282,212, respectively. In the testing set, the numbers of pixels for the four classes were 41,825,569, 575,220, 1,286,495, and 320,140, respectively. Therefore, the testing-to-training pixel proportions for the four classes were 24.99%, 23.16%, 25.04%, and 24.97%, respectively. This shows that when 20% of the images are selected as the testing set, there is no guarantee that every pixel class is divided in the same proportion. Moreover, the uneven distribution of pixel categories affects the evaluation of classification results. Specifically, some of the images in the testing set lack overlapping domains, which means that class 3 is missing; as a result, there is a clear bias in the final classification result. Therefore, in order to solve this problem and make our results reliable, we only kept pairs whose ground truth contains overlapping domains in the testing sets (a total of 2432 images).
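The two preprocessing steps described above, padding to 128 × 128 and keeping only test pairs that contain an overlapping region, can be sketched on nested Python lists. The helper names are hypothetical, and two details are assumptions because the text does not state them: the original content is centred inside the padded canvas, and the 94 × 93 resolution is read as 94 rows by 93 columns.

```python
def pad_to(image, size, fill):
    """Pad a 2D list to size x size, centring the original content (assumed placement)."""
    h, w = len(image), len(image[0])
    top, left = (size - h) // 2, (size - w) // 2
    out = [[fill] * size for _ in range(size)]
    for r in range(h):
        out[top + r][left:left + w] = image[r]
    return out


def has_overlap(label_map, overlap_class=3):
    """True if the ground truth contains at least one overlapping-region pixel."""
    return any(overlap_class in row for row in label_map)


if __name__ == "__main__":
    image = [[128] * 93 for _ in range(94)]   # toy grayscale image
    labels = [[0] * 93 for _ in range(94)]    # toy ground truth, background only
    labels[40][40] = 3                        # a single overlapping pixel

    padded_image = pad_to(image, 128, fill=255)   # white background for inputs
    padded_labels = pad_to(labels, 128, fill=0)   # class 0 for ground truths

    test_pairs = [(padded_image, padded_labels)]
    kept = [pair for pair in test_pairs if has_overlap(pair[1])]
    print(len(padded_image), len(padded_image[0]), len(kept))
```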

Implementation
In the training stage, all the training sets were shuffled, all input images were normalized to the range of 0-1, and the batch size was set to 64. We optimized the generator and the discriminator alternately, both with the Adam solver using a fixed learning rate of 0.0002 and momentum parameters of β1 = 0.5 and β2 = 0.999. The random seed was set to 123. We trained our framework from scratch on the training sets to produce the "optimized" model. The training was stopped when the training losses did not decrease for 200 consecutive epochs. We saved the generator model weights when the training Dice scores were at their highest. For the inference stage, we used the trained framework to segment the images. All the experiments were conducted in PyTorch [48] on an Ubuntu cloud server with an Intel Xeon(R) CPU E5-2680 v4 @2.40 GHz, 40 GB of RAM, and an NVIDIA Tesla P40 GPU with 24 GB of memory.

Performance
Figure 4 exhibits some examples of the segmentation results of our method, from which we can see that our method achieved excellent visual results. Moreover, the various scales of individual chromosomes and overlapping regions were correctly segmented in all the images, indicating that our method performed well in the multiscale segmentation task. In order to further highlight the performance of our algorithm, we show confusion matrices of average accuracy scores on all the testing sets in Figure 5. Our method showed better results than other state-of-the-art methods. These results are consistent with the quantitative results in Section 2.5, demonstrating the superiority of our method, not only for visual perception but also for quantitative analysis.

Visual Evaluation
In this section, we visually compare the performance of our method with baseline methods. Figure 6 exhibits the results, including difference images rendered with a pseudo-color map. Here, the difference images are generated through logical multiplication of the inverted ground truth and the corresponding predicted result. Figure 6a-j show the results acquired using baseline methods with CE loss, while Figure 6k-o were acquired using the presented method with various loss functions. We can see from Figure 6 that our method with Lovász-Softmax or weight-dice loss achieved excellent segmentation results, while the performance of the other methods was clearly poorer, suggesting that these methods do not learn effective features for overlapping chromosome segmentation. NestedUNet performed the segmentation better than other large-scale models, indicating that multiscale feature learning is helpful for overlapping chromosome segmentation. Furthermore, the difference images acquired with our method were clearly cleaner than those acquired with other methods, indicating that the cGAN applied in our method is effective in distinguishing the segmented images from the ground truths so as to better learn the features of the chromosomes. Additionally, our method with Lovász-Softmax loss segmented the images more accurately than other methods, with almost every chromosomal region correctly segmented. This indicates that the Lovász-Softmax loss helped improve the discrimination ability of our method. In short, the images in Figure 6 show that our AMFL framework can segment overlapping chromosomes with better performance than the baseline methods.
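The difference-image construction mentioned above can be illustrated on binary masks. The sketch below reads "logical multiplication of the inverted ground truth and the predicted result" as a pixel-wise AND between the negated ground-truth mask and the predicted mask; applying it to a single binary foreground mask (for example, one class at a time) is an assumption, since the text does not say how the four classes are handled, and `difference_image` is a hypothetical name.

```python
def difference_image(truth_mask, pred_mask):
    """Pixel-wise (NOT truth) AND prediction for 0/1 masks of equal shape."""
    return [
        [(1 - t) & p for t, p in zip(truth_row, pred_row)]
        for truth_row, pred_row in zip(truth_mask, pred_mask)
    ]


if __name__ == "__main__":
    truth = [[1, 0, 0],
             [1, 1, 0]]
    pred = [[1, 1, 0],
            [0, 1, 1]]
    for row in difference_image(truth, pred):
        print(row)
```

Under this reading, nonzero pixels mark locations that the prediction assigns to the foreground but the ground truth does not, so a cleaner difference image corresponds to fewer spurious foreground pixels.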

Quantitative Evaluation
In this part, we quantitatively compare the performance of our method with others and show the results in Table 2. Here, we use the common CE loss for the baseline methods. We can see that our AMFL achieved the best performance in all the metrics. This indicates that using cGAN to discriminate features can push the output distribution closer to the ground truth, so that our method outperforms the others in overlapping chromosome segmentation tasks. It is also clear that the small-scale models had almost the worst scores in terms of Dice, IoU, and Hausdorff distance, while the large-scale models reached better scores, suggesting that overlapping chromosome segmentation requires a more complex network structure. Again, NestedUNet achieved the top performance compared to the other baseline methods, quantitatively verifying the importance of multiscale feature analysis. It is worth emphasizing that our method had a lower Hausdorff distance, indicating that it retained the shape and structure of the chromosomes in the output images. The quantitative results, which are consistent with what we can see in Figure 6, support the effectiveness of our method in overlapping chromosome segmentation. Moreover, to further highlight the performance of our method, Table 3 shows the average IoU scores of each class and the accuracy on all the testing sets, compared against methods specifically designed for overlapping chromosome segmentation. We can see that our method has a significant advantage over these two methods in terms of both quantitative metrics. In particular, for classes 1 and 2, the IoU scores improved by 8.89% and 4.53%, respectively, over Hu et al.'s method, and the IoU scores of all classes were better than those of CE-Net, which again supports the superiority of the AMFL framework for overlapping chromosome segmentation.
However, because the categories in the training sets are imbalanced, with relatively few overlapping areas, the average IoU score for class 3 was slightly lower than that of Hu et al.'s method, although this distribution is more in accordance with the diversity of clinical data. Since not all chromosome images in the clinic contain overlaps, it is also very important to correctly separate nonoverlapping chromosomes.
Note that the scores for all existing methods are taken from the respective references; "-" indicates that the value is not reported in the paper. Classes 1 and 2 denote the two non-overlapping regions of the chromosomes, and class 3 denotes the overlapping region. The best two results are highlighted in red and green, respectively. TTA indicates test-time augmentation.
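
For readers who wish to reproduce metrics of the kind reported in Tables 2 and 3, the sketch below computes per-class IoU and Dice from label maps, plus a brute-force symmetric Hausdorff distance between two binary masks. It is a minimal illustration, not the evaluation code used in this work: the function names are hypothetical, label 0 is assumed to be background, and the Hausdorff distance is taken over all foreground pixels, which may differ from the exact definition used in the tables.

```python
import numpy as np


def per_class_iou_dice(gt, pred, num_classes):
    """Return two dicts mapping class index -> IoU and class index -> Dice."""
    gt = np.asarray(gt)
    pred = np.asarray(pred)
    iou, dice = {}, {}
    for c in range(num_classes):
        g, p = gt == c, pred == c
        inter = np.logical_and(g, p).sum()
        union = np.logical_or(g, p).sum()
        total = g.sum() + p.sum()
        # A class absent from both maps has no defined score.
        iou[c] = inter / union if union else float("nan")
        dice[c] = 2 * inter / total if total else float("nan")
    return iou, dice


def hausdorff_distance(mask_a, mask_b):
    """Symmetric Hausdorff distance (in pixels) between two binary masks.

    Brute force over all foreground pixel pairs, so only suitable
    for small masks.
    """
    a = np.argwhere(np.asarray(mask_a, dtype=bool))
    b = np.argwhere(np.asarray(mask_b, dtype=bool))
    if len(a) == 0 or len(b) == 0:
        return float("nan")
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Largest distance from any point in one set to its nearest point in the other.
    return max(d.min(axis=1).max(), d.min(axis=0).max())


gt = np.array([[0, 1, 1], [2, 3, 3]])
pred = np.array([[0, 1, 3], [2, 3, 3]])
iou, dice = per_class_iou_dice(gt, pred, num_classes=4)
print(iou[1], dice[1])  # class 1: one of two pixels recovered
print(hausdorff_distance(gt == 3, pred == 3))
```

Because IoU is computed per class, a class with few pixels, such as the overlapping region, is affected strongly by a handful of misclassified pixels, which is one reason class imbalance matters for the class 3 score.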

Computational Efficiency
To evaluate the computational efficiency, Table 4 presents the total number of model parameters and the average running time on CPU and GPU for the different methods on all the testing sets. The methods with small-scale networks consumed the least resources, gaining speed at the expense of accuracy. Among the methods with large-scale networks, our method took about 27 ms to segment an image on GPU, second only to UNet, and its number of model parameters was also the second smallest. On CPU, our method took 568 ms, ranking fifth, ahead of R2UNet and R2AttUNet. The results show that, in addition to its segmentation performance, our method is also computationally efficient, suggesting its potential in real applications.
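
The two quantities discussed above, parameter count and average running time per image, can be measured along the following lines. This is a CPU-only sketch under stated assumptions, not the benchmarking code behind Table 4: `count_parameters`, `mean_runtime_ms`, and the stand-in `segment` callable are hypothetical, and the model weights are represented as a plain dictionary of NumPy arrays.

```python
import time

import numpy as np


def count_parameters(weights):
    """Total number of scalar parameters in a dict of weight arrays."""
    return sum(int(np.asarray(w).size) for w in weights.values())


def mean_runtime_ms(segment, images, warmup=2):
    """Average wall-clock time per image in milliseconds.

    A few warm-up calls are made first so that one-off start-up
    costs are not counted in the measurement.
    """
    for img in images[:warmup]:
        segment(img)
    start = time.perf_counter()
    for img in images:
        segment(img)
    return (time.perf_counter() - start) * 1000.0 / len(images)


# Hypothetical stand-ins: one 3x3 convolution layer and a thresholding "model".
weights = {"conv.weight": np.zeros((8, 1, 3, 3)), "conv.bias": np.zeros(8)}
images = [np.random.rand(64, 64) for _ in range(10)]
print(count_parameters(weights))
print(mean_runtime_ms(lambda img: img > 0.5, images))
```

On a GPU, kernels are launched asynchronously, so the device has to be synchronized before each clock reading for the timing to be meaningful; that step is omitted from this sketch.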

Ablation Study
In order to analyze the role of the different parts of the proposed framework, Table 5 presents the average quantitative results of the proposed method with and without the GAN and with different objective functions. The advantages of our method are as follows. Firstly, for the single model without the GAN, NestedUNet with Lovász-Softmax achieved better performance than with the other losses on all the testing sets. Secondly, the proposed AMFL adopted the GAN mechanism to discriminate features, resulting in better scores than the individual NestedUNet model without the GAN. Thirdly, our method with Lovász-Softmax loss achieved the best performance in most of the metrics, demonstrating the effectiveness of Lovász-Softmax in improving the discrimination ability. Fourthly, the framework with NestedUNet as the generator performed better than the other configurations, indicating that modeling with multiscale features is effective for overlapping chromosome segmentation. Lastly, with or without a GAN, the model with NestedUNet as the backbone network performed better than its competitors. Moreover, we also split the dataset into training, validation, and testing sets, and we achieved results that were in close agreement with those

Conclusions
In this paper, we propose and demonstrate the AMFL framework for overlapping chromosome segmentation. Instead of using single-scale features to represent chromosome images, we designed a nested U-shaped network with dense skip connections as the generator to capture multiscale features and obtain a better representation of the chromosome images. We then utilized cGAN to provide prior information for better discriminating features and producing highly accurate chromosome segmentation images. We also used the least-square GAN objective in place of the original GAN loss to stabilize the training and avoid mode collapse. As for the objective function, we replaced the common cross-entropy loss with the Lovász-Softmax loss, chosen after experimentally comparing it with other losses on overlapping chromosome segmentation, to improve the model's optimization and accelerate its convergence. Finally, to show the superiority of our AMFL, we compared it with 10 state-of-the-art semantic segmentation methods. The results show that our AMFL performed better in both visual perception and eight quantitative metrics.
Currently, our AMFL performs well on public overlapping chromosome datasets. However, due to the difficulty of obtaining clinical data, our trained model may fail on clinical overlapping chromosome images with severe morphological inconsistency. In future research, we intend to collect annotated clinical data and design a generalized, fully automatic system for the segmentation, classification, and karyotype analysis of chromosomes.

Data Availability Statement:
The datasets generated and/or analyzed during the current study are available at https://www.kaggle.com/jeanpat/overlapping-chromosomes (accessed on 29 March 2022).

Conflicts of Interest:
The authors declare that they have no conflict of interest.