HUnet++: An Efficient Method for Vein Mask Extraction Based on Hierarchical Feature Fusion
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Provide references in paragraphs 1 and 2 of the introduction.
Paragraph 3 of the introduction, line 33 of the manuscript: reframe the sentence.
Figure 1: provide a reference for the source of the figure, or mention in the text if it was simulated/generated by your own replication.
Lines 47-51: how can the authors claim that the results demonstrate dependence on image quality? This needs to be explained, in particular whether different types of images were used, as it is not self-explanatory: the figure shows various methods rather than various qualities of images. If there is literature supporting this claim, it should be added with proper citation.
Line 100: the link needs to be provided in the references or additional information rather than in the main text.
Author Response
Comments 1 : [Provide references in paragraphs 1 and 2 of the introduction.]
Response 1 : [Thank you for pointing this out. We have made the modifications in our paper as per your advice. We have added references in the first and second paragraphs of the introduction to support the claims and provide a more robust foundation for the discussion.]
Comments 2 : [Paragraph 3 of the introduction, line 33 of the manuscript: reframe the sentence.]
Response 2 : [Thank you for pointing this out. We have revised the sentence in paragraph 3, line 33, of the introduction as per your recommendation. The sentence has now been rephrased to improve clarity and flow.]
Comments 3 : [Figure 1: provide a reference for the source of the figure, or mention in the text if it was simulated/generated by your own replication.]
Response 3 : [Thank you for pointing this out. Figure 1 presents the feature extraction results, which we generated using MATLAB by implementing the algorithms discussed in the paper.]
Comments 4 : [Lines 47-51: how can the authors claim that the results demonstrate dependence on image quality? This needs to be explained, in particular whether different types of images were used, as it is not self-explanatory: the figure shows various methods rather than various qualities of images. If there is literature supporting this claim, it should be added with proper citation.]
Response 4 : [Thank you for pointing this out.
The images used in our analysis come from two publicly available datasets with images of different resolutions: 320x240, 300x100, and 640x480. These variations represent different levels of image quality, with lower-resolution images typically suffering from increased noise and less distinct vein patterns. This degradation in image quality makes it challenging for traditional vein extraction methods to distinguish vein structures from noise, as demonstrated in the second row of Figure 1, where the extracted vein patterns become less correlated with the original vein structures as the image resolution decreases.
Our claim that vein extraction methods are dependent on image quality is supported by existing literature that discusses the impact of image contrast and intensity on biometric recognition performance. We added these references in the revised manuscript to substantiate our claim further.
Thank you for the insightful suggestion, which we believe will help clarify our findings.]
Comments 5 : [Line 100: the link needs to be provided in the references or additional information rather than in the main text.]
Response 5 : [Thank you for pointing this out. We have made the required modification in the paper and moved the relevant link from the main text to the references or additional information section as per your request.]
Reviewer 2 Report
Comments and Suggestions for Authors
The paper presents HUnet++, a novel adaptation of the UNet++ architecture optimized for vein mask extraction. The model is designed to improve prediction speed while maintaining accuracy, leveraging a hierarchical feature fusion approach. The authors validate the model through comparisons with state-of-the-art methods and structural reparameterization, highlighting its efficiency in vein mask extraction tasks. The paper provides a valuable contribution to the field of vein recognition and is well-structured. Addressing the highlighted weaknesses will make it a strong addition to the literature on efficient deep-learning models for biometric recognition.
1. While relevant, the datasets used for validation primarily focus on vein images. Broader testing on other biometric modalities (e.g., facial recognition or retinal scans) could enhance generalizability claims.
2. The paper could benefit from a deeper theoretical analysis of how hierarchical feature fusion improves accuracy and speed compared to traditional feature fusion techniques.
3. Although structural reparameterization is a key aspect of the proposed model, its effect on various components, such as the number of hidden layers or feature dimensions, has been explored only to a limited extent.
4. While feature extraction diagrams and result comparisons are provided, additional visualizations (e.g., attention maps) could help elucidate why HUnet++ performs better on specific datasets.
5. Some implementation details, such as the preprocessing pipeline and hyperparameter settings for other baseline models, are underexplored.
6. Add more relevant references.
Author Response
Comments 1 :[While relevant, the datasets used for validation primarily focus on vein images. Broader testing on other biometric modalities (e.g., facial recognition or retinal scans) could enhance generalizability claims.]
Response 1 :[Thank you for your valuable suggestion. We agree with your point that broader testing on other biometric modalities, such as facial recognition or retinal scans, would enhance the generalizability of the model. However, the primary focus of our current research is to address the challenge of finger vein mask extraction, which is why experiments involving other datasets were not included in this paper. We plan to conduct these additional experiments in future work to further assess the model’s generalizability.]
Comments 2 :[The paper could benefit from a deeper theoretical analysis of how hierarchical feature fusion improves accuracy and speed compared to traditional feature fusion techniques.]
Response 2 :[Thank you for your suggestion. The concept of hierarchical feature fusion in our model originates from the U-Net architecture, which is also inherited by the U-Net++ model. Our approach is based on an efficient pruning and compression of the U-Net++ model, where we preserve multi-level features while reducing the fusion module to a single decoder block. Additionally, we apply structural re-parameterization to compress the trained model further, which helps to reduce its size and accelerate inference speed.
To support the advantages of hierarchical feature fusion, we referenced the work by Suri et al. ('U-Net Deep Learning Architecture for Segmentation of Vascular and Non-Vascular Images: A Microscopic Look at U-Net Components Buffered With Pruning'), which demonstrates the utility of multi-level feature fusion in extracting intricate features such as vascular structures. We have elaborated on these points in the paper to provide a deeper theoretical analysis, as per your suggestion.]
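To make the fusion idea concrete, the sketch below shows, in PyTorch, one plausible form of the single-decoder-block fusion described above: multi-level encoder outputs are reduced by 1x1 convolutions, upsampled to a common resolution, concatenated, and decoded in one block. This is a minimal illustration under assumed names, channel counts, and layer choices, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionDecoderBlock(nn.Module):
    """Fuses multi-level encoder features in a single decoder block.

    Each encoder feature map is reduced to `mid_ch` channels by a 1x1
    convolution, upsampled to the finest resolution, concatenated, and
    decoded by one small convolutional block. All sizes are illustrative.
    """
    def __init__(self, enc_channels=(32, 64, 128, 256), mid_ch=32, out_ch=1):
        super().__init__()
        self.reducers = nn.ModuleList(
            [nn.Conv2d(c, mid_ch, kernel_size=1) for c in enc_channels])
        self.decode = nn.Sequential(
            nn.Conv2d(mid_ch * len(enc_channels), mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1),  # per-pixel mask logits
        )

    def forward(self, feats):
        # feats: encoder outputs ordered finest-resolution first
        size = feats[0].shape[-2:]
        fused = torch.cat(
            [F.interpolate(r(f), size=size, mode="bilinear", align_corners=False)
             for r, f in zip(self.reducers, feats)], dim=1)
        return self.decode(fused)

# Illustrative usage with dummy encoder outputs for a 240x240 input.
feats = [torch.randn(1, c, 240 // 2**i, 240 // 2**i)
         for i, c in enumerate((32, 64, 128, 256))]
mask_logits = FusionDecoderBlock()(feats)  # shape: (1, 1, 240, 240)
```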
Comments 3 :[Although structural reparameterization is a key aspect of the proposed model, its effect on various components, such as the number of hidden layers or feature dimensions, has been explored only to a limited extent.]
Response 3 :[Thank you for your valuable feedback. In our experiments, we explored different configurations for the number of hidden layers, starting from 18 layers. However, when the number of hidden layers was reduced below 32 (even to 31 layers), the HUnet++ model no longer outperformed the original model; in fact, performance degraded as the number of hidden layers decreased. Based on your suggestion, we have added a more detailed explanation of this in the paper.
Additionally, since the effective size of the finger vein region in the images typically falls within the range of 200 to 360 pixels, we chose this range for our experiments to ensure the inclusion of sufficient detail for vein extraction.]
Comments 4 :[While feature extraction diagrams and result comparisons are provided, additional visualizations (e.g., attention maps) could help elucidate why HUnet++ performs better on specific datasets.]
Response 4 :[Thank you for your valuable suggestion. Based on your feedback, we have incorporated additional visualizations, including attention maps, into the paper. These visualizations help to better explain why the HUnet++ model performs better on specific datasets. We believe these additional visualizations will enhance the understanding of the model's performance and have included them in the revised version of the paper.]
Comments 5 :[Some implementation details, such as the preprocessing pipeline and hyperparameter settings for other baseline models, are underexplored.]
Response 5 :[Thank you for your valuable suggestion. Based on your feedback, we have now included the preprocessing pipeline and hyperparameter settings for the baseline models in the paper. These details are crucial for ensuring transparency and reproducibility, and we have added them in the relevant sections to provide a more comprehensive understanding of the experimental setup.]
Comments 6 :[Add more relevant references.]
Response 6 :[Thank you for your valuable suggestion. Based on your feedback, we have reviewed the paper and added more relevant references to support the discussion and analysis presented. The newly added references can be found in the updated reference section of the manuscript.]
Reviewer 3 Report
Comments and Suggestions for Authors
In biometric applications with finger vein recognition, accuracy and efficiency are crucial. This paper proposes a hierarchical feature fusion approach for vein mask extraction that increases performance.
The novelty of this study lies in the consideration of hierarchical aspects in U-Net based models.
The following improvements are suggested.
1. Abstract: state the significance of the proposed study (highest performance values obtained)
2. In the introduction section, briefly describe the limitations and challenges in the latest related studies, and thereby direct the path towards the contribution of this study.
3. The facts indicated in the introduction should be supported by references.
4. In the introduction, state the research questions (RQs) addressed by this study, and in the discussion section, justify the achievement of the RQs through the followed methodology and the obtained results.
5. Include a new Section 2 for related studies, and move the details on related studies stated in the introduction to this new section. Since this paper is going to be published in 2024, it would be better to consider DL-based studies in recent years with the latest technologies. Discuss the techniques used in the related studies together with the limitations in the existing studies, and justify the proposed method.
6. In the related-study description, improve the writing comprehension by maintaining a proper flow of information among the paragraphs.
7. Section 2.2 - Clearly state the used datasets and the number of images for different labels; a table would be better. Discuss data balance/imbalance issues.
8. In the methodology, justify the reason for selecting U-Net++ among other U-Net types (attention/feedback).
9. Have you addressed the associated complexity of the UNet++ architecture? Any solutions to minimize the computational cost and memory usage? Otherwise, it will be an issue when deploying this solution in real-time applications or on resource-constrained devices.
10. How do you address the loss of spatial information due to downsampling operations in your architecture? How do you show that this does not have a major impact on the precision of segmentation boundaries?
11. Have you tried any model optimizations?
12. It would be better to include learning curve graphs, such as training and validation accuracy and loss curves. With those, discuss data overfitting or underfitting issues.
13. What are the associated model complexities, model size, and required hardware resources to run the model?
14. In the discussion section, justify the achievement of the stated contributions or research questions mentioned in the introduction, referring to the followed methodology and the obtained results.
15. In the discussion, include a comparison table that distinguishes the features and results of this study from the latest existing studies. With that, you can justify the novel contributions.
16. In the discussion, state the research limitations and the future possible extensions.
17. Discuss the possible practical applications of this model.
Author Response
Comment 1 : [ Abstract: state the significance of the proposed study (highest performance values obtained)]
Response 1 : [Thank you for pointing this out. We agree with this comment and have revised the abstract as per your suggestion to clearly state the significance of the proposed study and highlight the highest performance values obtained in the research. ]
Comment 2 : [In the introduction section, briefly describe the limitations and challenges in the latest related studies, and thereby direct the path towards the contribution of this study.]
Response 2 : [Thank you for pointing this out. We agree with this comment and have added a brief description of the limitations and challenges in the latest related studies in the introduction section. We have also clearly directed the path towards the contribution of our study, highlighting how our approach addresses these challenges. ]
Comment 3 : [The facts indicated in the introduction should be supported by references.]
Response 3 : [Thank you for pointing this out. We agree with this comment. The missing references have now been added at the appropriate positions in the introduction section to support the facts indicated. ]
Comment 4 : [In the introduction, state the research questions (RQs) addressed by this study, and in the discussion section, justify the achievement of the RQs through the followed methodology and the obtained results.]
Response 4 : [Thank you for pointing this out. Although our paper already stated the research questions (RQs) addressed by this study and discussed how the methodology and obtained results support these RQs, we had not explicitly mentioned the exact precision of the final model. Instead, we had stated that the model's precision was comparable to the original model's while being faster. Following your advice, we have now updated the paper to include the specific accuracy values of the final model.]
Comment 5 : [Include a new Section 2 for related studies, and move the details on related studies stated in the introduction to this new section. Since this paper is going to be published in 2024, it would be better to consider DL-based studies in recent years with the latest technologies. Discuss the techniques used in the related studies together with the limitations in the existing studies, and justify the proposed method.]
Response 5 : [
Thank you for your suggestion regarding the 'Related Work' section. We fully understand and appreciate your advice. Our original structure presented the research background and motivation directly to the readers in the introduction, while providing a foundation for the contributions of our study.
To better address your feedback and enhance the paper's clarity and structure, we have restructured the manuscript, moving the recent research progress into a separate 'Related Work' section. This change allows us to discuss the techniques used in related studies, their limitations, and to justify our proposed method more effectively. We believe these adjustments will significantly improve the paper's readability and clearly highlight the current state of research in the field.]
Comment 6 : [In the related-study description, improve the writing comprehension by maintaining a proper flow of information among the paragraphs.]
Response 6 : [Thank you for pointing this out. In response, we have improved the flow of information between paragraphs in the 'Related Work' section to enhance the overall comprehension and coherence of the writing.]
Comment 7 : [Section 2.2 - Clearly state the used datasets and the number of images for different labels; a table would be better. Discuss data balance/imbalance issues.]
Response 7 : [Thank you for your suggestion. In Section 2.2, we have included a table (Table 1) that clearly states the used datasets and the number of images for each label. Since the primary focus of our study is on extracting finger vein masks (image segmentation), the dataset exhibits an inherent imbalance, with the number of background pixels far exceeding the number of finger vein pixels. To address this, we have also included a discussion on the imbalance between positive and negative samples in the dataset.]
Comment 8 : [In the methodology, justify the reason for selecting U-Net++ among other U-Net types (attention/feedback).]
Response 8 : [Thank you for pointing this out. We agree with this comment and have made the required changes in the methodology section, where we now justify the reason for selecting U-Net++ among other U-Net variations, such as attention and feedback U-Net types.]
Comment 9 : [Have you addressed the associated complexity of the UNet++ architecture? Any solutions to minimize the computational cost and memory usage? Otherwise, it will be an issue when deploying this solution in real-time applications or on resource-constrained devices.]
Response 9 : [Thank you for your question. Indeed, the computational complexity and memory usage of the U-Net++ architecture pose a significant challenge, particularly when deploying the model on resource-constrained devices such as mobile or embedded systems. U-Net++ enhances the model's expressive power through deeper network layers and dense skip connections, which, while improving performance, also increases computational and memory overhead.
To address this, we have implemented pruning (removing five decoder blocks while applying convolutions to the encoder outputs to reduce the number of channels) and structural re-parameterization. These techniques help reduce the model's complexity and parameter count without sacrificing accuracy. Pruning reduces the GPU memory footprint during inference, especially since U-Net++ stores many intermediate feature maps in the decoder, while structural re-parameterization merges convolution and batch normalization into a single convolution operation, thus reducing the overall parameter count. These improvements help decrease training and feature extraction times.
For real-time applications or deployment on resource-limited devices, we plan to use quantization techniques to further optimize the model. By converting the model’s weights and activation functions from floating-point to lower-bit-width data types (such as 8-bit integers), we aim to reduce memory usage and speed up inference. We have discussed this as part of the future work in the discussion section of the paper.]
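To illustrate the re-parameterization step described above, the following is a minimal PyTorch sketch of the standard conv-BN folding identity, which merges a convolution and its trailing batch normalization into a single convolution at inference time. It is a generic sketch of the technique, not the paper's exact code.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a trained BatchNorm2d into the preceding Conv2d.

    At inference, bn(conv(x)) equals a single convolution whose weights
    are rescaled per output channel, removing the BN layer and its
    extra parameters and memory traffic.
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      kernel_size=conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, dilation=conv.dilation,
                      groups=conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sqrt(var + eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = bn.bias.data + (bias - bn.running_mean) * scale
    return fused

# Sanity check: the fused conv matches conv + BN in eval mode.
conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16).eval()
x = torch.randn(1, 3, 240, 240)
with torch.no_grad():
    assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
```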
Comment 10 : [How do you address the loss of spatial information due to downsampling operations in your architecture? How do you show that this does not have a major impact on the precision of segmentation boundaries?]
Response 10 : [Thank you for your question. To address the potential loss of spatial information caused by downsampling operations in our architecture, we leverage skip connections that directly pass high-resolution feature maps from the encoder to the corresponding decoder layers. This allows the decoder to utilize high-resolution features to recover spatial details, effectively compensating for the information loss during downsampling. This mechanism, inherited from the U-Net model, is also present in U-Net++ and is crucial in preserving spatial accuracy.
We have retained this operation in our model, and in the decoder, we fuse feature maps from multiple levels to maintain rich spatial details. To demonstrate the effectiveness of this approach, we compared our model with the original U-Net++ model and other models. The results show that our model achieves nearly identical accuracy in segmentation tasks. Additionally, we included visual results in the paper, showing the final extracted finger vein masks, where the masks generated by our model are almost indistinguishable from those obtained by the original model.]
Comment 11 : [Have you tried any model optimizations?]
Response 11 : [Thank you for your question. We have explored several optimization approaches for the model. In addition to the discussion of the hidden layers in the paper, we also experimented with changes to the loss function. However, through multiple experiments, we found that using the same loss function as the original UNet++ model yields the best results. When training on manually labeled datasets, we used the same foreground and background weight settings as in the UNet++ model.
For the traditionally labeled datasets, however, the foreground and background weight settings in the loss function differed slightly from those in the UNet++ model. This is because traditional methods for extracting finger vein features often produce both explicit features (such as geometric and structural patterns of the vein region) and implicit features (potential image features that do not accurately represent the vein pattern, leading to vein masks that do not reveal the actual location of the embedded veins). As a result, the generated finger vein masks may lack continuous or aggregated vein structures, and may even lack the geometric characteristics of veins, which can cause the model to fail to converge.
To address this, after failing to improve results by changing the loss function, we reverted to the original loss function of UNet++ but adjusted the weights: we assigned a higher weight to the foreground and a lower weight to the background, so that the model focuses more on segmenting vein pixels during backpropagation. The specific weight values are not mentioned in the paper because our experiments revealed that the optimal values vary across the three traditionally labeled datasets; the optimal weights for each dataset therefore need to be determined through rigorous experimentation.]
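As a purely illustrative sketch of the weighting scheme described above, the following PyTorch snippet applies per-pixel foreground/background weights in a binary cross-entropy loss. The weight values here are hypothetical placeholders, since, as noted, the optimal values are dataset-specific.

```python
import torch
import torch.nn.functional as F

# Hypothetical weights: the optimal values are dataset-specific and
# were determined experimentally for each traditionally labeled dataset.
FG_WEIGHT, BG_WEIGHT = 4.0, 1.0

def weighted_vein_bce(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pixel-wise BCE where sparse vein (foreground) pixels get a larger
    weight, so backpropagation focuses on segmenting the vein class."""
    weights = torch.where(target > 0.5,
                          torch.full_like(target, FG_WEIGHT),
                          torch.full_like(target, BG_WEIGHT))
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)

# Illustrative call with dummy mask logits and a binary ground-truth mask.
logits = torch.randn(1, 1, 240, 240)
target = (torch.rand(1, 1, 240, 240) > 0.9).float()  # ~10% vein pixels
loss = weighted_vein_bce(logits, target)
```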
Comment 12 : [It would be better to include learning curve graphs, such as training and validation accuracy and loss curves. With those, discuss data overfitting or underfitting issues.]
Response 12 : [Thank you for pointing this out. We agree with this comment and have included the learning curve graphs to provide more insight into the model's performance. Additionally, we have discussed potential data overfitting and underfitting issues. These modifications have been marked in red in the revised manuscript.]
Comment 13 : [What are the associated model complexities, model size, and required hardware resources to run the model?]
Response 13 : [Thank you for pointing this out. We agree with this comment. To train our proposed model on your own data, you only need a GPU with 4GB of VRAM, assuming the image size is kept consistent with our study (240x240 pixels). The model's maximum GPU memory usage during inference is 86.76 MB. The hardware environment used for our experiments is mentioned in the 'Model Parameter Configuration' section of the paper: the Colab platform with a Tesla K80 GPU, 16GB of RAM, Ubuntu 22.04.2 LTS, and 12.7GB of system RAM. Additional information regarding the hardware resources required for training will be updated in the corresponding section of the paper.]
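For readers who wish to reproduce such a peak-inference-memory figure, the following is a minimal PyTorch sketch of one way it can be measured, assuming a CUDA device is available; the function name and the single-channel 240x240 input are placeholders matching the image size mentioned above.

```python
import torch

def peak_inference_memory_mb(model: torch.nn.Module, device: str = "cuda") -> float:
    """Return the peak GPU memory (in MB) allocated during one forward pass."""
    model = model.to(device).eval()
    # Placeholder input: one 240x240 single-channel image, per the study's setup.
    x = torch.randn(1, 1, 240, 240, device=device)
    torch.cuda.reset_peak_memory_stats(device)
    with torch.no_grad():
        model(x)
    return torch.cuda.max_memory_allocated(device) / 1024**2
```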
Comment 14 : [In the discussion section, justify the achievement of the stated contributions or research questions mentioned in the introduction, referring to the followed methodology and the obtained results.]
Response 14 : [Thank you for pointing this out. We agree with this comment and have added the research questions mentioned in the introduction to the discussion section. This revision strengthens the overall structure of the paper and ensures that the methodology and results directly address the research questions.]
Comment 15 : [In the discussion, include a comparison table that distinguishes the features and results of this study from the latest existing studies. With that, you can justify the novel contributions.]
Response 15 : [Thank you for pointing this out. We agree with this comment and have added a comparison table in the discussion section, where we distinguish the features and results of our study from those of the existing latest studies. This addition helps to clearly highlight the novel contributions of our work.]
Comment 16 : [ In the discussion, state the research limitations and the future possible extensions.]
Response 16 : [Thank you for pointing this out. We agree with this comment and have revised the discussion section to clearly state the research limitations and potential future extensions. These updates help to provide a comprehensive view of the current work and the direction for further improvements. ]
Comment 17 : [Discuss the possible practical applications of this model.]
Response 17 : [Thank you for pointing this out. We agree with this comment and have revised the paper to include a discussion on the possible practical applications of our model. This revision helps to illustrate the potential real-world uses of the proposed method and its broader impact in relevant fields. ]
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The present form can be accepted.