Anonymous Networking Detection in Cryptocurrency Using Network Fingerprinting and Machine Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper presents a well-motivated and timely contribution to the field of network security and privacy within permissionless cryptocurrency systems.
(1) While the paper differentiates between IP spoofing and profile spoofing, a concise definition in the introduction or a dedicated subsection would help reinforce understanding for readers unfamiliar with the nuance.
(2) While the accuracy and F1-score are informative, presenting ROC-AUC values or precision-recall curves (especially in imbalanced cases) would further support the robustness of the classification results.
(3) A short paragraph on the potential ethical considerations (e.g., deanonymization, surveillance risks) and how such detection tools could be responsibly used in cryptocurrency systems could enrich the paper’s broader impact.
(4) There are some minor typographical and formatting issues (e.g., inconsistent spacing between sections, occasional line breaks in the middle of sentences). A final proofreading pass is recommended.
(5) The paper occasionally repeats points made earlier (e.g., emphasis on spoofing threats). Consider tightening the discussion for better readability.
Overall, this manuscript makes a solid and well-executed contribution to both the cryptocurrency and network security communities. With minor revisions to clarity and structure, and consideration of the suggestions above, the paper will be a valuable addition to the literature.
Comments on the Quality of English Language
The English could be improved to more clearly express the research.
Author Response
Comment 1: While the paper differentiates between IP spoofing and profile spoofing, a concise definition in the introduction or a dedicated subsection would help reinforce understanding for readers unfamiliar with the nuance.
Response:
Thank you for pointing this out. We appreciate the suggestion. We have added a clear distinction between IP spoofing and profile spoofing in the third paragraph of the Motivation Section (Section 4) to improve reader comprehension. The paragraph now reads:
“IP spoofing involves manipulating the IP address in packet headers to disguise the sender's true identity or location, while profile spoofing involves creating a fake online profile or identity that mimics a real person's or entity's profile. Profile spoofing also uses IP spoofing to help hide who is really behind the fake profile.”
Comment 2: While the accuracy and F1-score are informative, presenting ROC-AUC values or precision-recall curves (especially in imbalanced cases) would further support the robustness of the classification results.
Response:
Thank you for the suggestion. We have added ROC-AUC for the three classifiers to provide a more nuanced evaluation. These results are now included in a new subsection 7.5 under Section 7 titled “ROC-AUC Analysis Across Classifiers,” with accompanying Figure 6. For example:
“Figure 6 presents the Receiver Operating Characteristic (ROC) curves for classifying the three routing types (IPv4, Tor, and I2P) using the CatBoost, Random Forest, and HistGradientBoosting classifiers. Interestingly, the ROC curves for all three classifiers are nearly identical, showing minimal visual or performance difference. Each model achieves a high true positive rate with a low false positive rate across all thresholds, indicating strong classification capability. The Area Under the Curve (AUC) values are consistently high across all routing types: 0.99 for I2P, and 0.98 for both IPv4 and Tor. This consistency suggests that the classification results are not only accurate but also robust across different machine learning models. It also highlights the effectiveness of the selected behavioral features, which enable reliable detection of spoofed anonymous routing behaviors regardless of the specific model used.”
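For readers who wish to reproduce this kind of evaluation, a minimal sketch of the one-vs-rest ROC-AUC computation is shown below. It uses scikit-learn on a synthetic stand-in dataset (the manuscript's features and data are not reproduced here), with Random Forest standing in for any of the three classifiers:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the behavioral-feature dataset; the three
# classes play the role of the IPv4, Tor, and I2P routing types.
X, y = make_classification(n_samples=3000, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)

# One-vs-rest macro-averaged AUC over the three routing types.
auc = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
print(f"macro OVR AUC: {auc:.3f}")
```

Plotting per-class ROC curves (as in Figure 6) would additionally use `sklearn.metrics.roc_curve` on each one-vs-rest binarized label column.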
Comment 3: A short paragraph on the potential ethical considerations (e.g., deanonymization, surveillance risks) and how such detection tools could be responsibly used in cryptocurrency systems could enrich the paper’s broader impact.
Response:
Our work promotes anonymity by detecting networking that fails to use (or intentionally bypasses) the anonymous routing of Tor and I2P. Deploying our work in cryptocurrency practice would advance anonymity and reduce surveillance risks. We describe these in Sections 3 and 4; Section 3 describes the importance of anonymity, and Section 4 motivates our work against profile spoofing that bypasses anonymous routing. The following statement has been added as the last two sentences of the third paragraph of the Introduction Section (Section 1).
“We detect and distinguish between the networking types of non-anonymous routing, Tor, or I2P using the networking behaviors. Our work therefore promotes anonymity, which is important in cryptocurrency as described in Section 3.1, by detecting profile spoofing and ensuring anonymous networking use.”
Comment 4: There are some minor typographical and formatting issues (e.g., inconsistent spacing between sections, occasional line breaks in the middle of sentences). A final proofreading pass is recommended.
Response:
Thank you. We have completed a careful proofreading of the entire manuscript. Inconsistencies in spacing between sections, improper line breaks, and formatting artefacts have been corrected throughout the manuscript to ensure professional and consistent presentation.
Comment 5: The paper occasionally repeats points made earlier (e.g., emphasis on spoofing threats). Consider tightening the discussion for better readability.
Response:
Profile spoofing motivates our work, and we use it to clarify our AI goal. Because our AI goal differs from those of previous research in the context of cryptocurrency networking (described in Section 2.1), we describe and discuss profile spoofing. We have re-read the manuscript to remove redundancy.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper shows novelty by introducing a new application of network fingerprinting combined with supervised machine learning models (CatBoost, Random Forest, and HistGradientBoostingClassifier) to address spoofing vulnerabilities inherent in cryptocurrency networks. The paper demonstrates the practicality of their methodology by deploying an active Bitcoin node connected to the Mainnet, thus validating their approach in realistic settings.
Here are my review comments and suggestions:
- On page 2, there is an overview describing each section, but Section 6 is missing there.
- The references are not in the correct order. For example, on page 4, the reference numbers jump from [17–19] to [35].
- A baseline is a critical concept in data analytics in engineering. It would be better to study a baseline classifier, such as a dummy classifier, to show the improvement of CatBoost, Random Forest, and HistGradientBoosting over the baseline.
- On page 7, it is mentioned that memory utilization plays the most significant role among the features, while in Figure 2 it looks like the least significant feature among the five. Could the authors explain this?
- What is the separation among the train, validation, and test sets? How many data points are in the dataset? Where does the dataset come from? How was it collected? A more detailed explanation of the dataset and of how the train, validation, and test sets were created would be better.
- The error bars are not clearly shown in Figure 5.
Author Response
Comment 1: On page 2, there is an overview describing each section, but Section 6 is missing there.
Response:
Thank you for catching this. We have corrected the numbering in the manuscript. The previous section numbering inadvertently skipped Section 6 in the overview. This has been fixed, and all sections are now sequentially numbered and referenced correctly in the table of contents and main text. The following sentence has been added:
“Section 6 presents the network fingerprinting using machine learning, along with a machine learning model description.”
Comment 2: The references are not in the correct order. For example, on page 4, the reference numbers jump from [17–19] to [35].
Response:
We appreciate the reviewer’s careful reading. Please note that the citations are in the correct order from the beginning of the Related Work section. The citation [17–19] on page 4 is a repetition of earlier references and not newly introduced. The content discussed here refers back to topics already covered in the Related Work section. We hope this clarification resolves any confusion.
Comment 3: A baseline is a critical concept in data analytics in engineering. It would be better to study a baseline classifier, such as a dummy classifier, to show the improvement of CatBoost, Random Forest, and HistGradientBoosting over the baseline.
Response:
Thank you for this insightful suggestion. We have added a statement with a baseline in the last line of the first paragraph of subsection 7.3 in Section 7. The following has been added:
“Without our scheme, the peer node is vulnerable to profile spoofing and would have 0% accuracy against the profile-spoofing threat.”
Comment 4: On page 7, it is mentioned that memory utilization plays the most significant role among the features, while in Figure 2 it looks like the least significant feature among the five. Could the authors explain this?
Response:
Thank you for this important clarification request. We revisited both the figure and discussion. The statement on page 7 has been revised for clarity:
“Although Memory Utilization appears as the least significant among the five top features, ablation testing reveals that removing it causes the largest drop in model accuracy: 81% for IPv4, 84% for Tor, and 89% for I2P. This indicates that while it ranks low in individual importance, it contributes unique information not captured by other features, making it crucial in combination with them.”
This explanation has been added as the last two sentences of Subsection 7.1 in the revised manuscript.
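The leave-one-feature-out ablation described above can be sketched as follows; the feature names and data here are illustrative stand-ins, not the manuscript's actual dataset or tuned models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative names only; stand-ins for the paper's top five features.
features = ["ping_rtt", "send_bandwidth", "recv_bandwidth",
            "cpu_utilization", "memory_utilization"]

X, y = make_classification(n_samples=2000, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

full = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
full_acc = accuracy_score(y_te, full.predict(X_te))

# Ablation: drop one feature at a time, retrain, and record the accuracy drop.
drops = {}
for i, name in enumerate(features):
    keep = [j for j in range(len(features)) if j != i]
    m = RandomForestClassifier(random_state=0).fit(X_tr[:, keep], y_tr)
    drops[name] = full_acc - accuracy_score(y_te, m.predict(X_te[:, keep]))

for name, d in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"{name}: accuracy drop {d:+.3f}")
```

A feature can rank low on impurity-based importance yet cause the largest ablation drop when the information it carries is not duplicated by any other feature, which is exactly the situation the revised text describes.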
Comment 5: What is the separation among the train, validation, and test sets? How many data points are in the dataset? Where does the dataset come from? How was it collected? A more detailed explanation of the dataset and of how the train, validation, and test sets were created would be better.
Response:
We agree that this section needed more detail. We have expanded the Data Collection part in the last paragraph of Section 5 with the following content:
“The dataset consists of 35,080 labeled peer connection samples, collected from our Bitcoin prototype running on the Mainnet. The node was configured to accept connections using the IPv4, Tor, and I2P protocols. We split the dataset into 70% training, 15% validation, and 15% testing. The validation set was used for hyperparameter tuning, while the test set was used for performance reporting.”
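A minimal sketch of the 70/15/15 split, assuming scikit-learn; the feature matrix and labels below are random stand-ins for the real dataset, with only the sample count taken from the revised text:

```python
import numpy as np
from sklearn.model_selection import train_test_split

n = 35080  # dataset size reported in the revised manuscript
X = np.random.rand(n, 5)              # stand-in behavioral features
y = np.random.randint(0, 3, size=n)   # stand-in routing-type labels

# First carve out 30%, then split that half-and-half into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # roughly 70% / 15% / 15%
```

Stratifying both splits keeps the IPv4/Tor/I2P class proportions consistent across the three subsets.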
Comment 6: The error bars are not clearly shown in Figure 5.
Response:
Thank you for pointing this out. The time durations for both training and testing are very short, which causes the values to appear quite small in the figure. To improve visibility, we multiplied the plotted testing times by a factor of 50.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper presents a strong and timely contribution to the security and anonymity of peer-to-peer cryptocurrency networks by detecting falsified claims of anonymous routing (Tor/I2P) using behavioural network fingerprinting and supervised machine learning.
The paper tackles a real and underexplored threat in cryptocurrency networking: profile spoofing of anonymous routing.
The methodology uses real-time data collected from an active Bitcoin node and applies three well-established supervised learning models (CatBoost, Random Forest, HistGradientBoosting).
The results show high classification accuracy (93–94%) for detecting spoofed Tor/I2P usage and reveal that CatBoost is significantly faster in real-time testing, making it more practical for deployment. As a plus, the paper clearly identifies and justifies important behavioral features such as ping RTT, bandwidths, and memory usage.
Some improvements can be made:
- The dataset is based on a single node with a maximum of 10 peers; this might limit generalization to the broader Bitcoin network, and you should include tests across multiple geographic nodes or diverse network conditions.
- Though supported in theory, CJDNS is omitted due to a lack of data. This limits the completeness of the anonymous routing analysis. At least simulate or discuss how CJDNS could be analysed if it becomes more widely adopted.
- The paper implies manual labeling of the routing types (Tor/I2P/IP), but details on ensuring the correctness of these labels are minimal. In my opinion, manual labeling is subjective and may lead to simple rules that produce high accuracy.
- Figures are small, especially Figure 2, which is very hard to read; you should make them bigger.
Author Response
Comment 1: The dataset is based on a single node with a maximum of 10 peers; this might limit generalization to the broader Bitcoin network, and you should include tests across multiple geographic nodes or diverse network conditions.
Response:
We appreciate the reviewer highlighting this important limitation. We have added a new Section 8, Discussion and Future Directions, and included a clarification in its second paragraph:
“We do not anticipate the insights to change, e.g., the comparison across the ML models. Our scheme builds on an unmodified Bitcoin implementation for its active functionalities; our additions are passive mechanisms for sensing, monitoring, and processing the data. However, the values, e.g., the performance numbers, may vary, and generalization validation would be helpful.”
Comment 2: Though supported in theory, CJDNS is omitted due to a lack of data. This limits the completeness of the anonymous routing analysis. At least simulate or discuss how CJDNS could be analysed if it becomes more widely adopted.
Response:
Thank you for pointing this out. We have added a dedicated discussion on CJDNS in Section 8 (Discussion and Future Directions) in the last paragraph:
“Additionally, although CJDNS is theoretically supported in Bitcoin Core, we observed no active peers using this protocol during our observation period. As a result, CJDNS traffic could not be included in our dataset. In the future, we plan to simulate and collect real CJDNS-based peer connections to further expand the scope of anonymous routing analysis.”
Comment 3: The paper implies manual labeling of the routing types (Tor/I2P/IP), but details on ensuring the correctness of these labels are minimal. In my opinion, manual labeling is subjective and may lead to simple rules that produce high accuracy.
Response:
This is a valuable observation. We have updated Section 5 (Data Collection) in the last two sentences of the second paragraph with more detailed information on how the labels were verified:
“We implement the different routing types (IPv4, Tor, and I2P) and inject that traffic ourselves; therefore, we have the ground truth for each sample. Since our Bitcoin node was manually configured to use a specific routing method for each connection, we know exactly which type of network (IPv4, Tor, or I2P) was used.”
Comment 4: Figures are small, especially Figure 2, which is very hard to read; you should make them bigger.
Response:
Thank you—we agree that readability is important. All figures have been resized to improve clarity, especially Figure 2 (Feature Importance). Figure 2 now uses larger font sizes for axis labels and tick marks.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
(1) The paper presents a meaningful advancement in the field by proposing a behavioral machine learning-based approach for detecting anonymous routing types (IP, Tor, I2P) in cryptocurrency networks. This is a timely and relevant contribution to both the cybersecurity and blockchain communities.
(2) The practical implementation on a live Bitcoin node with real-world Mainnet traffic adds empirical strength and realism to the study. Your ground-truth-aware dataset collection is particularly commendable.
(3) The usage of CatBoost, Random Forest, and HistGradientBoosting models is appropriate for this classification task. The performance comparison (in accuracy, F1, AUC, and timing) is clearly articulated.
(4) However, while you correctly omit features that are unreliable or redundant, the paper could benefit from a clearer explanation of how overfitting was avoided given the relatively small number of features and the moderate dataset size (n = 35,080).
(5) Since you emphasize the uniqueness of your dataset, consider sharing a sanitized or simulated version (as you state in the Data Availability section) along with your feature extraction pipeline. This would enhance reproducibility and increase the paper's impact.
Comments on the Quality of English Language
(1) The manuscript is generally well-organized. However, some sections (e.g., Section 7.2) include minor grammatical issues and inconsistent terminology (e.g., “CaBoost” appears instead of “CatBoost”).
(2) Some figures (e.g., Figures 2 and 3) would benefit from clearer axis labels and legends for better readability. Please ensure that all plots are legible in grayscale printing.
Author Response
Comments and Suggestions for Authors
Comment 1: The paper presents a meaningful advancement in the field by proposing a behavioral machine learning-based approach for detecting anonymous routing types (IP, Tor, I2P) in cryptocurrency networks. This is a timely and relevant contribution to both cybersecurity and blockchain communities.
Response: Thank you for recognizing the relevance and contribution of our work. We appreciate your positive assessment.
Comment 2: The practical implementation on a live Bitcoin node with real-world Mainnet traffic adds empirical strength and realism to the study. Your ground-truth-aware dataset collection is particularly commendable.
Response: We are grateful for your encouraging feedback regarding our empirical setup and data collection methodology.
Comment 3: The usage of CatBoost, Random Forest, and HistGradientBoosting models is appropriate for this classification task. The performance comparison (in accuracy, F1, AUC, and timing) is clearly articulated.
Response: We thank you for your support of our model selection and evaluation metrics.
Comment 4: However, while you correctly omit features that are unreliable or redundant, the paper could benefit from a clearer explanation of how overfitting was avoided given the relatively small number of features and the moderate dataset size (n = 35,080).
Response: We appreciate this important suggestion. In response, we have added Subsection 6.2 to explain our overfitting prevention strategies, including the following:
“To address concerns about overfitting, particularly given the moderate dataset size and limited number of features, we employed several techniques to ensure generalization and model stability. First, we applied stratified 5-fold cross-validation to maintain class distribution across folds and provide a more reliable evaluation of the model's performance. Additionally, for tree-based models such as CatBoost and HistGradientBoosting, we performed hyperparameter tuning using grid search, including adjustments to L2 regularization strength and learning rate to penalize overly complex models. Furthermore, we incorporated early stopping during training to monitor validation loss and halt training when performance no longer improved, effectively preventing overfitting to the training data. These measures collectively enhanced our model's ability to generalize to unseen data while preserving high accuracy.”
Comment 5: Since you emphasize the uniqueness of your dataset, consider sharing a sanitized or simulated version (as you state in the Data Availability section) along with your feature extraction pipeline. This would enhance reproducibility and increase the paper's impact.
Response:
Thank you for the excellent suggestion. We will consider publishing the dataset. If we publish it, we will sanitize the data and release it along with supplementary documents, e.g., a dataset description paper and a README, to facilitate its use. As suggested, we will also cite this research paper to demonstrate the utility of the dataset.
Comments on the Quality of English Language
Comment 1: The manuscript is generally well-organized. However, some sections (e.g., Section 7.2) include minor grammatical issues and inconsistent terminology (e.g., “CaBoost” appears instead of “CatBoost”).
Response: Thank you for pointing out these issues. We have performed a thorough language and proofreading pass on the manuscript. Specifically:
- All instances of “CaBoost” were corrected to “CatBoost”.
- Section 7.2 and other sections with minor errors were grammatically revised for clarity and fluency.
Comment 2: Some figures (e.g., Figure 2 and 3) would benefit from clearer axis labels and legends for better readability. Please ensure that all plots are legible in grayscale printing.
Response: We agree with your observation. Figures 2 and 3 have been updated as follows:
- Font sizes and line widths have been increased for axis labels and legends for better readability.
Author Response File: Author Response.pdf