LACX: Locality-Aware Shared Data Migration in NUMA + CXL Tiered Memory
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper proposes LACX, a novel technique for migrating shared data in NUMA+CXL tiered memory environments. This approach enhances overall memory bandwidth utilization and system performance in memory-intensive, multi-threaded workloads by overcoming limitations of traditional NUMA policies and optimizing memory management in heterogeneous architectures. Here are some points to improve the manuscript quality:
- To what extent can the results obtained using the specific hardware configurations (CXL emulated via virtual machines and selected benchmarks) be generalized to other architectures or real-world data center environments?
- What is the detailed impact of migration overhead in highly dynamic workloads or scenarios with rapidly changing data sharing patterns? How does LACX scale with an increasing number of nodes and threads?
- How does LACX integrate with other memory optimization techniques such as compression, caching, or hardware-managed tiered memory policies? Are there potential conflicts or synergies that should be explored?
- The paper mentions future work on dynamically adjusting LACX based on system conditions. What specific adaptive strategies could be implemented, and how might their effectiveness be evaluated?
- Given that LACX may degrade performance in low-sharing or single-workload scenarios, what fallback mechanisms or automatic detection methods could be incorporated to mitigate negative effects?
- I reckon that this approach would be valuable in the field of In-Memory Computing (IMC). It would therefore be appreciated if the authors could briefly comment on the possible use of LACX in IMC based on non-volatile memories. Some useful references to consider include:
[1] DOI: 10.1109/VLSI-TSA/VLSI-DAT57221.2023.10134272;
[2] DOI: 10.1109/TCSII.2023.3340112.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
Good job; your research is sound, and your proposal has been validated on high-performance equipment, using experimental results from both single-workload and multi-workload scenarios to analyze the impact of memory usage.
Here are my recommendations on some complementary aspects:
1. In Section 1, highlight the contributions of your research.
2. In Section 2, insert a figure explaining the NUMA and AutoNUMA architectures.
3. In Section 4, you must detail the experiment; some information is given, but you should indicate how you ran the evaluations.
4. Improve Table 2; it is a little confusing, with the same items in both rows and columns.
5. The Conclusions are consistent with the proposal.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper focuses on optimizing shared data migration in NUMA+CXL tiered memory architectures, proposing LACX to improve memory bandwidth utilization and system performance. The research direction is in line with the practical needs of high-performance computing and large-scale data processing scenarios. The experimental design is reasonably systematic, but there are still shortcomings:
- The shared data identification mechanism lacks technical detail: the kernel data structures and the specific identification algorithm used are not explained, which affects the reproducibility of the method.
- Insufficiently comprehensive experimental evaluation: benchmarking is limited to memory-intensive workloads and lacks validation across diverse application scenarios.
- Insufficient depth of performance analysis: the lack of micro-level performance metrics and a detailed cost breakdown makes it impossible to explain the fundamental reasons for the performance changes.
- More recent research should be considered in the related work, e.g., "Video saliency prediction via single feature enhancement and temporal recurrence".
- Lack of theoretical support: no performance prediction model or complexity analysis has been established, making it difficult to guide parameter tuning and to evaluate scalability.
- Improper selection of comparison methods: the proposal is compared only with AutoNUMA, without sufficient comparison against advanced tiered memory management techniques such as TPP.
- Expand the evaluation and add comparisons with other schemes.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I would like to congratulate the Authors for having carefully addressed all the raised points. However, I would suggest explicitly including in the manuscript the answer to question 6 of the previous report, along with the corresponding citations.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have carefully revised their manuscript according to my comments and suggestions. However, there are still the following issues:
1. The formatting of the article is confusing, which affects the reading experience. There are also some grammatical problems.
2. The authors do not clearly explain the motivation behind this manuscript. What the existing problems are and why they are crucial should be explained in more detail. I suggest the authors further strengthen the relevant parts of the Introduction.
3. More recent research should be considered in the related work, e.g., "Video saliency prediction via single feature enhancement and temporal recurrence".
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
