Uncertainty-Aware Active Meta-Learning for Few-Shot Text Classification
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors introduce a new Uncertainty-Based Sample Balanced (USB) loss function to prioritize more informative samples and ensure balanced learning. However, there is still room for improvement in terms of data and interpretation. The authors are encouraged to revise and improve the manuscript according to the comments below.
- What is your work's major contribution to the field?
- A more comprehensive comparison table should be provided with current methods.
- How about the computational cost of uncertainty estimation and active learning?
- Highlight the difficulties the authors faced while doing this work as well as its future prospects, in great detail.
- Would alternative uncertainty quantification methods, such as Bayesian Neural Networks or Deep Ensembles give similar improvements as Monte Carlo Dropout?
- What is the biggest challenge when using this novel Uncertainty-Aware Active Meta-Learning (UA-AML) methodology?
- Can authors provide a case study showing examples of selected high-uncertainty tasks and how they are different from random task sampling?
Author Response
We sincerely thank the reviewers for their thoughtful and constructive feedback. Your comments have helped us critically evaluate our work and identify areas for further improvement and exploration. Below, we provide detailed responses to each point raised.
Comment 1: What is your work's major contribution to the field?
Response 1: As mentioned in the paper (the end of introduction section), Uncertainty-Aware Active Meta-Learning (UA-AML) is a novel approach that enhances few-shot text classification by prioritizing high-uncertainty tasks and incorporating a new Uncertainty-Based Sample Balanced (USB) loss for better task generalization. The experimental results confirm improved performance in low-resource NLP scenarios, including out-of-scope intent detection. This shows how the proposed methodology can be used in low-resource scenarios.
Comment 2: A more comprehensive comparison table should be provided with current methods.
Response 2: We fully agree that a more comprehensive comparison table would strengthen the evaluation section and provide clearer insight into how UA-AML stands relative to current state-of-the-art methods. In the current version, we have compared our approach with several baselines, such as Matching Networks, Prototypical Networks, Relation Networks, SNAIL, and BERT-PAIR. Due to time constraints, we were unable to conduct a fully comprehensive comparison against other methods in this revision. In future work, we plan to include a unified comparison table that presents these details side by side, giving a clearer view of how UA-AML performs in terms of accuracy, generalization, and computational trade-offs.
Comment 3: How about the computational cost of uncertainty estimation and active learning?
Response 3: We acknowledge the importance of addressing the computational cost of uncertainty estimation and active learning. We have added a discussion on this aspect in the conclusion section.
Comment 4: Highlight the difficulties the authors faced while doing this work as well as its future prospects, in great detail.
Response 4: We faced a few issues during the development of this work. One of the primary difficulties was designing a system that could accurately estimate uncertainty in a computationally efficient way, especially given the constraints of few-shot learning in low-resource NLP tasks.
Comment 5: Would alternative uncertainty quantification methods, such as Bayesian Neural Networks or Deep Ensembles give similar improvements as Monte Carlo Dropout?
Response 5: Exploring alternative uncertainty quantification methods such as Deep Ensembles is one direction for future work. While Monte Carlo Dropout was chosen for its computational efficiency, Deep Ensembles could potentially offer more robust uncertainty estimation by capturing both epistemic and aleatoric uncertainty more effectively. We have noted these alternatives as future directions in the conclusions section.
Comment 6: What is the biggest challenge when using this novel Uncertainty-Aware Active Meta-Learning (UA-AML) methodology?
Response 6: The biggest challenge in using the Uncertainty-Aware Active Meta-Learning (UA-AML) methodology lies in balancing computational efficiency with improved task selection. While uncertainty quantification enhances meta-learning by prioritizing informative tasks, it also introduces additional computational overhead due to multiple forward passes required for uncertainty estimation (e.g., Monte Carlo Dropout). This can be particularly challenging in low-resource environments, where computational constraints limit the feasibility of running multiple inferences.
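To make this overhead concrete, the following minimal PyTorch sketch (a generic illustration rather than our exact implementation; the `model`, the number of passes `T`, and the entropy score are placeholder assumptions) shows why Monte Carlo Dropout multiplies inference cost by T:

```python
import torch
import torch.nn.functional as F

def mc_dropout_uncertainty(model, x, T=20):
    """Estimate predictive uncertainty with T stochastic forward passes.

    Keeping dropout active at inference time (model.train()) makes each
    pass a sample from the approximate posterior; the entropy of the
    averaged softmax serves as a simple per-example uncertainty score.
    """
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(T)])
    mean_probs = probs.mean(dim=0)  # (batch, classes)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy
```

Each additional pass costs one full forward computation, which is why T has to be chosen to balance estimate quality against latency, especially in low-resource environments.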
Comment 7: Can authors provide a case study showing examples of selected high-uncertainty tasks and how they are different from random task sampling?
Response 7: Providing a case study demonstrating how high-uncertainty task selection differs from random task sampling would indeed strengthen the paper. While our experimental results quantitatively show the benefits of uncertainty-aware active meta-learning (UA-AML), we acknowledge that a qualitative analysis with concrete examples would provide deeper insights. In future work, we plan to include a case study illustrating specific tasks selected based on high uncertainty, highlighting how they differ in structure and difficulty compared to randomly sampled tasks.
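For readers who want to see the mechanical difference between the two sampling strategies before such a case study appears, a minimal sketch is given below (hypothetical: the `support_x` attribute, the scoring of a task by mean support-set entropy, and `k` are our illustrative assumptions, not the paper's exact selection rule):

```python
import random
import torch
import torch.nn.functional as F

def task_uncertainty(model, x, T=20):
    """Mean predictive entropy over a task's support set via MC Dropout."""
    model.train()  # keep dropout stochastic
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(T)])
    p = probs.mean(dim=0)
    return (-(p * p.clamp_min(1e-12).log()).sum(dim=-1)).mean().item()

def select_tasks(candidate_tasks, model, k, strategy="uncertainty"):
    """Pick k meta-training tasks at random or by predictive uncertainty.

    Random sampling ignores the model's current state; the uncertainty
    strategy keeps the k tasks the model is currently least sure about.
    """
    if strategy == "random":
        return random.sample(candidate_tasks, k)
    scored = [(task_uncertainty(model, t.support_x), t) for t in candidate_tasks]
    scored.sort(key=lambda pair: -pair[0])  # most uncertain first
    return [t for _, t in scored[:k]]
```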
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
Here, we have an interesting contribution to low-resource natural language understanding (NLU) and few-shot text classification. Your proposed Uncertainty-Aware Active Meta-Learning (UA-AML) framework addresses a critical gap in meta-learning methodologies, particularly in the context of NLP tasks. Below, I provide a detailed assessment of your work, highlighting its strengths and offering constructive feedback to further enhance its quality and potential impact.
=========================
Strengths of Your Work:
Relevance and Novelty: Your research tackles a pressing challenge in NLP—low-resource NLU—and introduces an innovative meta-learning approach incorporating uncertainty quantification for task selection. This is a significant contribution, as it bridges the gap between meta-learning methodologies traditionally applied in computer vision and their adaptation to NLP tasks.
Proposed Methodology: The UA-AML framework is well-motivated and addresses key limitations in existing meta-learning approaches, such as random task selection and meta-overfitting. Introducing an uncertainty-based loss function (USB loss) and active task selection based on uncertainty quantification are particularly innovative and can potentially improve generalization in few-shot learning scenarios.
Comprehensive Evaluation: Your article demonstrates the effectiveness of UA-AML across multiple NLP tasks, including sentiment analysis, relation classification, and intent detection. This multi-task evaluation strengthens the claim that the proposed method is versatile and applicable to diverse low-resource NLU scenarios.
Objective Evaluation: Figure 4, FAR curves.
======================
Feedback and Recommendations:
While your article is compelling, there are several areas where additional clarification or improvement could enhance it:
Comparative Baselines: Your article would benefit from a more comprehensive comparison with state-of-the-art baselines in few-shot learning and low-resource NLU. For instance, how does UA-AML perform against other meta-learning frameworks like MAML or Prototypical Networks in NLP tasks? Including such comparisons would strengthen the empirical validation of your proposed method.
Expand the Evaluation: Include comparisons with state-of-the-art baselines and conduct ablation studies to isolate the impact of uncertainty quantification and task selection.
Meta-Overfitting Mitigation: While you identify meta-overfitting as a critical issue, the article does not analyze how UA-AML mitigates this problem. A more thorough discussion, supported by empirical results, would help readers understand the extent to which your method addresses this challenge.
Provide More Technical Details: Elaborate on the uncertainty quantification process, including thresholds, sensitivity analysis, and computational overhead.
Uncertainty Quantification: The methodology for calculating uncertainty (e.g., entropy or variance) is not sufficiently detailed. For example, how are the thresholds for uncertainty determined, and how sensitive is the model's performance to these thresholds? Providing more technical details and conducting ablation studies would enhance the reproducibility of your method.
Scalability and Computational Costs: While you emphasize the practicality of lightweight models for few-shot learning, the article does not discuss the computational overhead introduced by UA-AML. Given that the method involves repeated forward passes (T times) for uncertainty quantification, evaluating its scalability and efficiency is important, especially in resource-constrained environments.
Generalization to Other Domains: Your article focuses exclusively on NLP tasks. While this is a valuable contribution, it would be beneficial to discuss whether UA-AML can be extended to other domains. This would highlight the broader applicability of your framework.
Ethical Considerations: The article does not address potential ethical implications of deploying UA-AML in real-world applications. For instance, how does the method handle biased or sensitive data in low-resource settings? Including a discussion on ethical considerations would make your article more comprehensive.
Discuss Broader Implications: Please explore the applicability of UA-AML to other domains and address potential ethical concerns.
Clarity of Technical Details: Some sections of the article, particularly the methodology, are dense and could benefit from clearer explanations. For example, the workflow of UA-AML (Figure 2) is mentioned but not described in sufficient detail. Providing a step-by-step breakdown of the framework would improve readability and accessibility for a broader audience.
Improve Clarity: Go into greater detail in the methodology section and provide a more detailed explanation of the UA-AML workflow. For replication purposes, it is advisable to submit and publish, together with the current article, a software library implementing the proposed method on the two main datasets you used (ARSC and FewRel).
Regarding the replication of your work for comparison purposes: a section dedicated to this topic must be introduced in the article. Please recommend FAR, FRR, and ROC curves and the EER indicator as objective ways of comparing different approaches to each other and to yours. You may also comment on or explain the reasons why accuracy alone is not an objective performance criterion.
===========
Overall, your work presents a promising approach to low-resource NLU and few-shot text classification, with significant potential for impact in the field. Because of this, I recommend that it be Accepted with Minor revisions.
Thank you for your efforts, and please feel free to reach out to the Editors if you have any questions or need further clarification on my feedback.
All the best,
March 12, 2025.
Author Response
We sincerely thank the reviewers for their thoughtful and constructive feedback. Your comments have helped us critically evaluate our work and identify areas for further improvement and exploration. Below, we provide detailed responses to each point raised.
Comment 1: Comparative Baselines: Your article would benefit from a more comprehensive comparison with state-of-the-art baselines in few-shot learning and low-resource NLU. For instance, how does UA-AML perform against other meta-learning frameworks like MAML or Prototypical Networks in NLP tasks? Including such comparisons would strengthen the empirical validation of your proposed method.
Response 1: In this paper, we primarily focused on demonstrating the effectiveness of UA-AML within the uncertainty-aware active meta-learning paradigm, and as such, our experiments centered on comparing against representative models in few-shot classification and uncertainty estimation, such as Matching Networks, Relation Networks, and MC Dropout-based methods. We agree that including comparisons with widely known meta-learning frameworks like Model-Agnostic Meta-Learning (MAML) and Prototypical Networks in NLP settings would further strengthen the empirical validation of our method. Although these models were not included in our current experiments due to time and scope constraints, we plan to incorporate them in future work to offer a more comprehensive evaluation.
Comment 2: Expand the Evaluation: Include comparisons with state-of-the-art baselines and conduct ablation studies to isolate the impact of uncertainty quantification and task selection.
Response 2: As mentioned above, we agree that including additional state-of-the-art baseline comparisons and conducting ablation studies would provide deeper insights into the contributions of uncertainty quantification and task selection. Due to time and scope limitations, these were not fully explored in the current version. However, we plan to extend this work in future research by incorporating more comprehensive evaluations and detailed ablation studies to further validate the effectiveness of each component within the UA-AML framework.
Comment 3: Meta-Overfitting Mitigation: While you identify meta-overfitting as a critical issue, the article does not analyze how UA-AML mitigates this problem. A more thorough discussion, supported by empirical results, would help readers understand the extent to which your method addresses this challenge.
Response 3: We agree that meta-overfitting is a critical challenge in few-shot learning. We have addressed this issue empirically in the experimental section, specifically in the discussion of results in Section 5.2. The results demonstrate that appropriately balancing uncertain (hard) and confident (easy) samples helps prevent overfitting by encouraging the model to generalize across varying levels of task difficulty.
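To illustrate the balancing principle in code, consider the hypothetical weighting scheme below (this is not the USB loss as defined in the paper; the `uar` ratio and the normalization are our illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def uncertainty_balanced_loss(logits, targets, uncertainty, uar=0.8):
    """Illustrative uncertainty-balanced weighting (not the paper's exact USB loss).

    Per-sample cross-entropy is re-weighted so that a fraction `uar` of the
    total weight goes to high-uncertainty (hard) samples and the remainder
    to low-uncertainty (easy) ones, keeping both in the meta-update rather
    than letting hard samples dominate.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")     # (batch,)
    hard = uncertainty / uncertainty.sum().clamp_min(1e-12)     # normalize to weights
    easy = (1.0 - hard) / (1.0 - hard).sum().clamp_min(1e-12)
    weights = uar * hard + (1.0 - uar) * easy                   # convex mix, sums to 1
    return (weights * ce).sum()
```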
Comment 4: Provide More Technical Details: Elaborate on the uncertainty quantification process, including thresholds, sensitivity analysis, and computational overhead.
Response 4: In response to your suggestion, we have updated Figure 2 to better illustrate the uncertainty quantification process and its integration into the UA-AML framework. Additionally, we have expanded the Methodology section to include further technical details on how uncertainty is measured using Monte Carlo Dropout. We plan to expand on the remaining aspects, such as threshold sensitivity and computational overhead, in future work.
Comment 5: Uncertainty Quantification: The methodology for calculating uncertainty (e.g., entropy or variance) is not sufficiently detailed. For example, how are the thresholds for uncertainty determined, and how sensitive is the model's performance to these thresholds? Providing more technical details and conducting ablation studies would enhance the reproducibility of your method.
Response 5: In response, we have updated Figure 2 and added additional technical explanations in the Methodology section to clarify how uncertainty is quantified using Monte Carlo Dropout, including the use of entropy or variance depending on the loss type.
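As a concrete illustration of the two measures (a minimal sketch under the assumption that the T stochastic softmax outputs are stacked into one tensor; not our exact code):

```python
import torch

def entropy_and_variance(mc_probs):
    """Two common uncertainty measures from T Monte Carlo Dropout samples.

    mc_probs: tensor of shape (T, batch, classes) holding softmax outputs
    from T stochastic forward passes.
    """
    mean_p = mc_probs.mean(dim=0)
    # Predictive entropy: total uncertainty of the averaged distribution.
    entropy = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(dim=-1)
    # Variance across passes, averaged over classes: disagreement between samples.
    variance = mc_probs.var(dim=0).mean(dim=-1)
    return entropy, variance
```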
Comment 6: Scalability and Computational Costs: While you emphasize the practicality of lightweight models for few-shot learning, the article does not discuss the computational overhead introduced by UA-AML. Given that the method involves repeated forward passes (T times) for uncertainty quantification, evaluating its scalability and efficiency is important, especially in resource-constrained environments.
Response 6: We acknowledge the importance of discussing the computational overhead and scalability of UA-AML, particularly in resource-constrained environments. To clarify, when we referred to lightweight models, we did not mean models that are lightweight in architecture or performance, but rather those that require fewer data resources for effective training, making them suitable for low-resource NLP tasks. We have updated the introduction section and replaced the term "lightweight" accordingly. We have also added discussions and future directions relating to the computational overhead in the conclusion section.
Comment 7: Generalization to Other Domains: Your article focuses exclusively on NLP tasks. While this is a valuable contribution, it would be beneficial to discuss whether UA-AML can be extended to other domains. This would eventually highlight the broader applicability of your framework.
Response 7: We understand the importance of discussing the broader applicability of UA-AML beyond NLP tasks. In response, we have added additional content in the Conclusion section, where we discuss how UA-AML’s uncertainty-aware task selection strategy can be extended to other domains, such as computer vision, speech processing, and reinforcement learning. Since meta-learning is widely used in few-shot image classification, medical diagnosis, and robotics, the principles of uncertainty-based active task selection could enhance generalization in these fields as well.
Comment 8: Ethical Considerations: The article does not address potential ethical implications of deploying UA-AML in real-world applications. For instance, how does the method handle biased or sensitive data in low-resource settings? Including a discussion on ethical considerations would make your article more comprehensive.
Response 8: We recognize the need to address ethical considerations when deploying UA-AML in real-world applications, especially in low-resource settings where data biases and fairness challenges may arise. In response, we have added a discussion in the Conclusion section outlining potential ethical challenges and mitigation strategies. Specifically, we highlight how uncertainty-based task selection could inadvertently prioritize biased or underrepresented data distributions, potentially leading to skewed model outcomes.
Comment 9: Discuss Broader Implications: Please explore the applicability of UA-AML to other domains and address potential ethical concerns.
Response 9: We have expanded the Conclusion section to discuss the broader applicability of UA-AML beyond NLP, calling attention to its potential use in computer vision, speech recognition, and other areas where uncertainty-aware meta-learning can improve sample efficiency and generalization. Additionally, we have added more content addressing potential risks, such as bias in uncertainty-based task selection and the handling of sensitive data in high-stakes applications.
Comment 10: Clarity of Technical Details: Some sections of the article, particularly the methodology, are dense and could benefit from clearer explanations. For example, the workflow of UA-AML (Figure 2) is mentioned but not described in sufficient detail. Providing a step-by-step breakdown of the framework would improve readability and accessibility for a broader audience.
Response 10: We understand that parts of the methodology were presented densely. In response, we have expanded our description of the UA-AML workflow and provided a step-by-step breakdown of Figure 2 to enhance readability and accessibility for a broader audience. The revised section now includes a detailed explanation of each component, from uncertainty quantification using Monte Carlo Dropout (MC-Dropout) to the application of the Uncertainty-Based Sample Balanced (USB) loss in meta-learning. These additions ensure that the workflow and contributions of UA-AML are clearly articulated, making it easier for readers to understand the methodology and its significance.
Comment 11: Improve Clarity: GO into the deepest details regarding the methodology section and provide a more detailed explanation of the UA-AML workflow. For replication purposes, submitting and publishing a software library implementing the proposed method on the two main datasets you used (ARSC and FewRel) together with the current article is advisable.
Response 11: We have added further explanation to the methodology section to clarify the workings of the proposed model. We have also updated the figure to include considerably more detail for the reader's understanding.
Comment 12: Regarding the replication of your work for comparison purposes: A section dedicated to this topic must be introduced in the article. Please recommend FAR and FRR and ROC curves and EER indicators as objective ways of comparing different approaches to each other and to yours. You may also comment/explain here the reasons why Accuracy is not an objective performance criterion.
Response 12: Thank you for the insightful suggestion. We fully agree that incorporating objective performance metrics such as FAR, FRR, ROC curves, and EER would enhance the rigor and replicability of our evaluation, particularly in tasks like out-of-scope intent detection where accuracy alone may not provide a complete picture. However, due to time and scope constraints, we were not able to include these additional analyses in the current version of the paper. That said, we recognize their importance and plan to address them in future research, where we aim to conduct a more thorough evaluation using these metrics to further validate and compare the effectiveness of UA-AML.
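For reference, these indicators are straightforward to derive from detection scores; the sketch below (assuming scikit-learn and a binary in-scope/out-of-scope labeling, as in our intent detection setting) shows one common way to obtain FAR, FRR, and EER:

```python
import numpy as np
from sklearn.metrics import roc_curve

def far_frr_eer(labels, scores):
    """Compute FAR/FRR curves and the Equal Error Rate from detection scores.

    labels: 1 for in-scope (genuine), 0 for out-of-scope (impostor);
    scores: higher means more likely in-scope.
    """
    fpr, tpr, thresholds = roc_curve(labels, scores)
    far = fpr          # false acceptance rate
    frr = 1.0 - tpr    # false rejection rate
    idx = np.nanargmin(np.abs(far - frr))   # operating point where FAR == FRR
    eer = (far[idx] + frr[idx]) / 2.0
    return far, frr, eer, thresholds[idx]
```

Unlike accuracy, which depends on a single fixed threshold and on class balance, these curves characterize behavior across all operating points, which is why they are better suited to comparing out-of-scope detectors.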
Reviewer 3 Report
Comments and Suggestions for Authors
Please see attached.
Comments for author File: Comments.pdf
Author Response
We sincerely thank the reviewers for their thoughtful and constructive feedback. Your comments have helped us critically evaluate our work and identify areas for further improvement and exploration. Below, we provide detailed responses to each point raised.
Comment 1: The introduction will read better by including most recent studies that highlight the current state-of-the-art in uncertainty quantification for NLP. Consider exploring the most recent ML applications in NLP that also incorporate Bayesian inference and probabilistic modeling.
Response 1: We have updated the Introduction section to include a discussion of the most recent state-of-the-art studies in uncertainty quantification (UQ) for NLP, particularly those that incorporate Bayesian inference, probabilistic modeling, and ensemble methods.
Comment 2: The methodology is well detailed, however, the choice of Monte Carlo Dropout for uncertainty estimation could be discussed in comparison to other Bayesian deep learning techniques such as Gaussian Processes and Deep Ensembles.
Response 2: Thank you for your thoughtful comment. We appreciate your suggestion to compare Monte Carlo Dropout with other Bayesian deep learning techniques such as Gaussian Processes and Deep Ensembles. Due to scope and time limitations, we were unable to include a detailed comparative analysis in the current version. However, we recognize the value of such comparisons and plan to explore them in future work to assess trade-offs in uncertainty quality, performance, and computational cost across various approaches.
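To clarify what such a comparison would involve, a Deep Ensemble replaces the T dropout passes with predictions from several independently trained models; a minimal sketch follows (illustrative only, not an implemented baseline from the paper):

```python
import torch
import torch.nn.functional as F

def ensemble_uncertainty(models, x):
    """Deep Ensemble uncertainty: average over independently trained models
    rather than over dropout masks, trading higher training cost for often
    more robust epistemic uncertainty estimates."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m.eval()(x), dim=-1) for m in models])
    mean_p = probs.mean(dim=0)
    entropy = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_p, entropy
```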
Comment 3: The use of FewRel, ARSC, and CLINC150 datasets is appropriate, but additional baselines could be considered, especially recent few-shot learning models or transformer-based zero shot learning models.
Response 3: We understand that incorporating additional baselines, such as recent few-shot learning models or transformer-based zero-shot learning models, would further strengthen our evaluation. Due to timing constraints, we focused on FewRel, ARSC, and CLINC150 datasets, comparing UA-AML against widely used meta-learning and few-shot learning baselines. Future work will extend our experiments to include more recent few-shot and zero-shot learning models, allowing for a broader comparison and further validating the generalization capabilities of UA-AML.
Comment 4: The ROC-AUC curves in Figure 4 are informative; however, a comparative error analysis (misclassification rates across different NLU tasks) could add more depth.
Response 4: Thank you for your helpful comment. In response, we have added a comparative error analysis to further illustrate how UA-AML improves predictive reliability across NLU tasks. Specifically, we included the attached reliability diagrams comparing standard SNGP with UA-AML SNGP (UAR = 0.8). As shown, UA-AML demonstrates better alignment between predictive probability and actual accuracy, especially in the higher-confidence regions—indicating improved calibration and reduced misclassification in uncertain regions. We have also updated the Experimental section with this analysis and discussion to provide more depth on how UA-AML affects error distribution across tasks.
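For completeness, reliability diagrams of this kind are built by binning predictions by confidence and comparing each bin's mean confidence to its empirical accuracy; the following sketch (a generic illustration, with Expected Calibration Error (ECE) added as an assumed summary metric not reported in the paper) shows the computation:

```python
import numpy as np

def reliability_bins(confidences, correct, n_bins=10):
    """Bin predictions by confidence for a reliability diagram, plus ECE.

    confidences: max softmax probability per prediction; correct: 0/1 array.
    Each bin reports mean confidence vs. empirical accuracy; their gap,
    weighted by bin size, gives the Expected Calibration Error.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, rows = 0.0, []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()
            conf = confidences[mask].mean()
            ece += mask.mean() * abs(acc - conf)   # bin weight * calibration gap
            rows.append((lo, hi, conf, acc, int(mask.sum())))
    return rows, ece
```

A well-calibrated model yields bins lying close to the diagonal of the diagram, which is the alignment between predictive probability and actual accuracy discussed above.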
Comment 5: Ethical concerns regarding uncertainty-based decision-making should be briefly addressed, particularly in high-stakes applications like medical diagnosis.
Response 5: We acknowledge the importance of addressing ethical concerns related to uncertainty-based decision-making, especially in high-stakes applications such as medical diagnosis. In response, we have added further details and possible future directions in the Conclusion section, where we discuss potential risks associated with bias, fairness, and handling sensitive data in uncertainty-aware task selection.