Abstract
Discrete prompts remain the primary means of interacting with Large Language Models (LLMs) because of their interpretability and cross-model compatibility. However, optimizing them for fine-grained tasks such as Aspect-Based Sentiment Analysis (ABSA) remains challenging for two reasons: errors cascade through a fixed prediction order, and prompt design demands intensive human effort. To address these problems, we present LM-SODP, a Reinforcement Learning (RL) framework that automatically discovers better discrete prompts and a better prediction order for ABSA. Built on a distilled GPT-2, our method optimizes prompts so that the model exploits task-specific information more effectively and produces less uncertain outputs, reducing output entropy. LM-SODP also independently searches for a better execution sequence over the ABSA subtasks. Experiments on public datasets show that our method yields stable improvements under different conditions. With the optimized prompts, LM-SODP effectively guides LLMs under limited computational resources, maintains strong performance across domains, and opens new avenues for automated prompt token generation.
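As a rough illustration only, not the paper's implementation, the sketch below shows the two search problems the abstract describes: selecting discrete prompt tokens and choosing an order for the ABSA subtasks, guided by a reward that penalizes output entropy. The token vocabulary, subtask names, the toy evaluate function, and the random-search loop are all illustrative assumptions; an RL policy would replace the sampling step with learned action probabilities.

```python
"""Minimal sketch (assumptions only) of prompt-token and subtask-order search
with an entropy-penalized reward, in the spirit of the abstract."""
import itertools
import math
import random

# Hypothetical vocabulary for building candidate discrete prompts.
PROMPT_TOKENS = ["aspect", "opinion", "sentiment", "extract", "classify", "review"]
# Hypothetical ABSA subtasks whose prediction order is searched.
SUBTASKS = ["aspect_extraction", "opinion_extraction", "sentiment_classification"]


def output_entropy(probs):
    """Shannon entropy of a model's output distribution (lower = more certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def evaluate(prompt, order):
    """Stand-in for querying an LM: returns (task_score, output_distribution).

    A real system would run the LM with `prompt`, execute the subtasks in `order`,
    and measure ABSA accuracy plus the model's predictive distribution.
    """
    random.seed(hash((prompt, order)) % (2**32))  # deterministic toy scores
    score = random.random()
    probs = [random.random() for _ in range(4)]
    total = sum(probs)
    return score, [p / total for p in probs]


def reward(prompt, order, entropy_weight=0.1):
    """Reward = task score minus an entropy penalty on the model's outputs."""
    score, probs = evaluate(prompt, order)
    return score - entropy_weight * output_entropy(probs)


def search(num_samples=200, prompt_len=3):
    """Random-search baseline over prompts and subtask orders; an RL agent
    would replace this loop with a learned sampling policy."""
    best = (None, None, float("-inf"))
    orders = list(itertools.permutations(SUBTASKS))
    for _ in range(num_samples):
        prompt = " ".join(random.sample(PROMPT_TOKENS, prompt_len))
        order = random.choice(orders)
        r = reward(prompt, order)
        if r > best[2]:
            best = (prompt, order, r)
    return best


if __name__ == "__main__":
    prompt, order, r = search()
    print(f"best prompt: {prompt!r}\nbest order: {order}\nreward: {r:.3f}")
```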