A Tree-Based Search Algorithm with Global Pheromone and Local Signal Guidance for Scientific Chart Reasoning
Abstract
1. Introduction
- First, we propose a novel multi-agent pheromone-guided Monte Carlo tree search algorithm that utilizes global historical information to improve exploration in reasoning spaces.
- Second, we integrate pheromone weighting into the UCB-based selection process, enabling a balanced trade-off between exploiting successful reasoning paths and exploring new ones based on historical performance.
- Third, we introduce a dynamic pheromone update mechanism guided by LLM feedback into the backpropagation process, yielding an adaptive optimization scheme that improves both robustness and search convergence, particularly on tasks requiring deep and complex reasoning chains.
2. Related Work
2.1. Single-Step Reasoning Paradigms
2.2. Multi-Step Reasoning Paradigms
3. Methodology
3.1. Problem Formulation as a Markov Decision Process (MDP)
- State Space ($\mathcal{S}$): A state $s_t \in \mathcal{S}$ at any step $t$ represents the complete history of the reasoning process. It is defined by the initial chart–question pair $(C, Q)$ and the sequence of reasoning steps (actions) generated thus far, i.e., $s_t = (C, Q, a_1, \dots, a_t)$.
- Action Space ($\mathcal{A}$): An action $a_{t+1} \in \mathcal{A}$ is the generation of the next intermediate reasoning step. This corresponds to a language model $\pi_i$ from a cooperative set $\Pi = \{\pi_1, \dots, \pi_M\}$ producing a subsequent sentence that extends the current reasoning trajectory from state $s_t$.
- Transition Probability ($P$): The transition function $P(s_{t+1} \mid s_t, a_{t+1})$ defines the probability of moving from state $s_t$ to state $s_{t+1}$ after taking action $a_{t+1}$. This function is implicitly defined by the stochastic nature of the language models, and our agent samples from it by invoking a model $\pi_i \in \Pi$.
- Reward Function ($R$): Rewards are sparse and assigned only at the terminal state $s_T$ of a completed trajectory. Let $\hat{A}$ be the final answer generated by the trajectory. The terminal reward is a binary score based on the semantic similarity, $\mathrm{sim}$, between the generated answer $\hat{A}$ and the ground-truth answer $A$. The reward function is formally defined using the indicator function as $R(s_T) = \mathbb{1}\big[\mathrm{sim}(\hat{A}, A) \geq \delta\big]$, where $\delta$ is the similarity threshold.
- Discount Factor ($\gamma$): We set the discount factor $\gamma = 1$, as the primary reward is terminal. A minimal code sketch of this formulation follows the list.
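To make the formulation concrete, the following is a minimal Python sketch of the state, the append-only transition, and the binary terminal reward. All names here (ReasoningState, terminal_reward, the value of DELTA) are illustrative assumptions rather than the paper's implementation, and the similarity function is left abstract.

```python
from dataclasses import dataclass

# Minimal sketch of the MDP above; names and the threshold are illustrative.

@dataclass(frozen=True)
class ReasoningState:
    """State s_t: the chart-question pair (C, Q) plus steps a_1..a_t."""
    chart: str
    question: str
    steps: tuple[str, ...] = ()

    def extend(self, step: str) -> "ReasoningState":
        """Transition: appending action a_{t+1} yields s_{t+1} = (s_t, a_{t+1})."""
        return ReasoningState(self.chart, self.question, self.steps + (step,))

DELTA = 0.8  # hypothetical similarity threshold delta

def terminal_reward(answer: str, ground_truth: str, sim) -> float:
    """Sparse binary reward R(s_T) = 1[sim(A_hat, A) >= delta]."""
    return 1.0 if sim(answer, ground_truth) >= DELTA else 0.0
```

Freezing the state dataclass makes states hashable, so they can directly key the global pheromone map used later in the algorithm.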
3.2. A Dual-Signal MCTS Algorithm for Solving the MDP
3.2.1. Selection: Hybrid Policy Guided by Local and Global Signals
The selection policy scores each candidate action with a hybrid of the local UCB statistics and the global pheromone signal:

$\mathrm{Score}(s, a) = Q(s, a) + c \sqrt{\frac{\ln N(s)}{N(s, a)}} + \lambda \cdot \tau(s, a)$

- The Local Signal (UCB score), $Q(s, a) + c \sqrt{\ln N(s) / N(s, a)}$, is the part of the formula that balances two competing desires:
  - Exploitation ($Q(s, a)$): This term represents the average reward of taking action $a$ in state $s$.
  - Exploration ($c \sqrt{\ln N(s) / N(s, a)}$): This term gives a bonus to actions that have been tried less often. $N(s)$ is the visit count of the parent state, $N(s, a)$ is the visit count of the action, and $c$ is the exploration constant.
- The Global Signal, $\tau(s, a)$, is our ACO-inspired pheromone value, representing the historical desirability of taking action $a$ in state $s$.
- The Balancing Factor ($\lambda$) is a hyperparameter that controls the influence of the global pheromone signal. A code sketch of this selection rule follows the list.
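The score is cheap to compute at every node. The sketch below is one plausible reading of the selection rule, with illustrative defaults for $c$ and $\lambda$ and a simple node layout (attributes q, visits, state, action) assumed purely for exposition.

```python
import math

def hybrid_score(q, n_parent, n_action, tau, c=1.41, lam=0.5):
    """Score(s, a) = Q(s,a) + c*sqrt(ln N(s) / N(s,a)) + lambda * tau(s,a).

    c and lam are illustrative defaults, not the paper's tuned values.
    """
    if n_action == 0:
        return math.inf  # unvisited actions are selected first
    exploration = c * math.sqrt(math.log(n_parent) / n_action)
    return q + exploration + lam * tau

def select(node, pheromone_map, c=1.41, lam=0.5):
    """Descend one level: pick the child with the highest hybrid score.

    Assumes children expose .q, .visits, .state, .action (illustrative
    layout); pheromone_map is the shared dict tau keyed by (state, action).
    """
    return max(node.children, key=lambda ch: hybrid_score(
        ch.q, max(node.visits, 1), ch.visits,
        pheromone_map.get((ch.state, ch.action), 0.0), c, lam))
```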
3.2.2. Expansion and Simulation with Multi-Agent Collaboration
3.2.3. Backpropagation: Unified Update of Local and Global Values
- Local Value Update (MCTS): The standard MCTS update is performed. First, the visit count is incremented: $N(s, a) \leftarrow N(s, a) + 1$. Then, the mean action value is updated with the new reward $R$: $Q(s, a) \leftarrow Q(s, a) + \frac{R - Q(s, a)}{N(s, a)}$.
- Global Pheromone Update (ACO): Crucially, all agents share and contribute to the same global pheromone map. Concurrently, the global pheromone trail is updated for all $(s, a)$ pairs on the completed trajectory, regardless of which agent generated that specific step. This follows the standard ACO reinforcement-and-evaporation rule: $\tau(s, a) \leftarrow (1 - \rho) \cdot \tau(s, a) + \rho \cdot R$, where $\rho \in (0, 1]$ is the evaporation rate. A code sketch of both updates follows the list.
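Both updates are applied in one pass over the finished path. This is a hedged sketch, assuming the node layout from the selection sketch and an evaporation-weighted deposit; the paper's exact deposit rule may differ.

```python
def backpropagate(path, reward, pheromone_map, rho=0.1):
    """Unified backpropagation along a completed trajectory.

    path:          list of (state, action, node) triples from root to leaf
    reward:        terminal reward R of the finished trajectory
    pheromone_map: global dict shared by all agents, keyed by (state, action)
    rho:           evaporation rate (illustrative value)
    """
    for state, action, node in path:
        # Local MCTS update: increment visits, then move Q toward the new
        # reward (incremental running mean).
        node.visits += 1
        node.q += (reward - node.q) / node.visits

        # Global ACO update: evaporate, then deposit, for every (s, a) pair
        # on the path, regardless of which agent produced the step.
        tau = pheromone_map.get((state, action), 0.0)
        pheromone_map[(state, action)] = (1.0 - rho) * tau + rho * reward
```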
Algorithm 1. Pheromone-guided MCTS for Solving the MDP.
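For concreteness, the pieces above compose into the following outer loop. This is a simplified sketch rather than the paper's exact Algorithm 1; `propose`, `rollout`, and `evaluate` are hypothetical stand-ins for the agents' LLM calls and the answer-scoring step, and the tree bookkeeping is intentionally minimal.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Illustrative tree node; the paper's bookkeeping may differ."""
    state: ReasoningState
    action: str = ""
    q: float = 0.0
    visits: int = 0
    children: list = field(default_factory=list)

def pheromone_mcts(root_state, agents, propose, rollout, evaluate,
                   iterations=20, rho=0.1):
    """Simplified outer loop: selection -> expansion -> simulation -> backprop."""
    root = Node(root_state)
    pheromone_map = {}  # global pheromone map shared by all agents

    for _ in range(iterations):
        # 1. Selection: descend with the hybrid UCB + pheromone score.
        node, path = root, [root]
        while node.children:
            node = select(node, pheromone_map)
            path.append(node)

        # 2. Expansion: each cooperating agent proposes a candidate next step.
        for agent in agents:
            step = propose(agent, node.state)
            node.children.append(Node(node.state.extend(step), action=step))

        # 3. Simulation: complete one candidate into an answer and score it.
        leaf = node.children[0]
        reward = evaluate(rollout(leaf.state))  # binary reward, Section 3.1

        # 4. Backpropagation: unified local + global update along the path.
        backpropagate([(n.state, n.action, n) for n in path + [leaf]],
                      reward, pheromone_map, rho)

    return root  # the root's best child gives the preferred first step
```

Note that the pheromone map persists across iterations and agents, which is what lets reasoning patterns discovered by one agent bias the selection of all the others.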
4. Experiments
4.1. Implementation Details
4.2. Main Results
4.3. Ablation Study
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Huang, K.H.; Chan, H.P.; Fung, Y.R.; Qiu, H.; Zhou, M.; Joty, S.; Chang, S.F.; Ji, H. From pixels to insights: A survey on automatic chart understanding in the era of Large Foundation Models. arXiv 2024, arXiv:2403.12027.
- Gemini Team; Anil, R.; Borgeaud, S.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K. Gemini: A family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805.
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F. Qwen technical report. arXiv 2023, arXiv:2309.16609.
- Wang, P.; Bai, S.; Tan, S.; Wang, S.; Fan, Z.; Bai, J.; Chen, K.; Liu, X.; Wang, J.; Ge, W. Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution. arXiv 2024, arXiv:2409.12191.
- Zhang, D.; Wu, J.; Lei, J.; Che, T.; Li, J.; Xie, T.; Huang, X.; Zhang, S.; Pavone, M.; Li, Y. LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Albuquerque, NM, USA, 29 April–4 May 2025; pp. 7315–7337.
- Huang, X.; Li, S.; Qu, W.; He, T.; Zuo, Y.; Ouyang, W. Frozen CLIP model is efficient point cloud backbone. arXiv 2022, arXiv:2212.04098.
- Huang, X.; Huang, Z.; Li, S.; Qu, W.; He, T.; Hou, Y.; Zuo, Y.; Ouyang, W. Frozen CLIP transformer is an efficient point cloud encoder. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 2382–2390.
- Huang, X.; Huang, Z.; Zuo, Y.; Gong, Y.; Zhang, C.; Liu, D.; Fang, Y. PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration. arXiv 2025, arXiv:2501.07762.
- Zhang, J.; Huang, J.; Jin, S.; Lu, S. Vision-language models for vision tasks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5625–5644.
- Li, C.; Wong, C.; Zhang, S.; Usuyama, N.; Liu, H.; Yang, J.; Naumann, T.; Poon, H.; Gao, J. LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. Adv. Neural Inf. Process. Syst. 2023, 36, 28541–28564.
- Zhang, H.; Chen, J.; Jiang, F.; Yu, F.; Chen, Z.; Li, J.; Chen, G.; Wu, X.; Zhang, Z.; Xiao, Q. HuatuoGPT, towards taming language model to be a doctor. arXiv 2023, arXiv:2305.15075.
- Chen, J.; Cai, Z.; Ji, K.; Wang, X.; Liu, W.; Wang, R.; Hou, J.; Wang, B. HuatuoGPT-O1, towards medical complex reasoning with LLMs. arXiv 2024, arXiv:2412.18925.
- Masry, A.; Long, D.X.; Tan, J.Q.; Joty, S.; Hoque, E. ChartQA: A benchmark for question answering about charts with visual and logical reasoning. arXiv 2022, arXiv:2203.10244.
- Wang, Z.; Xia, M.; He, L.; Chen, H.; Liu, Y.; Zhu, R.; Liang, K.; Wu, X.; Liu, H.; Malladi, S. CharXiv: Charting gaps in realistic chart understanding in multimodal LLMs. Adv. Neural Inf. Process. Syst. 2024, 37, 113569–113697.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 8748–8763.
- Kim, W.; Son, B.; Kim, I. ViLT: Vision-and-Language Transformer without convolution or region supervision. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 5583–5594.
- Sprague, Z.; Yin, F.; Rodriguez, J.D.; Jiang, D.; Wadhwa, M.; Singhal, P.; Zhao, X.; Ye, X.; Mahowald, K.; Durrett, G. To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning. arXiv 2024, arXiv:2409.12183.
- Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models. Adv. Neural Inf. Process. Syst. 2023, 36, 11809–11822.
- Zhang, D.; Zhoubian, S.; Hu, Z.; Yue, Y.; Dong, Y.; Tang, J. ReST-MCTS*: LLM self-training via process reward guided tree search. Adv. Neural Inf. Process. Syst. 2024, 37, 64735–64772.
- Zhao, Y.; Yin, H.; Zeng, B.; Wang, H.; Shi, T.; Lyu, C.; Wang, L.; Luo, W.; Zhang, K. Marco-o1: Towards open reasoning models for open-ended solutions. arXiv 2024, arXiv:2411.14405.
- Yao, H.; Huang, J.; Wu, W.; Zhang, J.; Wang, Y.; Liu, S.; Wang, Y.; Song, Y.; Feng, H.; Shen, L. Mulberry: Empowering MLLM with O1-like reasoning and reflection via collective Monte Carlo Tree Search. arXiv 2024, arXiv:2412.18319.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837.
- Gan, B.; Zhao, Y.; Zhang, T.; Huang, J.; Li, Y.; Teo, S.X.; Zhang, C.; Shi, W. MASTER: A Multi-Agent System with LLM Specialized MCTS. arXiv 2025, arXiv:2501.14304.
- Zhang, Y.; Mao, S.; Ge, T.; Wang, X.; de Wynter, A.; Xia, Y.; Wu, W.; Song, T.; Lan, M.; Wei, F. LLM as a mastermind: A survey of strategic reasoning with large language models. arXiv 2024, arXiv:2404.01230.
- Peng, B.; Galley, M.; He, P.; Cheng, H.; Xie, Y.; Hu, Y.; Huang, Q.; Liden, L.; Yu, Z.; Chen, W. Check your facts and try again: Improving large language models with external knowledge and automated feedback. arXiv 2023, arXiv:2302.12813.
- Zelikman, E.; Wu, Y.; Mu, J.; Goodman, N. STaR: Bootstrapping reasoning with reasoning. Adv. Neural Inf. Process. Syst. 2022, 35, 15476–15488.
- Blum, C. Ant colony optimization: Introduction and recent trends. Phys. Life Rev. 2005, 2, 353–373.
- Dorigo, M.; Birattari, M.; Stützle, T. Ant colony optimization. IEEE Comput. Intell. Mag. 2006, 1, 28–39.
- Dorigo, M.; Stützle, T. Ant colony optimization: Overview and recent advances. In Handbook of Metaheuristics; Springer: Berlin/Heidelberg, Germany, 2018; pp. 311–351.
- Tolpin, D.; Shimony, S. MCTS based on simple regret. In Proceedings of the AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; Volume 26, pp. 570–576.
- Lu, P.; Bansal, H.; Xia, T.; Liu, J.; Li, C.; Hajishirzi, H.; Cheng, H.; Chang, K.W.; Galley, M.; Gao, J. MathVista: Evaluating mathematical reasoning of foundation models in visual contexts. arXiv 2023, arXiv:2310.02255.
- Roberts, J.; Han, K.; Albanie, S. GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models. arXiv 2024, arXiv:2408.11817.
- Xia, R.; Zhang, B.; Ye, H.; Yan, X.; Liu, Q.; Zhou, H.; Chen, Z.; Ye, P.; Dou, M.; Shi, B. ChartX & ChartVLM: A versatile benchmark and foundation model for complicated chart reasoning. arXiv 2024, arXiv:2402.12185.
- Li, J.; Li, D.; Savarese, S.; Hoi, S. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 19730–19742.
- Chaudhry, R.; Shekhar, S.; Gupta, U.; Maneriker, P.; Bansal, P.; Joshi, A. Leaf-QA: Locate, encode & attend for figure question answering. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Aspen, CO, USA, 1–5 March 2020; pp. 3512–3521.
- Singh, H.; Shekhar, S. STL-CQA: Structure-based transformers with localization and encoding for chart question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 3275–3284.
- Huang, S.; Dong, L.; Wang, W.; Hao, Y.; Singhal, S.; Ma, S.; Lv, T.; Cui, L.; Mohammed, O.K.; Patra, B. Language is not all you need: Aligning perception with language models. Adv. Neural Inf. Process. Syst. 2023, 36, 72096–72109.
- Kafle, K.; Price, B.; Cohen, S.; Kanan, C. DVQA: Understanding data visualizations via question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5648–5656.
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual instruction tuning. Adv. Neural Inf. Process. Syst. 2023, 36, 34892–34916.
- Du, Y.; Liu, Z.; Li, J.; Zhao, W.X. A survey of vision-language pre-trained models. arXiv 2022, arXiv:2202.10936.
- Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting black-box models: A review on explainable artificial intelligence. Cogn. Comput. 2024, 16, 45–74.
- Roumeliotis, K.I.; Tselikas, N.D. ChatGPT and Open-AI models: A preliminary review. Future Internet 2023, 15, 192.
- Xu, F.; Hao, Q.; Zong, Z.; Wang, J.; Zhang, Y.; Wang, J.; Lan, X.; Gong, J.; Ouyang, T.; Meng, F. Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models. arXiv 2025, arXiv:2501.09686.
- Jiao, F.; Qin, C.; Liu, Z.; Chen, N.F.; Joty, S. Learning planning-based reasoning by trajectories collection and process reward synthesizing. arXiv 2024, arXiv:2402.00658.
- OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
- Hsieh, C.Y.; Chen, S.A.; Li, C.L.; Fujii, Y.; Ratner, A.; Lee, C.Y.; Krishna, R.; Pfister, T. Tool documentation enables zero-shot tool-usage with large language models. arXiv 2023, arXiv:2308.00675.
- Zhou, D.; Schärli, N.; Hou, L.; Wei, J.; Scales, N.; Wang, X.; Schuurmans, D.; Cui, C.; Bousquet, O.; Le, Q. Least-to-most prompting enables complex reasoning in large language models. arXiv 2022, arXiv:2205.10625.
- Wang, X.; Wei, J.; Schuurmans, D.; Le, Q.; Chi, E.; Narang, S.; Chowdhery, A.; Zhou, D. Self-consistency improves chain of thought reasoning in language models. arXiv 2022, arXiv:2203.11171.
- Chari, A.; Tiwari, A.; Lian, R.; Reddy, S.; Zhou, B. Pheromone-based learning of optimal reasoning paths. arXiv 2025, arXiv:2501.19278.
| Dataset | Questions | Chart Types | Question Types |
|---|---|---|---|
| ChartQA [13] | 2500 | Bar Charts, Line Charts, Pie Charts | Data Retrieval, Visual, Compositional, Both Visual and Compositional |
| MathVista [31] | 1000 | Bar Charts, Line Plots, Pie Charts, Scatter Plots, Function Plots, Scientific Figures, Tables | FQA (figure QA), GPS (geometry problem solving), MWP (math word problems), TQA (textbook QA), VQA (visual QA) |
| GRAB [32] | 2170 | Mathematical Graphs | Properties, Functions, Series, Transforms |
| ChartX [33] | 1150 | Domain-specific Chart Types, General Chart Types, Fine-grained Chart Types | Perception Tasks, Cognition Tasks |
| Dataset | Pheromone_MCTS | Mulberry | Qwen2VL_7B |
|---|---|---|---|
| ChartQA_test_human | 79.68% | 78.16% | – |
| ChartQA_test_augment | 89.92% | 90.88% | – |
| ChartQA_test | 84.80% | 84.52% | 83.00% |
| MathVista | 60.20% | 63.10% | 58.20% |
| GRAB | 12.40% | – | 10.18% |
| ChartX | 37.41% | – | 28.39% |
| Experiment | Succ. (%) | Δ Succ. (pp) | Time (s) | Δ Time (s) |
|---|---|---|---|---|
| Pheromone_MCTS | 71.2 | 0.0 | 118.2 | 0.0 |
| 10_iterations | 45.2 | −26.0 | 108.4 | −9.8 |
| 20_iterations | 76.8 | +5.6 | 129.7 | +11.5 |
| Experiment | Succ. (%) | Δ Succ. (pp) | Time (s) | Δ Time (s) |
|---|---|---|---|---|
| Pheromone_MCTS | 71.2 | 0.0 | 117.6 | 0.0 |
| Single_model_7b | 52.8 | −18.4 | 125.1 | +7.5 |
| Single_model_llama | 58.4 | −12.8 | 132.1 | +14.5 |
| Experiment | Succ. (%) | Δ Succ. (pp) | Time (s) | Δ Time (s) |
|---|---|---|---|---|
| Pheromone_MCTS | 71.2 | 0.0 | 117.6 | 0.0 |
| no_pheromone | 63.6 | −7.6 | 123.1 | +5.5 |