MMTR: Strategy-Guided Multimodal Table Reasoning with Reflective Self-Correction
Abstract
1. Introduction
- Construction of StrTab-QA, a large-scale multimodal table reasoning dataset consisting of question-answering pairs, negative samples, and reflection data. Unlike existing multimodal table datasets such as TABMWP, MMTab, and SynTab, which mainly focus on positive question-answering supervision, StrTab-QA incorporates three complementary components: question-answering samples for reasoning learning, negative samples containing rule-based and model-generated erroneous responses to expose diverse error patterns, and reflection data providing structured verification and correction supervision. This multi-component design provides richer supervision signals for training reflection-aware multimodal reasoning models beyond datasets relying solely on positive QA pairs.
- Proposal of a reflective self-correction mechanism based on a dual-LoRA architecture. Different from existing self-correction frameworks such as Reflexion and Self-Refine, which mainly rely on inference-time prompting and external feedback generation without updating model parameters, MMTR learns verification and correction capabilities through dedicated reflection samples. Specifically, the Strategy LoRA is responsible for generating structured reasoning processes, while the Reflection LoRA learns both verification and correction behaviors. This mechanism provides supporting evidence for correct reasoning outputs and performs error localization, diagnosis, and correction when incorrect reasoning is identified, thereby enabling table-specific closed-loop “reasoning–reflection–correction” capability within a single lightweight backbone.
- Design of a progressive fine-tuning strategy and an adaptive visual encoding scheme. Unlike existing multimodal table reasoning models such as Table-LLaVA and SynTab-LLaVA, which mainly adopt conventional alignment and instruction-tuning pipelines, MMTR introduces a four-stage progressive training strategy that separates cross-modal alignment, visual encoder specialization, Q-A reasoning, and reflective verification into distinct stages, so that the model acquires reasoning and reflection abilities step by step. Meanwhile, the adaptive visual encoding scheme preserves the structural information of table images during visual encoding and reduces information distortion. Extensive experiments show that MMTR, with a 3B backbone, reaches 94.32 overall accuracy on the main table reasoning benchmark and outperforms multimodal baselines of up to 13B parameters. It also improves consistently under zero-shot cross-task and cross-instruction settings, which suggests that the learned reflection mechanism transfers to unseen tasks and instructions.
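The closed-loop "reasoning–reflection–correction" behavior described in the contributions can be illustrated with a minimal sketch. Everything named here is hypothetical: `Reflection`, `answer_with_reflection`, and the two callables standing in for the Strategy LoRA and Reflection LoRA passes are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Reflection:
    """Hypothetical container for the output of the reflection pass."""
    is_correct: bool                        # verdict on the candidate answer
    rationale: str                          # supporting evidence or error diagnosis
    corrected_answer: Optional[str] = None  # filled only when an error is found


def answer_with_reflection(
    question: str,
    table_image: object,
    strategy: Callable[[str, object], str],
    reflect: Callable[[str, object, str], Reflection],
) -> Tuple[str, Reflection]:
    """Run one reasoning pass, then one reflection pass over its output."""
    candidate = strategy(question, table_image)         # stand-in for the Strategy LoRA pass
    review = reflect(question, table_image, candidate)  # stand-in for the Reflection LoRA pass
    if review.is_correct or review.corrected_answer is None:
        return candidate, review                        # keep the original answer
    return review.corrected_answer, review              # replace it with the correction


# Toy stand-ins so the sketch runs without a model.
def toy_strategy(question: str, table_image: object) -> str:
    return "12"


def toy_reflect(question: str, table_image: object, candidate: str) -> Reflection:
    if candidate == "15":
        return Reflection(True, "Values read correctly from the table.")
    return Reflection(False, "Column misreading at step 1.", corrected_answer="15")


final_answer, review = answer_with_reflection("Total price?", None, toy_strategy, toy_reflect)
print(final_answer, "|", review.rationale)
```

In this sketch the reflection pass runs exactly once per question; whether MMTR iterates further is not stated in this section.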
2. Related Works
2.1. Multimodal Large Language Model
2.2. Table Question Answering
2.3. Model Reasoning
3. Methodology
3.1. StrTab-QA Construction
3.1.1. Q-A Dataset Construction
3.1.2. Negative Sample Dataset Construction
3.1.3. Reflection Dataset Construction
3.2. MMTR Architecture
3.2.1. Image–Text Processing
3.2.2. Reasoning Strategy
3.3. Progressive Reasoning-to-Reflection Fine-Tuning Strategy
3.3.1. Stage 1: Initial Alignment of the Vision–Language Projector
3.3.2. Stage 2: Table-Specific Adaptation of the Visual Encoder
3.3.3. Stage 3: Question Answering Reasoning Module
3.3.4. Stage 4: Reflective Verification and Correction Module
- Verification mode: When the candidate answer is correct, the model is supervised using the logical justification and key evidence provided in the dataset. For example, for a problem that computes total price from unit price and quantity, the supervision signal is: “The solution is correct. The key evidence is that the unit price and quantity are accurately read from the second row of the table, and the multiplication is performed correctly. The result is consistent with the data.”
- Correction mode: When the candidate answer is incorrect, the model is supervised using the error diagnosis and corrected reasoning provided in the dataset. For example, when the model misreads the start-time and end-time columns and therefore computes the activity duration incorrectly, the supervision signal is: “The error occurs at step X. The error type is column misreading. The value from the end-time column is mistakenly used as the start time. The correct procedure is to read the value from the start-time column.” This diagnosis is followed by the corrected reasoning process and answer.
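The two supervision modes above can be sketched as a single target-building function. The field names (`evidence`, `error_step`, `error_type`, `diagnosis`, `corrected_reasoning`) and the `ReflectionSample` class are assumptions made for illustration; the actual StrTab-QA schema is not given in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReflectionSample:
    """Hypothetical record for one reflection training sample."""
    candidate_answer: str
    gold_answer: str
    evidence: str = ""                # used in verification mode
    error_step: Optional[int] = None  # used in correction mode
    error_type: Optional[str] = None
    diagnosis: str = ""
    corrected_reasoning: str = ""


def build_target(sample: ReflectionSample) -> str:
    """Return the supervision text for verification or correction mode."""
    if sample.candidate_answer == sample.gold_answer:
        # Verification mode: confirm the answer and cite the key evidence.
        return f"The solution is correct. The key evidence is that {sample.evidence}"
    # Correction mode: locate the error, name its type, diagnose, then correct.
    return (
        f"The error occurs at step {sample.error_step}. "
        f"The error type is {sample.error_type}. "
        f"{sample.diagnosis} {sample.corrected_reasoning} "
        f"Answer: {sample.gold_answer}"
    )


verify = ReflectionSample(
    candidate_answer="24",
    gold_answer="24",
    evidence="the unit price and quantity are read from the second row of the table.",
)
correct = ReflectionSample(
    candidate_answer="3 hours",
    gold_answer="2 hours",
    error_step=1,
    error_type="column misreading",
    diagnosis="The value from the end-time column is mistakenly used as the start time.",
    corrected_reasoning="The correct procedure is to read the value from the start-time column.",
)
print(build_target(verify))
print(build_target(correct))
```

Deciding the mode by exact string match against the gold answer is a simplification; a real pipeline would likely normalize numeric formats and units first.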
4. Experiments
4.1. Model Configuration
4.2. Datasets
4.3. Evaluation Metrics
4.4. Baseline Models
4.5. Main Experimental Analysis
4.6. Ablation Studies
4.6.1. Ablation Study on the Progressive Reasoning-to-Reflection Fine-Tuning Strategy
4.6.2. Comparison with Prompt-Based Self-Correction
4.6.3. Reflection Behavior Analysis
4.6.4. Reflection Data Scaling Analysis
4.6.5. Ablation Study on Image Preprocessing
4.6.6. Ablation Study on Instruction Generalization
4.7. Generalization Experiments
4.8. Stability Evaluation
4.9. Statistical Significance of the Reflection Module
4.10. Failure Mode Analysis
4.11. Inference Overhead and Cost-Performance Trade-Off
4.12. Case Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Deng, X.; Sun, H.; Lees, A.; Wu, Y.; Yu, C. Turl: Table understanding through representation learning. ACM SIGMOD Rec. 2022, 51, 33–40.
- Pasupat, P.; Liang, P. Compositional semantic parsing on semi-structured tables. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers); Association for Computational Linguistics: Beijing, China, 2015; pp. 1470–1480.
- Zhang, T.; Yue, X.; Li, Y.; Sun, H. Tablellama: Towards open large generalist models for tables. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers); Association for Computational Linguistics: Mexico City, Mexico, 2024; pp. 6024–6044.
- Zhong, X.; ShafieiBavani, E.; Jimeno Yepes, A. Image-based table recognition: Data, model, and evaluation. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 564–580.
- Mathur, S.V.; Bafna, J.S.; Kartik, K.; Khandelwal, H.; Shrivastava, M.; Gupta, V.; Bansal, M.; Roth, D. Knowledge-aware reasoning over multimodal semi-structured tables. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024; Association for Computational Linguistics: Miami, FL, USA, 2024; pp. 14054–14073.
- Kim, Y.; Yim, M.; Song, K.Y. Tablevqa-bench: A visual question answering benchmark on multiple table domains. arXiv 2024, arXiv:2404.19205.
- Yang, B.; Zhang, Y.; Liu, D.; Freitas, A.; Lin, C. Does table source matter? benchmarking and improving multimodal scientific table understanding and reasoning. arXiv 2025, arXiv:2501.13042.
- Lu, P.; Qiu, L.; Chang, K.W.; Wu, Y.N.; Zhu, S.C.; Rajpurohit, T.; Clark, P.; Kalyan, A. Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning. arXiv 2022, arXiv:2209.14610.
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual instruction tuning. Adv. Neural Inf. Process. Syst. 2023, 36, 34892–34916.
- Chen, W.; Chang, M.W.; Schlinger, E.; Wang, W.; Cohen, W.W. Open question answering over tables and text. arXiv 2020, arXiv:2010.10439.
- Jin, N.; Siebert, J.; Li, D.; Chen, Q. A survey on table question answering: Recent advances. In Proceedings of the China Conference on Knowledge Graph and Semantic Computing; Springer: Berlin/Heidelberg, Germany, 2022; pp. 174–186.
- Talmor, A.; Yoran, O.; Catav, A.; Lahav, D.; Wang, Y.; Asai, A.; Ilharco, G.; Hajishirzi, H.; Berant, J. Multimodalqa: Complex question answering over text, tables and images. arXiv 2021, arXiv:2104.06039.
- Yin, S.; Fu, C.; Zhao, S.; Li, K.; Sun, X.; Xu, T.; Chen, E. A survey on multimodal large language models. Natl. Sci. Rev. 2024, 11, nwae403.
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774.
- Team, G.; Anil, R.; Borgeaud, S.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K.; et al. Gemini: A family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837.
- Bai, S.; Chen, K.; Liu, X.; Wang, J.; Ge, W.; Song, S.; Dang, K.; Wang, P.; Wang, S.; Tang, J.; et al. Qwen2.5-VL Technical Report. arXiv 2025, arXiv:2502.13923.
- Chen, Z.; Wang, W.; Cao, Y.; Liu, Y.; Gao, Z.; Cui, E.; Zhu, J.; Ye, S.; Tian, H.; Liu, Z.; et al. Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling. arXiv 2024, arXiv:2412.05271.
- Li, F.; Zhang, R.; Zhang, H.; Zhang, Y.; Li, B.; Li, W.; Ma, Z.; Li, C. Llava-next-interleave: Tackling multi-image, video, and 3D in large multimodal models. arXiv 2024, arXiv:2407.07895.
- Kim, G.; Hong, T.; Yim, M.; Park, J.; Yim, J.; Hwang, W.; Yun, S.; Han, D.; Park, S. Donut: Document understanding transformer without ocr. arXiv 2021, arXiv:2111.15664.
- Lee, K.; Joshi, M.; Turc, I.R.; Hu, H.; Liu, F.; Eisenschlos, J.M.; Khandelwal, U.; Shaw, P.; Chang, M.W.; Toutanova, K. Pix2struct: Screenshot parsing as pretraining for visual language understanding. In Proceedings of the International Conference on Machine Learning; PMLR: Honolulu, HI, USA, 2023; pp. 18893–18912.
- Yang, Z.; Chen, L.; Cohan, A.; Zhao, Y. Table-r1: Inference-time scaling for table reasoning tasks. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics: Suzhou, China, 2025; pp. 20616–20635.
- Herzig, J.; Nowak, P.K.; Müller, T.; Piccinno, F.; Eisenschlos, J. TaPas: Weakly supervised table parsing via pre-training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; Association for Computational Linguistics: Beijing, China, 2020; pp. 4320–4333.
- Zeng, J.; Wu, Z.; Zheng, R.; Xue, W.; Wang, C.; Yu, X.; Zhang, T.; Yuan, S.; Zhu, T. M-TBQA: Multimodal table-based question answering. In Proceedings of the 2023 4th International Conference on Machine Learning and Computer Application; Association for Computing Machinery: New York, NY, USA, 2023; pp. 227–231.
- Zheng, M.; Feng, X.; Si, Q.; She, Q.; Lin, Z.; Jiang, W.; Wang, W. Multimodal table understanding. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 9102–9124.
- Zhu, F.; Lei, W.; Huang, Y.; Wang, C.; Zhang, S.; Lv, J.; Feng, F.; Chua, T.S. TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers); Association for Computational Linguistics: Beijing, China, 2021; pp. 3277–3287.
- Jauhar, S.K.; Turney, P.; Hovy, E. Tabmcq: A dataset of general knowledge tables and multiple-choice questions. arXiv 2016, arXiv:1602.03960.
- Chen, W.; Wang, H.; Chen, J.; Zhang, Y.; Wang, H.; Li, S.; Zhou, X.; Wang, W.Y. Tabfact: A large-scale dataset for table-based fact verification. arXiv 2019, arXiv:1909.02164.
- Akhtar, M.; Cocarascu, O.; Simperl, E. PubHealthTab: A public health table-based dataset for evidence-based fact checking. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022; Association for Computational Linguistics: Seattle, WA, USA, 2022; pp. 1–16.
- Zhou, B.; Gao, Z.; Wang, Z.; Zhang, B.; Wang, Y.; Chen, Z.; Xie, H. Syntab-llava: Enhancing multimodal table understanding with decoupled synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2025; pp. 24796–24806.
- Jaech, A.; Kalai, A.; Lerer, A.; Richardson, A.; El-Kishky, A.; Low, A.; Helyar, A.; Madry, A.; Beutel, A.; Carney, A.; et al. Openai o1 system card. arXiv 2024, arXiv:2412.16720.
- Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models. Adv. Neural Inf. Process. Syst. 2023, 36, 11809–11822.
- Shinn, N.; Cassano, F.; Gopinath, A.; Narasimhan, K.; Yao, S. Reflexion: Language agents with verbal reinforcement learning. Adv. Neural Inf. Process. Syst. 2023, 36, 8634–8652.
- Madaan, A.; Tandon, N.; Gupta, P.; Hallinan, S.; Gao, L.; Wiegreffe, S.; Alon, U.; Dziri, N.; Prabhumoye, S.; Yang, Y.; et al. Self-refine: Iterative refinement with self-feedback. Adv. Neural Inf. Process. Syst. 2023, 36, 46534–46594.
- Zhou, W.; Mesgar, M.; Adel, H.; Friedrich, A. RITT: A Retrieval-assisted framework with Image and Text Table representations for table question answering. In Proceedings of the 4th Table Representation Learning Workshop; Association for Computational Linguistics: Vienna, Austria, 2025; pp. 86–97.
- Zhao, W.; Feng, H.; Liu, Q.; Tang, J.; Wei, S.; Wu, B.; Liao, L.; Ye, Y.; Liu, H.; Zhou, W.; et al. Tabpedia: Towards comprehensive visual table understanding with concept synergy. Adv. Neural Inf. Process. Syst. 2024, 37, 7185–7212.
- Park, H.; Lee, J.; Oh, H. Fintab-llava: Finance domain-specific table understanding multimodal llm using fintmd. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Berlin/Heidelberg, Germany, 2025; pp. 235–246.
- Zha, Z.; Qi, P.; Bao, X.; Tian, M.; Qin, B. M3TQA: Multi-View, Multi-Hop and Multi-Stage Reasoning for Temporal Question Answering. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: New York, NY, USA, 2024; pp. 10086–10090.
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. arXiv 2022, arXiv:2106.09685.
- Li, Z.; Yang, B.; Liu, Q.; Ma, Z.; Zhang, S.; Yang, J.; Sun, Y.; Liu, Y.; Bai, X. Monkey: Image resolution and text label are important things for large multi-modal models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2024; pp. 26763–26773.
- Lu, S.; Li, Y.; Chen, Q.G.; Xu, Z.; Luo, W.; Zhang, K.; Ye, H.J. Ovis: Structural embedding alignment for multimodal large language model. arXiv 2024, arXiv:2405.20797.
- Liu, Z.; Wang, H.; Li, X.; Xiong, Q.; Yang, X.; Gu, Y.; Yan, Y.; Shi, Q.; Li, F.; Yu, G.; et al. Hippo: Enhancing the table understanding capability of large language models through hybrid-modal preference optimization. arXiv 2025, arXiv:2502.17315.
- Yao, Y.; Yu, T.; Zhang, A.; Wang, C.; Cui, J.; Zhu, H.; Cai, T.; Li, H.; Zhao, W.; He, Z.; et al. Minicpm-v: A gpt-4v level mllm on your phone. arXiv 2024, arXiv:2408.01800.





| Method | Reflection Mechanism | Dedicated Reflection Training | Multimodal Table Reasoning | Learnable Reflection |
|---|---|---|---|---|
| Reflexion | Prompt-based | × | × | × |
| Self-Refine | Prompt-based | × | × | × |
| Table-LLaVA | None | × | ✓ | × |
| SynTab-LLaVA | None | × | ✓ | × |
| Table-R1 | Prompt-based | × | ✓ | × |
| MMTR (Ours) | Reflection LoRA | ✓ | ✓ | ✓ |
| Dataset | QA Samples | Negative Samples | Reflection Samples |
|---|---|---|---|
| TABMWP | ✓ | × | × |
| SynTab | ✓ | × | × |
| MMTab | ✓ | × | × |
| StrTab-QA | ✓ | ✓ | ✓ |
| Model | LLM | Size | Ordinary | Price List | Stem-and-Leaf Plot | Schedule | Supply and Demand Schedule | All |
|---|---|---|---|---|---|---|---|---|
| Open-source MLLM | | | | | | | | |
| Qwen2.5-VL | Qwen2.5 | 3B | 93.82 | 97.40 | 55.69 | 86.59 | 65.81 | 88.16 |
| Monkey [40] | Qwen | 7B | 44.80 | 26.70 | 28.00 | 69.10 | 56.41 | 39.44 |
| Table-LLaVA | Vicuna-1.5 | 7B | 65.82 | 33.23 | 45.19 | 48.98 | 51.28 | 53.73 |
| Table-LLaVA | Vicuna-1.5 | 13B | 67.25 | 33.12 | 52.31 | 53.35 | 55.13 | 55.79 |
| LLaVA v1.5 | Vicuna-1.5 | 7B | 21.22 | 10.46 | 4.53 | 8.45 | 51.71 | 16.59 |
| SynTab-LLaVA | Vicuna-1.5 | 7B | 88.08 | 88.85 | 57.95 | 80.47 | 84.62 | 83.58 |
| InternVL-2.5 | Internlm2.5 | 8B | 91.31 | 93.79 | 62.48 | 92.42 | 100.00 | 88.17 |
| Ovis2 [41] | Qwen2.5 | 8B | 95.34 | 85.19 | 61.74 | 80.17 | 99.57 | 87.57 |
| HIPPO [42] | Qwen2 | 8B | 89.79 | 96.49 | 63.77 | 79.01 | 100.00 | 87.59 |
| MiniCPM-V-2.6 [43] | Qwen2 | 8B | 91.02 | 97.35 | 56.47 | 88.34 | 96.58 | 87.76 |
| Ours | | | | | | | | |
| MMTR | Qwen2.5 | 3B | 96.37 | 97.77 | 80.28 | 92.13 | 98.72 | 94.32 |
| Alignment Layer Pretraining Data | Vision Tower Pretraining Data | Instruction Q-A | Instruction Reflection | Accuracy |
|---|---|---|---|---|
| × | × | ✓ | × | 90.67 |
| ✓ | × | ✓ | × | 90.88 |
| ✓ | ✓ | ✓ | × | 92.78 |
| ✓ | ✓ | ✓ | ✓ | 94.32 |
| Reflection Method | Accuracy | Gain over Strategy Only |
|---|---|---|
| Strategy Only | 79.20 | — |
| Prompting-Based Reflection | 82.71 | +3.51 |
| MMTR | 84.46 | +5.26 |
| Transition | Count | Percentage (%) |
|---|---|---|
| Correct→Correct | 7093 | 92.28 |
| Correct→Wrong | 38 | 0.50 |
| Wrong→Correct | 157 | 2.05 |
| Wrong→Wrong | 398 | 5.17 |
| Net Accuracy Improvement | +119 | +1.55 |
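The net improvement in the transition table can be checked with a short calculation. This sketch assumes the four count rows are, in order, correct→correct, correct→wrong, wrong→correct, and wrong→wrong (answer before reflection → answer after reflection). That ordering is an inference from the counts, which reproduce the reported +119 and +1.55 figures.

```python
# Counts from the transition table; the (before, after) labels are an
# assumption inferred from the arithmetic, not stated in the extracted table.
transitions = {
    ("correct", "correct"): 7093,
    ("correct", "wrong"): 38,
    ("wrong", "correct"): 157,
    ("wrong", "wrong"): 398,
}

total = sum(transitions.values())

# Samples answered correctly before and after the reflection pass.
correct_before = sum(n for (before, _), n in transitions.items() if before == "correct")
correct_after = sum(n for (_, after), n in transitions.items() if after == "correct")

net_gain = correct_after - correct_before  # wrong→correct minus correct→wrong
net_gain_pct = 100 * net_gain / total
acc_before_pct = 100 * correct_before / total

print(f"total samples:      {total}")
print(f"accuracy before:    {acc_before_pct:.2f}")
print(f"net gain (samples): {net_gain:+d}")
print(f"net gain (points):  {net_gain_pct:+.2f}")
```

Under this reading, the accuracy before reflection comes out at 92.78, matching the strategy-only accuracy reported in the ablation tables.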
| Reflection Data | Accuracy | Gain over 0 |
|---|---|---|
| 0 | 92.78 | — |
| 10 | 93.63 | +0.85 |
| 25 | 94.19 | +1.41 |
| 100 | 94.32 | +1.54 |
| Training Stage | Inference Stage | Accuracy |
|---|---|---|
| Default | Default | 62.88 |
| Default | Adaptive Scaling | 87.90 |
| Adaptive Scaling | Default | 52.76 |
| Adaptive Scaling | Adaptive Scaling | 92.78 |
| Q-A Instruction | Reflection Instruction | Strategy Module Accuracy | Reflection Module Accuracy |
|---|---|---|---|
| Random | Random | 92.61 | 93.81 |
| Fixed | Random | 92.78 | 94.20 |
| Random | Fixed | 92.48 | 94.15 |
| Fixed | Fixed | 92.78 | 94.32 |
| Model | TabMCQ (TQA) | TAT-QA (TQA) | TabFact (TFV) | PubHealthTab (TFV) |
|---|---|---|---|---|
| InternVL2.5-8B | 87.27 | 52.59 | 71.31 | 78.37 |
| Qwen2.5-VL | 85.62 | 61.79 | 68.61 | 68.64 |
| HIPPO | 85.13 | 60.75 | 60.75 | 76.16 |
| Ovis2 | 82.90 | 68.78 | 81.71 | 84.55 |
| MiniCPM-V-2.6 | 83.68 | 51.55 | 78.48 | 75.08 |
| LLaVA v1.5 | - | 2.97 | 18.90 | - |
| Monkey | 17.89 | 12.31 | 22.56 | 18.89 |
| Strategy | 73.28 | 17.62 | 46.60 | 56.49 |
| Strategy + Reflection | 75.61 | 19.30 | 49.41 | 58.39 |
| Type | Run 1 | Run 2 | Run 3 | Average |
|---|---|---|---|---|
| Strategy | 92.78 | 92.78 | 92.78 | 92.78 |
| Strategy + Reflection | 94.33 | 94.30 | 94.35 | 94.32 |
| Run | Accuracy Gain | p-Value | 95% Bootstrap CI |
|---|---|---|---|
| Run 1 | +1.55% | $3.481 \times 10^{-18}$ | [+1.21%, +1.91%] |
| Run 2 | +1.52% | $5.742 \times 10^{-18}$ | [+1.17%, +1.87%] |
| Run 3 | +1.57% | $8.454 \times 10^{-19}$ | [+1.22%, +1.94%] |
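The significance table reports a p-value and a bootstrap confidence interval for the paired accuracy gain, but this section does not name the test. The sketch below shows one common way to obtain such quantities for paired correct/wrong outcomes: an exact McNemar test on the discordant pairs and a percentile bootstrap over per-sample changes. The choice of test, the function names, and the use of the discordant counts 38 and 157 out of 7686 samples (taken from the transition table) are assumptions, so the output is not expected to reproduce the reported values exactly.

```python
import random
from math import comb


def mcnemar_exact(n_correct_to_wrong: int, n_wrong_to_correct: int) -> float:
    """Two-sided exact McNemar p-value (binomial test on discordant pairs)."""
    n = n_correct_to_wrong + n_wrong_to_correct
    k = min(n_correct_to_wrong, n_wrong_to_correct)
    tail = sum(comb(n, i) for i in range(k + 1))  # P(X <= k) * 2**n under p = 0.5
    return min(1.0, 2 * tail / 2 ** n)


def bootstrap_gain_ci(n_correct_to_wrong, n_wrong_to_correct, total,
                      n_resamples=500, seed=0):
    """Percentile 95% CI for the accuracy gain, in percentage points."""
    rng = random.Random(seed)
    # Per-sample change in correctness: -1 (broken), +1 (fixed), 0 (unchanged).
    deltas = ([-1] * n_correct_to_wrong
              + [1] * n_wrong_to_correct
              + [0] * (total - n_correct_to_wrong - n_wrong_to_correct))
    gains = []
    for _ in range(n_resamples):
        resample = rng.choices(deltas, k=total)  # sample with replacement
        gains.append(100 * sum(resample) / total)
    gains.sort()
    lower = gains[int(0.025 * n_resamples)]
    upper = gains[int(0.975 * n_resamples) - 1]
    return lower, upper


p_value = mcnemar_exact(38, 157)
ci_low, ci_high = bootstrap_gain_ci(38, 157, 7686)
print(f"exact McNemar p-value: {p_value:.3e}")
print(f"95% bootstrap CI (points): [{ci_low:+.2f}, {ci_high:+.2f}]")
```

Only the discordant pairs carry information about the difference between the two systems, which is why the unchanged samples enter the bootstrap as zeros and do not enter the McNemar test at all.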
| Table Type | Correct→Wrong | Wrong→Correct | Net Gain |
|---|---|---|---|
| Normal | 0.10% | 0.67% | +0.56% |
| Stem-and-Leaf | 3.14% | 10.04% | +6.90% |
| Price List | 0.00% | 0.58% | +0.58% |
| Supply-and-Demand | 0.00% | 0.00% | 0.00% |
| Schedule | 0.00% | 2.92% | +2.92% |
| Module | Avg. Latency (s) | Avg. Tokens |
|---|---|---|
| Strategy | 3.72 | 73.6 |
| Reflection (additional) | 2.89 | 56.5 |
| Total | 6.61 | 130.0 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Bai, L.; Ming, Y.; Chen, Y. MMTR: Strategy-Guided Multimodal Table Reasoning with Reflective Self-Correction. Information 2026, 17, 641. https://doi.org/10.3390/info17070641

