Enhancing the Extraction of GHG Emission-Reduction Targets from Sustainability Reports Using Vision Language Models
Abstract
1. Introduction
2. Related Research
3. Materials and Methods
4. Results and Research Process
4.1. Adjusted Dataset
4.2. Automatic Evaluation Procedure
4.3. Model Selection
4.4. Text Preparation Methods
4.5. Prompt Improvement
4.6. Evaluating the Final Prompt
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CBN | Carbon |
| DPI | Dots per Inch |
| DSRM | Design Science Research Methodology |
| ESG | Environmental, Social, and Governance |
| GHG | Greenhouse Gas |
| IE | Information Extraction |
| IR | Information Retrieval |
| JSON | JavaScript Object Notation |
| LLM | Large Language Model |
| NLP | Natural Language Processing |
| OCR | Optical Character Recognition |
| RAG | Retrieval-Augmented Generation |
| VLM | Vision Language Model |
| XML | Extensible Markup Language |
Appendix A
| Attribute | No Value | Extractions | No. of Classes |
|---|---|---|---|
| Train dataset | | | |
| CBN_TARGET_CATEGORY | 0 | 18 | 2 |
| CBN_TARGET_REDUC_PCT | 6 | 12 | 7 |
| TARGET_CARBON_PROGRESS_PCT | 16 | 2 | 2 |
| TARGET_CARBON_SCOPE_123_CATEGORY | 2 | 16 | 4 |
| NET_ZERO_CLAIM_TYPE | 14 | 4 | 2 |
| CBN_TARGET_BASE_YEAR | 7 | 11 | 6 |
| CBN_TARGET_BASE_YEAR_VAL | 14 | 4 | 5 |
| CBN_TARGET_YEAR | 0 | 18 | 7 |
| CBN_TARGET_YEAR_VAL | 14 | 4 | 5 |
| TARGET_CARBON_TYPE | 8 | 10 | 3 |
| TARGET_CARBON_UNITS | 8 | 10 | 10 |
| Validation dataset | | | |
| CBN_TARGET_CATEGORY | 0 | 52 | 2 |
| CBN_TARGET_REDUC_PCT | 19 | 33 | 13 |
| TARGET_CARBON_PROGRESS_PCT | 49 | 3 | 4 |
| TARGET_CARBON_SCOPE_123_CATEGORY | 16 | 36 | 5 |
| NET_ZERO_CLAIM_TYPE | 34 | 18 | 2 |
| CBN_TARGET_BASE_YEAR | 20 | 32 | 9 |
| CBN_TARGET_BASE_YEAR_VAL | 46 | 6 | 7 |
| CBN_TARGET_YEAR | 0 | 52 | 10 |
| CBN_TARGET_YEAR_VAL | 48 | 4 | 5 |
| TARGET_CARBON_TYPE | 44 | 8 | 3 |
| TARGET_CARBON_UNITS | 44 | 8 | 8 |
| Test dataset | | | |
| CBN_TARGET_CATEGORY | 0 | 212 | 2 |
| CBN_TARGET_REDUC_PCT | 69 | 143 | 39 |
| TARGET_CARBON_PROGRESS_PCT | 195 | 17 | 12 |
| TARGET_CARBON_SCOPE_123_CATEGORY | 77 | 135 | 6 |
| NET_ZERO_CLAIM_TYPE | 159 | 53 | 2 |
| CBN_TARGET_BASE_YEAR | 91 | 121 | 17 |
| CBN_TARGET_BASE_YEAR_VAL | 191 | 21 | 20 |
| CBN_TARGET_YEAR | 4 | 208 | 15 |
| CBN_TARGET_YEAR_VAL | 183 | 29 | 27 |
| TARGET_CARBON_TYPE | 152 | 60 | 5 |
| TARGET_CARBON_UNITS | 152 | 60 | 45 |
Appendix B
| Model | Context Window (tokens) | Input Cost (€/million tokens) | Output Cost (€/million tokens) | Model Size (parameters) | Company |
|---|---|---|---|---|---|
| Gemini 2.0 Flash | 1,048,576 | 0.1 | 0.4 | Unknown | Google, Mountain View, CA, USA |
| Mistral Small 3.2 | 128,000 | 0.075 | 0.2 | 24 B | Mistral AI, Paris, France |
| GPT-4.1 Mini | 1,047,576 | 0.4 | 1.6 | Unknown | OpenAI, San Francisco, CA, USA |
| Llama 4 Scout | 10,000,000 | 0.08 | 0.3 | 109 B | Meta, Menlo Park, CA, USA |
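To make the price columns concrete, the following minimal sketch estimates what one extraction run might cost per model. The prices are copied from the table above; the page count and the per-page token counts are hypothetical placeholders, not figures reported in the study.

```python
# Prices in EUR per million tokens (input, output), copied from the table above.
PRICES = {
    "Gemini 2.0 Flash": (0.1, 0.4),
    "Mistral Small 3.2": (0.075, 0.2),
    "GPT-4.1 Mini": (0.4, 1.6),
    "Llama 4 Scout": (0.08, 0.3),
}


def run_cost(model: str, pages: int, input_tokens_per_page: int,
             output_tokens_per_page: int) -> float:
    """Return the estimated cost in EUR for processing `pages` pages."""
    input_price, output_price = PRICES[model]
    input_cost = pages * input_tokens_per_page * input_price / 1_000_000
    output_cost = pages * output_tokens_per_page * output_price / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 1000 pages, 3000 input and 300 output tokens each.
    for model in PRICES:
        print(f"{model}: {run_cost(model, 1000, 3000, 300):.3f} EUR")
```

Under these assumed token counts the input side dominates the bill, so the input price is the column that matters most when pages are sent as long prompts.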
Appendix C
| Text | Purpose |
|---|---|
| Your task is to extract absolute and interim GHG-Emission targets from an image and/or a <context> extracted from an ESG report; both contain the same information. Your answer should contain just the relevant information. | Instruction |
| <json_schema_descriptions> **Never guess or infer**. If an attribute is not in the context, use the default (-1 for numbers, “NOTHING_TO_EXTRACT” for strings). CBN_TARGET_DESC: “Target description text (1-2 sentences) from where the information was found. Use NOTHING_TO_EXTRACT if not found.” CBN_TARGET_CATEGORY: “Carbon/Energy Target Category of the GHG-Metric. Choose one of these: Carbon Emissions - Absolute, Carbon Emissions - Intensity, Energy Consumption - Absolute, Energy Consumption - Intensity.” CBN_TARGET_REDUC_PCT: “Overall percentage reduction (%) the company aims to achieve between the CO2e equivalent reduction base year (CBN_TARGET_BASE_YEAR) and the CO2e reduction target year (CBN_TARGET_YEAR). Decimal (%). Use -1 if not found.” TARGET_CARBON_PROGRESS_PCT: “Progress already achieved of towards the CBN_TARGET_REDUC_PCT. Decimal (%). Use -1 if not found.” TARGET_CARBON_SCOPE_123_CATEGORY: “Greenhouse gas scope(s) covered by the target. Scope 1, 2, and 3 can be combined. This is easy you just need to look for the relevant entities inside the text. Other categories (e.g., Energy use) must be a single element.” NET_ZERO_CLAIM_TYPE: “Claim that the Company aims to be “Net Zero” at some point. There also must be a CBN_TARGET_YEAR to be extracted.” TARGET_CARBON_TYPE: “Indicates whether the GHG-Reduction Target is Absolute or Intensity-based. Choose one of: Absolute, Sales intensity, Production intensity, Other. Use NOTHING_TO_EXTRACT if not found.” TARGET_CARBON_UNITS: “Unit of measure used for the GHG-metric (e.g., tCO2e per USD million revenue, MJ, % of operations, etc.). Use NOTHING_TO_EXTRACT if not found.” CBN_TARGET_BASE_YEAR: “Base year of the GHG-Metric. Integer. Use -1 if not disclosed.” CBN_TARGET_BASE_YEAR_VAL: “Base value associated with the CBN_TARGET_BASE_YEAR. Can be a percentage (%) or a unit-based value. Decimal. Use -1 if not found.” CBN_TARGET_YEAR: “Year by which the GHG-metric is intended to be met. Integer. Use -1 if not found.” CBN_TARGET_YEAR_VAL: “Targeted reduction target of the CBN_TARGET_YEAR. Decimal. Use -1 if not found.” </json_schema_descriptions> | Attribute descriptions |
| <json_schema> { “$schema”: “http://json-schema.org/draft-07/schema#”, “title”: “ESG Targets Extraction Schema”, “type”: “object”, “properties”: { “CBN_TARGET_DESC”: { “type”: “string” }, “CBN_TARGET_CATEGORY”: { “type”: “string”, “enum”: [ “Carbon Emissions - Absolute”, “Carbon Emissions - Intensity”, “Energy Consumption - Absolute”, “Energy Consumption - Intensity” ] }, “CBN_TARGET_REDUC_PCT”: { “type”: “number”, “default”: -1 }, “TARGET_CARBON_PROGRESS_PCT”: { “type”: “number”, “default”: -1 }, “TARGET_CARBON_SCOPE_123_CATEGORY”: { “type”: “array”, “minItems”: 1, “uniqueItems”: true, “items”: { “type”: “string”, “enum”: [“Scope 1”, “Scope 2”, “Scope 3”, “Energy use”, “NOTHING_TO_EXTRACT”] } }, “NET_ZERO_CLAIM_TYPE”: { “type”: “string”, “enum”: [“Net Zero”, “Scientific Net Zero”, “SBTI Net Zero”, “NOTHING_TO_EXTRACT”] }, “TARGET_CARBON_TYPE”: { “type”: “string”, “enum”: [“Absolute”, “Sales intensity”, “Production intensity”, “Other”, “NOTHING_TO_EXTRACT”] }, “TARGET_CARBON_UNITS”: { “type”: “string”, “default”: “NOTHING_TO_EXTRACT” }, “CBN_TARGET_BASE_YEAR_VAL”: { “type”: “number”, “default”: -1 }, “CBN_TARGET_BASE_YEAR”: { “type”: “integer”, “default”: -1 }, “CBN_TARGET_YEAR_VAL”: { “type”: “number”, “default”: -1 }, “CBN_TARGET_YEAR”: { “type”: “integer”, “default”: -1 } }, “required”: [ “CBN_TARGET_DESC”, “CBN_TARGET_CATEGORY”, “CBN_TARGET_REDUC_PCT”, “TARGET_CARBON_PROGRESS_PCT”, “TARGET_CARBON_SCOPE_123_CATEGORY”, “NET_ZERO_CLAIM_TYPE”, “CBN_TARGET_BASE_YEAR”, “CBN_TARGET_BASE_YEAR_VAL”, “CBN_TARGET_YEAR”, “CBN_TARGET_YEAR_VAL”, “TARGET_CARBON_TYPE”, “TARGET_CARBON_UNITS” ], “additionalProperties”: false } </json_schema> This is how your answer should look like if you dont see a GHG-Emission-Target (ALWAYS INCLUDE THE XML TAGS): <extraction> [] </extraction> This is how your answer should look like if you see one or more GHG-Emission-Targets (ALWAYS INCLUDE THE XML TAGS): <extraction> [ {{ ‘CBN_TARGET_CATEGORY’: ‘...’, ... }}, ... ] </extraction> <context> … </context> Now extract the GHG-Metrics from the given context: | Response Template |
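The response template above asks the model to wrap its answer in `<extraction>` tags. A minimal sketch of how such a response could be read back is shown here. The function name `parse_extraction` and the fallback to Python-style literals are assumptions for illustration; the source does not describe the authors' parsing code.

```python
import ast
import json
import re

# Captures everything between the first pair of <extraction> tags.
EXTRACTION_RE = re.compile(r"<extraction>(.*?)</extraction>", re.DOTALL)


def parse_extraction(response: str) -> list:
    """Return the list of metric objects found inside the <extraction> tags.

    Raises ValueError when the tags are missing or the payload is not a list.
    """
    match = EXTRACTION_RE.search(response)
    if match is None:
        raise ValueError("response lacks <extraction> tags")
    payload = match.group(1).strip()
    try:
        metrics = json.loads(payload)
    except json.JSONDecodeError:
        # Assumed fallback: the template shows single-quoted keys, so a model
        # might answer with a Python-style literal instead of strict JSON.
        try:
            metrics = ast.literal_eval(payload)
        except (ValueError, SyntaxError) as exc:
            raise ValueError("extraction payload is not parseable") from exc
    if not isinstance(metrics, list):
        raise ValueError("extraction payload must be a list")
    return metrics
```

An empty list signals a page without targets, which keeps "nothing found" distinguishable from a malformed answer that raises an error.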
| Text | Purpose |
|---|---|
| You are an expert in Environmental, Social, and Governance (ESG) extraction, specializing in Greenhouse Gas Reduction Target (GHG) metrics. You are the top performer in your department, tasked with extracting **company-wide** absolute and interim GHG and CO2e (equivalent) reduction targets from ESG reports. These targets represent aggregated, summarized data for the entire company, not individual emissions sources. You will be provided with a context (text, image, or both containing identical information, contained in text, tables and figures). As the company’s most experienced and motivated expert, your objective is to meticulously read the context for relevant GHG reduction metrics. Accuracy and comprehensiveness of your extraction are crucial Exceptionally good work will be rewarded with 1000$ bonuses. | Instruction and Persona |
| <language_info> The prompt uses the following terms: - “Metric/ESG Metric/ESG-GHG-Metric” refers to a greenhouse gas reduction target extracted from the context and represented as a JSON object with key-value pairs (attributes). Each metric is defined by a unique combination of enabled (“on”) and disabled (“off”) attributes. </language_info> | Meta-Language |
| You follow these guidelines: <guidelines> - Focus on company-wide GHG-reduction emissions reduction targets: The target must represent the entire company’s CO2e emissions. - Populate all fields in the JSON schema: Use default values if attribute not present for given metric (e.g., -1 for numerical values, “NOTHING_TO_EXTRACT” for strings). - Extract the esg-metrics precisely. - Adhere strictly to the provided JSON schema: Format your answers correctly. - Always include every key-value pair defined in the JSON schema in your JSON objects. - If a target is presented as a range, select the *higher* value. - Exclude Single Emitter Targets: If a target describes only a singular greenhouse gas emitter (e.g., methane), do not include it. - *Absolutely no inference of scope*: The scope *must* be directly and explicitly found in the OCR text. Do not use contextual understanding or external knowledge to determine the scope. If absent, use the default. - *Numbers must be directly linked to the target*: Extract numbers ONLY if they are part of the target statement (e.g., “Reduce emissions by 20%”). Do not extract numbers from unrelated tables, figures, or text, even if they seem relevant. </guidelines> | Best Practices |
| <extraction_ruleset> - TARGET_CARBON_TYPE and TARGET_CARBON_UNITS must either *both* be NOTHING_TO_EXTRACT or *both* be extracted from the context. If one is present, the other must also be explicitly stated. - If a TARGET_CARBON_UNITS is found in the context, TARGET_CARBON_TYPE must also be present. - For absolute targets, only populate TARGET_CARBON_TYPE and TARGET_CARBON_UNITS if *both* are explicitly stated in the context. - A NET_ZERO_CLAIM_TYPE must have a corresponding CBN_TARGET_YEAR to be extracted. Do not extract “Net Zero” claims without a target year. - CBN_TARGET_BASE_YEAR_VAL and CBN_TARGET_YEAR_VAL represent the value of TARGET_CARBON_UNITS. For example, a figure showing 100% -> 80% GHG reduction. - TARGET_CARBON_PROGRESS_PCT describes the progress in percentage already made towards achieving the desired target. </extraction_ruleset> | Extraction descriptions |
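The first and fourth rules in the extraction ruleset above are mechanical enough to check after the fact. The sketch below is one possible post-hoc consistency check, assuming each extracted metric is a plain dictionary that uses the schema's default values; the source does not state that the authors applied such a check, and `ruleset_violations` is a hypothetical helper.

```python
NOTHING = "NOTHING_TO_EXTRACT"  # default for string attributes in the schema


def ruleset_violations(metric: dict) -> list:
    """Return human-readable rule violations for one extracted metric."""
    violations = []
    carbon_type = metric.get("TARGET_CARBON_TYPE", NOTHING)
    carbon_units = metric.get("TARGET_CARBON_UNITS", NOTHING)
    # Rule: type and units are either both extracted or both left at default.
    if (carbon_type == NOTHING) != (carbon_units == NOTHING):
        violations.append(
            "TARGET_CARBON_TYPE and TARGET_CARBON_UNITS must be set together"
        )
    # Rule: a net-zero claim is only valid with a target year (-1 = missing).
    has_claim = metric.get("NET_ZERO_CLAIM_TYPE", NOTHING) != NOTHING
    if has_claim and metric.get("CBN_TARGET_YEAR", -1) == -1:
        violations.append("NET_ZERO_CLAIM_TYPE requires CBN_TARGET_YEAR")
    return violations
```

Returning a list rather than raising lets a caller decide whether to drop the metric, reset the offending attributes to their defaults, or only log the problem.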
| Text | Purpose |
|---|---|
| <examples> <example> … </example> <extraction> … </extraction> <example> … </example> <extraction> … </extraction> <example> … </example> <extraction> … </extraction> <example> … </example> <extraction> … </extraction> <example> … </example> <extraction> … </extraction> </examples> | Few-Shot examples |
Appendix D
| You are an expert in Environmental, Social, and Governance (ESG) extraction, specializing in [METRIC_NAME] metrics. You are the top performer in your department, tasked with extracting [METRIC_NAME] from ESG reports. You will be provided with a context (text, image, or both containing identical information, contained in text, tables and figures). As the company’s most experienced and motivated expert, your objective is to meticulously read the context for relevant [METRIC_NAME]. Accuracy and comprehensiveness of your extraction are crucial Exceptionally good work will be rewarded with 1000$ bonuses. <language_info> The prompt uses the following terms: “Metric/ESG Metric” refers to a [METRIC_NAME] extracted from the context and represented as a JSON object with key-value pairs (attributes). Each metric is defined by a unique combination of enabled (“on”) and disabled (“off”) attributes. </language_info> You follow these guidelines: <guidelines> - Populate all fields in the JSON schema: Use default values if attribute not present for given metric (e.g., -1 for numerical values, “NOTHING_TO_EXTRACT” for strings). - Extract the esg-metrics precisely. - Adhere strictly to the provided JSON schema: Format your answers correctly. - Always include every key-value pair defined in the JSON schema in your JSON objects. - If a target is presented as a range, select the *higher* value. [FURTHER_GUIDELINES] </guidelines> <json_schema_descriptions> **Never guess or infer**. If an attribute is not in the context, use the default (-1 for numbers, “NOTHING_TO_EXTRACT” for strings). TEXTUAL_METRIC_ATTRIBUTE: “[SHORT_DESCRIPTION]. [SHOW_ONE_OR_MORE_OF_POSSIBLE_CLASSES]. Use NOTHING_TO_EXTRACT if not found.” NUMERICAL_METRIC_ATTRIBUTE: “[SHORT_DESCRIPTION]. [UNIT_TYPE]. Use -1 if not found.” … </json_schema_descriptions> <json_schema> [INDIVIDUAL_JSON_SCHEMA_FOR_TASK] </json_schema> <extraction_ruleset> [LIST_OF_SPECIAL_RULESETS_FOR_COMPLEX_EXTRACTIONS_AND_RELATIONSHIPS] </extraction_ruleset> This is how your answer should look like if you dont see a [METRIC_NAME] (ALWAYS INCLUDE THE XML TAGS): <extraction> [] </extraction> This is how your answer should look like if you see one or more [METRIC_NAME] (ALWAYS INCLUDE THE XML TAGS): <extraction> [ {{ … }}, ... ] </extraction> <examples> [TEXT TOKEN, IMAGE TOKEN OR BOTH. EXTRACTION PART ARE ALWAYS TEXT TOKENS] <example> … </example> <extraction> … </extraction> </examples> <context> … </context> Now extract the Metrics from the given context: |
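The generic template above marks its task-specific parts with upper-case bracketed placeholders such as `[METRIC_NAME]`. As a rough sketch of how the template might be instantiated for another metric, the helper below substitutes those placeholders. The function `fill_template` and the example metric name are hypothetical; the source does not prescribe a substitution mechanism.

```python
import re

# Matches placeholders made only of upper-case letters, digits, and
# underscores, e.g. [METRIC_NAME]. Literal "[]" and "[ {{ ... }} ]" from the
# answer template are left untouched because they contain no such name.
PLACEHOLDER_RE = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def fill_template(template: str, values: dict) -> str:
    """Replace [PLACEHOLDER] markers; raise KeyError when a value is missing."""

    def substitute(match: re.Match) -> str:
        return values[match.group(1)]

    return PLACEHOLDER_RE.sub(substitute, template)


if __name__ == "__main__":
    snippet = "tasked with extracting [METRIC_NAME] from ESG reports."
    # "water withdrawal targets" is an invented example metric.
    print(fill_template(snippet, {"METRIC_NAME": "water withdrawal targets"}))
```

Failing loudly on a missing value avoids sending a prompt that still contains an unfilled placeholder.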







| Attribute Name | Data Type | Classes or Value Ranges |
|---|---|---|
| CBN_TARGET_CATEGORY | categorical | (Carbon Emissions/Energy Consumption)-(Absolute/Intensity) |
| CBN_TARGET_REDUC_PCT | decimal | 0–100% |
| TARGET_CARBON_PROGRESS_PCT | decimal | 0–100% |
| TARGET_CARBON_PROGRESS_VALUE | decimal | Any number |
| TARGET_CARBON_SCOPE_123_CATEGORY | categorical | (Scope 1/Scope 2/Scope 3/Energy use) |
| TARGET_CARBON_TYPE | categorical | (Absolute/Sales intensity/Production intensity) |
| TARGET_CARBON_UNITS | textual | e.g., “tCO2e per USD million revenue” |
| CBN_TARGET_BASE_YEAR | numerical | 2000–2023 |
| CBN_TARGET_BASE_YEAR_VAL | decimal | Any number |
| CBN_TARGET_YEAR | numerical | 2023–2100 |
| CBN_TARGET_YEAR_VAL | decimal | Any number |
| CBN_TARGET_DESC | textual | e.g., “Reduce Scope 1 and 2 GHG emissions intensity by 15% to 20% by 2030 against a 2019 baseline” |
| NET_ZERO_CLAIM_TYPE | categorical | (Net Zero, Scientific Net Zero, SBTI Net Zero) |
| Dataset | Number of Pages | Relevant Pages | Irrelevant Pages | ESG Metrics |
|---|---|---|---|---|
| Training | 5 | 4 | 1 | 10 |
| Validation | 109 | 28 | 81 | 52 |
| Testing | 434 | 111 | 323 | 212 |
| Simple Layout in Test | - | 58 | - | 110 |
| Complex Layout in Test | - | 53 | - | 102 |
| Evaluation Metric | Mistral Small 3.2 | Gemini 2.0 Flash | Llama 4 Scout | GPT-4.1 Mini |
|---|---|---|---|---|
| Format adherence | √ | ⨉ | √ | √ |
| Missing metrics | 0.1538 | 0.0577 | 0.1538 | 0.0769 |
| Noise ratio | 0.3654 | 0.8654 | 0.7692 | 0.5962 |
| Extraction quality—page level | 0.9553 | 0.8686 | 0.8022 | 0.8783 |
| Extraction quality—attribute level | 0.7457 | 0.7464 | 0.7027 | 0.7436 |
| Evaluation Metric | PyMuPDF | PyMuPDF4LLM | olmOCR | Gemini 2.0 Flash |
|---|---|---|---|---|
| Missing metrics | 0.1538 | 0.3077 | 0.3077 | 0.2308 |
| Noise ratio | 0.2903 | 0.2692 | 0.2500 | 0.3077 |
| Extraction quality—page level | 0.9723 | 0.9235 | 0.9235 | 0.9633 |
| Extraction quality—attribute level | 0.7512 | 0.6537 | 0.6605 | 0.7031 |
| Evaluation Metric | Text | Image | Text + Image |
|---|---|---|---|
| Missing metrics | 0.1346 | 0.1346 | 0.1538 |
| Noise ratio | 0.1154 | 0.3077 | 0.2115 |
| Extraction quality—page level | 0.9908 | 0.9553 | 0.9549 |
| Extraction quality—attribute level | 0.8115 | 0.8127 | 0.8029 |
| Evaluation Metric | Text | Image | Text + Image |
|---|---|---|---|
| Missing metrics | 0.1154 | 0.1346 | 0.1154 |
| Noise ratio | 0.1923 | 0.1346 | 0.2308 |
| Extraction quality—page level | 0.9814 | 0.9819 | 0.9637 |
| Extraction quality—attribute level | 0.8450 | 0.8281 | 0.8587 |
| Evaluation Metric | Text | Image | Text + Image |
|---|---|---|---|
| Missing metrics | 0.2123 | 0.1651 | 0.1352 |
| Noise ratio | 0.2233 | 0.2893 | 0.3836 |
| Extraction quality—page level (w) | 0.9531 | 0.9621 | 0.9627 |
| Extraction quality—page level (m) | 0.9531 | 0.9378 | 0.9515 |
| Extraction quality—attribute level (w) | 0.7748 | 0.7983 | 0.8208 |
| Extraction quality—attribute level (m) | 0.4668 | 0.5123 | 0.5294 |
| Evaluation Metric | Simple Layout | Complex Layout |
|---|---|---|
| Missing metrics | 0.1212 | 0.1504 |
| Noise ratio | 0.4697 | 0.5392 |
| Extraction quality—page level (w) | 0.9669 | 0.9621 |
| Extraction quality—page level (m) | 0.9369 | 0.9242 |
| Extraction quality—attribute level (w) | 0.8473 | 0.7888 |
| Extraction quality—attribute level (m) | 0.5037 | 0.5546 |
| Precision—attribute level (w) | 0.9263 | 0.8895 |
| Recall—attribute level (w) | 0.7961 | 0.7314 |
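The tables above report extraction quality with the suffixes (w) and (m). Assuming these denote weighted and macro averaging of per-class F1 scores, the sketch below illustrates why the macro value can fall far below the weighted one when an attribute has many rare classes. The class labels and counts are invented for illustration and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_and_weighted_f1(counts: dict) -> tuple:
    """counts maps class label -> (tp, fp, fn); returns (macro, weighted)."""
    if not counts:
        raise ValueError("need at least one class")
    scores = {label: f1_score(*c) for label, c in counts.items()}
    # Support is the number of gold instances of a class: tp + fn.
    support = {label: tp + fn for label, (tp, fp, fn) in counts.items()}
    total = sum(support.values())
    if total == 0:
        raise ValueError("no gold instances")
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[label] * support[label] for label in scores) / total
    return macro, weighted


if __name__ == "__main__":
    # Invented counts: two frequent target years and one rare, always-missed one.
    example = {"2030": (90, 5, 10), "2050": (40, 5, 10), "2045": (0, 1, 2)}
    macro, weighted = macro_and_weighted_f1(example)
    print(f"macro={macro:.4f} weighted={weighted:.4f}")
```

Macro averaging gives the rare class the same weight as the frequent ones, so a single missed class lowers it sharply, whereas the weighted average is dominated by the frequent classes.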
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wilhelmi, L.; Bruns, C.; Schumann, M. Enhancing the Extraction of GHG Emission-Reduction Targets from Sustainability Reports Using Vision Language Models. Mach. Learn. Knowl. Extr. 2026, 8, 37. https://doi.org/10.3390/make8020037

