Beyond Fuzzy Matching: A Dual-Augmentation RAG System for Robust Product Reconciliation in Accounting
Abstract
1. Introduction
- Catalog Augmentation: We first proactively enrich the internal corporate catalog. An LLM generates additional keywords, synonyms, and potential invoice variants for each product, and these enhanced entries are stored as embeddings in a vector database.
- Query Augmentation and Reranking: During live invoice processing, our system uses an LLM to generate multiple augmented query variants from the raw, extracted invoice line. This “query expansion” retrieves a broad set of potential candidates, which are then evaluated by a specialized LLM-based reranker to produce the final Top-3 matches.
2. Related Work
2.1. E-Invoicing Adoption and RPA Governance
2.2. Automated Invoice Processing and Information Extraction (IE)
2.3. Entity Resolution (ER) and Product Matching for AP
2.4. Applying LLMs for Contextual Matching and Judgment
- Proactive Catalog Enrichment: First, we use an LLM to “read” our internal product catalog and proactively generate realistic synonyms, common abbreviations, and alternative descriptions for each item. This is an automated form of master data enhancement. For example, “M6 Stainless Steel Hex Bolt, 10mm, 100-pack” might be enriched with terms like “SS M6 bolt,” “hex 10mm,” or “box of 100 bolts.” This enriched data is stored in a high-speed vector database, allowing our system to anticipate the messy, inconsistent language suppliers use before their invoices even arrive. This concept builds on RAG research into document-side augmentation (James et al., 2025; Raina & Gales, 2024) but applies it as a practical control for data quality in a procurement context.
- Interpreting Noisy Invoice Queries: When a noisy line item like “SS hexblt 10mm” is extracted from an invoice, it often fails to match the catalog directly. Our system uses an LLM to rewrite this ambiguous query into multiple, clearer variants (e.g., “stainless steel hex bolt 10mm,” “M10 hex bolt stainless”). This step mimics an AP clerk’s “best guess” at what the supplier meant to say, translating vendor-specific shorthand into our internal terminology. This query-expansion step builds on prior work in LLM-driven query rewriting (Ma et al., 2023) and hypothetical document expansion (Gao et al., 2023), and is critical for handling OCR errors and vendor-specific phrasing, ensuring good candidates are found even when the initial data is poor.
- Applying Contextual Judgment (Reranking): The first two steps retrieve a list of potential matches from the catalog. This list is then passed to a final LLM-based reranker, which acts like a senior AP professional performing a final check. It compares the original invoice line to the top candidates and re-orders them based on contextual cues. This stage is crucial for resolving ambiguities that lexical methods miss, such as a “box” versus an “each” unit of measure or packaging equivalences (e.g., “10-pack” vs. “10 units”). This LLM-based reranking (Adeyemi et al., 2023) applies nuanced business logic, improving the quality of the final Top-3 matches presented to the user.
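The three stages above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the system described in this paper: `expand_query` and `rerank` are hypothetical stand-ins for the LLM calls (a lookup table and a score sort), `retrieve` is a toy token-overlap search rather than a vector database, and the catalog IDs are invented for the example.

```python
# Toy enriched catalog: catalog ID -> original description plus generated keywords.
CATALOG = {
    "P-001": "M6 Stainless Steel Hex Bolt, 10mm, 100-pack | ss m6 bolt | hex 10mm | box of 100 bolts",
    "P-002": "M6 Zinc Plated Hex Nut, 100-pack | m6 nut | hex nut zinc",
    "P-003": "Stainless Steel Flat Washer M6, 100-pack | ss washer m6",
}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return set(cleaned.split())


def expand_query(raw_line: str) -> list[str]:
    """Hypothetical stand-in for the LLM query rewriter (fixed lookup table)."""
    rewrites = {"SS hexblt 10mm": ["stainless steel hex bolt 10mm", "ss hex bolt 10 mm"]}
    return [raw_line] + rewrites.get(raw_line, [])


def retrieve(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Toy retriever: Jaccard overlap between query tokens and catalog tokens."""
    q = tokens(query)
    scored = []
    for pid, text in CATALOG.items():
        t = tokens(text)
        scored.append((pid, len(q & t) / len(q | t)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def gather_candidates(raw_line: str, k: int = 2) -> dict[str, float]:
    """Union of candidates over all query variants, keeping the best score per ID."""
    pool: dict[str, float] = {}
    for variant in expand_query(raw_line):
        for pid, score in retrieve(variant, k):
            pool[pid] = max(score, pool.get(pid, 0.0))
    return pool


def rerank(raw_line: str, pool: dict[str, float], top_n: int = 3) -> list[str]:
    """Hypothetical stand-in for the LLM reranker: order candidates by pooled score."""
    return sorted(pool, key=pool.get, reverse=True)[:top_n]


line = "SS hexblt 10mm"
top3 = rerank(line, gather_candidates(line))
print(top3)
```

The point of the sketch is the candidate pooling: the raw line alone shares few tokens with the catalog entry, while the rewritten variants pull the correct entry to the top of the pool before the final ranking step.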
3. Materials and Methods
3.1. Research Design
3.2. System Architecture and Implementation
3.2.1. Phase 1: Catalog Augmentation and Vector Indexing
- Dense Retrieval: Each enriched catalog entry is encoded with OpenAI’s text-embedding-3-large model, with the output dimensionality reduced from the default 3072 to 1024 for storage and latency efficiency. Retrieval is performed by cosine similarity in this 1024-dimensional space. This captures semantic context (e.g., understanding that “portable PC” and “laptop” are related).
- Sparse Retrieval (BM25): In parallel, each catalog entry is indexed with a BM25 representation, produced by FastEmbed’s Qdrant/bm25 model, which preserves exact-keyword signal (e.g., SKU codes such as “X1-Carbon” or “M6 × 10”) that embeddings tend to smooth over.
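To illustrate why the two signals complement each other, the sketch below ranks a toy catalog with cosine similarity and with a minimal BM25 scorer, then merges the two rankings. Several things here are assumptions for illustration only: the three-dimensional vectors are hand-made placeholders for real 1024-dimensional embeddings, the BM25 scorer is a plain textbook implementation rather than FastEmbed's Qdrant/bm25 model, and reciprocal rank fusion is used only as one common way to combine rankings, since this section does not state how the dense and sparse results are fused.

```python
import math
from collections import Counter

DOCS = {
    "A": "lenovo thinkpad x1-carbon laptop 14 inch",
    "B": "portable pc ultrabook 14 inch",
    "C": "m6 x 10 stainless hex bolt",
    "D": "dell latitude laptop 15 inch",
}
# Hypothetical toy "embeddings"; a real system would use learned vectors.
DENSE = {"A": [0.9, 0.1, 0.0], "B": [0.8, 0.3, 0.0], "C": [0.0, 0.1, 0.9], "D": [0.7, 0.0, 0.3]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Textbook BM25 over whitespace tokens."""
    tokenized = {d: text.split() for d, text in docs.items()}
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n
    df = Counter(term for t in tokenized.values() for term in set(t))
    scores = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[d] = score
    return scores


def rank(scores):
    """Document IDs ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + position) over all rankings."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]


query_text = "x1-carbon laptop"
query_vec = [0.8, 0.3, 0.0]  # hypothetical embedding of the query
dense_rank = rank({d: cosine(query_vec, v) for d, v in DENSE.items()})
sparse_rank = rank(bm25_scores(query_text, DOCS))
hybrid_rank = rrf([dense_rank, sparse_rank])
print(dense_rank, sparse_rank, hybrid_rank)
```

In this toy setup the dense ranking prefers the semantically close "portable pc" entry, the sparse ranking prefers the entry containing the exact model code, and the fused ranking places the exact-code entry first.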
3.2.2. Phase 2: Real-Time Query Augmentation and Reranking
3.3. Data Preparation and Experimental Setup
- The Corporate Catalog (Master Data): The “right-side” dataset serves as the authorized product master file found in an ERP system.
- The Invoice Stream (Query Set): The “left-side” dataset was filtered to create a stream of incoming “invoice line items” that require reconciliation against the catalog.
3.4. Evaluation Metrics
- Top-1 Recall: Proxy for “Touchless Automation.” This measures the percentage of invoice line items for which the system’s first choice is correct, allowing automatic posting without human review.
- Top-3 Recall: Proxy for “Decision Support Efficiency.” This measures how often the correct match appears in the top three suggestions. If the correct code is visible immediately, the AP clerk can validate it with a single click (taking seconds) rather than searching the catalog manually (taking minutes).
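Both metrics reduce to the same computation with a different cutoff. A minimal sketch, assuming each query has exactly one gold catalog ID and using invented IDs purely for illustration:

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of queries whose gold catalog ID appears in the top-k predictions."""
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


# Hypothetical ranked predictions for four invoice lines, best match first.
predictions = [
    ["P1", "P7", "P3"],
    ["P9", "P2", "P4"],
    ["P5", "P6", "P8"],
    ["P2", "P1", "P0"],
]
gold = ["P1", "P2", "P0", "P0"]

r_at_1 = recall_at_k(predictions, gold, k=1)
r_at_3 = recall_at_k(predictions, gold, k=3)
print(r_at_1, r_at_3)
```

Here only the first line is matched at rank 1, while three of the four lines have the correct code somewhere in the top three, which is the gap between touchless posting and one-click validation.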
3.5. Methodological Choices
4. Results
4.1. Accuracy and Robustness
4.2. Analysis of Retrieval Baselines
4.3. Performance of the Proposed System
- Correction of Retrieval Errors: Even where the retrieval layer struggled (e.g., the R@1 drop under catalog enrichment on Walmart-Amazon), the proposed system recovered significant ground, achieving an R@1 of 83.47% (standard) and 84.10% (duplicate-aware).
- Top-1 accuracy: The system demonstrates its capability for automation, particularly in Abt-Buy, where it achieved 93.97% R@1 (standard) and 94.94% (duplicate-aware), outperforming the best baseline (Dense Raw at 86.67%).
- Top-3 accuracy: At Top-3, the picture is uniformly positive under duplicate-aware scoring: the proposed system improves over the best baseline by +1.71 points on Amazon-Google (94.34% vs. 92.63%), +2.92 points on Abt-Buy (98.93% vs. 96.01% Dense Raw), and +1.35 points on Walmart-Amazon (97.92% vs. 96.57% Hybrid Raw). Under standard scoring, the LLM reranker improves performance on Abt-Buy (+1.95 points) and Walmart-Amazon (+0.73 points), and marginally underperforms on Amazon-Google (91.60% vs. 92.63% Dense Enriched, a 1.03-point gap that is within the bootstrap confidence interval). This single-benchmark gap aligns with the labeling-artifact analysis in Section 5.4: on Amazon-Google, which contains the densest cluster of duplicate text rows (103 duplicate clusters spanning 300 catalog rows), a strict literal scorer penalizes the reranker when it selects a text-identical row that happens to have been assigned a different catalog ID. Under duplicate-aware scoring, the gap reverses to a +1.71 improvement.
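The difference between the two scoring modes, and the confidence intervals quoted with them, can be sketched as follows. This is an illustration under assumptions: duplicates are treated as rows whose text is exactly identical (following the description of text-identical rows above), and the percentile bootstrap with 2000 resamples and a fixed seed is a generic choice, since the resampling settings are not stated in this section. The catalog IDs and texts are invented.

```python
import random


def hit_standard(top_ids, gold_id):
    """Strict scoring: the gold catalog ID itself must be in the top-k list."""
    return gold_id in top_ids


def hit_duplicate_aware(top_ids, gold_id, catalog_text):
    """Lenient scoring: any returned row with text identical to the gold row counts."""
    gold_text = catalog_text[gold_id]
    return any(catalog_text[i] == gold_text for i in top_ids)


def bootstrap_ci(hit_flags, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 hit indicators."""
    rng = random.Random(seed)
    n = len(hit_flags)
    means = sorted(sum(rng.choices(hit_flags, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Two catalog rows share the same text but carry different IDs.
catalog_text = {
    "G1": "adobe photoshop cs3",
    "G2": "adobe photoshop cs3",
    "G3": "corel draw x4",
}
top3 = ["G2", "G3"]
strict = hit_standard(top3, "G1")
lenient = hit_duplicate_aware(top3, "G1", catalog_text)

flags = [1] * 90 + [0] * 10  # hypothetical per-query hits, i.e. recall of 0.90
ci_low, ci_high = bootstrap_ci(flags)
print(strict, lenient, ci_low, ci_high)
```

Under strict scoring the example counts as a miss even though the returned row is textually indistinguishable from the gold row, which is the labeling artifact that duplicate-aware scoring removes.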
5. Discussion
5.1. Synthesis of Findings
5.2. Implications for Accounting Practice and Internal Controls
5.3. Limitations and Future Research
- Linguistic Scope and Mixed-Script Complexity: A primary limitation of this study is the linguistic homogeneity of the standard benchmarks (Abt-Buy, Amazon-Google, Walmart-Amazon), which are exclusively English. This contrasts with our target operational environment, which is characterized by high linguistic entropy. In our real-world use case, data is not simply “translated”; it involves complex code-switching, where invoices and catalog entries frequently mix Greek and English terms within the same line item (e.g., an English brand name paired with a Greek functional description, or mixed-script abbreviations). While the “Query Synthesizer” demonstrated robust handling of synonyms in the benchmarks, its primary value lies in its ability to normalize this Hybrid Greek–English input, a capability not fully quantified by the current English-only datasets.
- Enrichment Trade-offs: Our experiments revealed that the “augment-both-sides” strategy requires careful tuning. As observed in the Walmart-Amazon dataset, LLM-based catalog enrichment does not universally improve performance and can introduce noise (reducing R@1) in highly heterogeneous retail datasets. Future work should investigate governance mechanisms, such as confidence thresholds or “human-in-the-loop” review stages, to validate generated synonyms before they enter the vector index.
- End-to-End Latency and Behavioral Validation: The per-line latency reported in Section 5.4 was measured on individual benchmark invocations and establishes technical feasibility only. Two evaluations were out of scope for this study: a formal assessment under production load (concurrent invoice streams, API rate-limit saturation, and end-to-end queue throughput), and a controlled behavioral study of AP-clerk verification accuracy and speed when supported by the system. The productivity claims associated with the decision-support framing therefore remain to be validated empirically in a controlled setting.
- Multimodal Integration: Our evaluation focused on textual line items. Emerging research in multimodal transformers suggests that incorporating visual layout features (e.g., the spatial coordinates of text on the invoice) could further enhance extraction and matching accuracy, particularly for invoices where the visual structure implies the category.
5.4. Benchmark Artifacts, Production Validation, and Operational Feasibility
5.5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix A.1. Catalog Enricher (Phase 1—Offline)
- NEVER invent attributes not present in the input. Do not guess colors, sizes, capacities, or brands.
- Keep the brand EXACT if present; add common brand abbreviations only if widely used (e.g., “hewlett packard” → “hp”).
- Include synonyms for function/category if present (e.g., “headphones”, “headset”; “tv”, “television”).
- Add realistic invoice-style variations (e.g., abbreviations).
- Expand common abbreviations (e.g., “Tabl → tablets”, “Inj → injection”, “Amp → ampoule”), but keep both expanded and abbreviated forms when relevant.
- Keep all numeric attributes EXACT (capacity, size, version). Also add common unit variants (e.g., “gb” and “gbyte”). Expand or clarify where needed.
- No sentences, no marketing, no stopwords, no explanations. Keep each keyword short (≤4 words).
- Focus only on metadata that could plausibly exist for this product in catalog descriptions.
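Some of the rules above can be checked mechanically after generation, which is one possible form of the governance step discussed in Section 5.3. The sketch below is a hypothetical post-filter, not part of the prompt or of the described system: it enforces the four-word limit and rejects keywords containing numbers that do not occur in the source description. The function name and the example product are invented.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def validate_keywords(description, keywords, max_words=4):
    """Split generated keywords into accepted and rejected lists.

    A keyword is rejected if it exceeds max_words or if it contains a number
    that is absent from the original description (a likely invented attribute).
    """
    allowed_numbers = set(NUMBER.findall(description))
    accepted, rejected = [], []
    for keyword in keywords:
        too_long = len(keyword.split()) > max_words
        invented = set(NUMBER.findall(keyword)) - allowed_numbers
        if too_long or invented:
            rejected.append(keyword)
        else:
            accepted.append(keyword)
    return accepted, rejected


description = "Kingston DataTraveler USB Flash Drive 16GB"
generated = [
    "usb stick 16gb",
    "flash drive 16 gbyte",
    "pendrive 32gb",
    "kingston portable usb flash memory stick",
]
accepted, rejected = validate_keywords(description, generated)
print(accepted, rejected)
```

A check of this kind only catches numeric drift and overly long keywords; invented brands or colors would still need a separate vocabulary check or human review.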
Appendix A.2. Query Synthesizer (Phase 2—Runtime)
- Do NOT invent attributes. Use ONLY info present in the input.
- Include every product/model code, sizes/dimensions, versions, and pack/count EXACTLY as they appear (when present). If absent, omit.
- Exclude invoice meta: lots, discounts, prices, VAT, order numbers, dates, addresses, loyalty/offer text, etc.
- Always preserve original script, casing, and diacritics for brand/model tokens; if you add an expansion, KEEP the original too.
- Queries must be DENSE, INFORMATIVE (no minimal queries) and meaningfully different (avoid trivial rephrasings).
- Category and form synonyms.
- Acronym expansions or contracted/long-form variants.
- Units/number/symbol/notation formatting variants found in input (e.g., “500 mg”/“500 mg”, “2 × 500 g”/“2 × 500 g”).
- Packaging synonyms (add 1–2 besides the original).
Appendix A.3. Evaluator/Reranker (Phase 2—Runtime, Final Stage)
- Exact matches of product type or name.
- Exact matches of brand name (brand text must match exactly if present).
- Exact matches of product code(s) (e.g., SKU, EAN, or internal codes).
- High textual similarity in product type or product name.
- Exact matches or compatible values for size/dimensions or pack/count.
- Other contextual or descriptive similarity.
- Always return the three best candidates, ranked from most to least relevant.
- If no exact matches exist, return the three closest ones based on partial or semantic similarity.
- If a candidate matches at least the product type or name, it is valid for ranking.
- Never fabricate or modify product text; use the catalog lines as provided.
- Focus on precision and meaningful relevance rather than sufficiency.
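Because the reranker is instructed never to fabricate or modify product text and always to return three candidates, its output can be verified against the candidate list before it reaches the user. The sketch below is a hypothetical guard, assuming the reranker returns plain catalog lines; backfilling from the retrieval order when fewer than three valid lines remain is an assumption made for this example, not a behavior described in the prompt.

```python
def validate_reranker_output(returned, candidates, top_n=3):
    """Keep only verbatim, de-duplicated catalog lines; backfill from retrieval order."""
    allowed = set(candidates)
    final = []
    for line in returned:
        # Drop any line that is not an exact copy of a retrieved candidate.
        if line in allowed and line not in final:
            final.append(line)
    for line in candidates:
        if len(final) >= top_n:
            break
        if line not in final:
            final.append(line)
    return final[:top_n]


# Hypothetical candidates in retrieval order and a reranker answer with one altered line.
candidates = [
    "HP LaserJet Pro M404dn",
    "HP LaserJet Pro M404n",
    "HP OfficeJet 8010",
    "Canon PIXMA TS3350",
]
returned = [
    "HP LaserJet Pro M404n",
    "HP LaserJet M404 (new model)",
    "HP LaserJet Pro M404dn",
]
top3 = validate_reranker_output(returned, candidates)
print(top3)
```

The altered line is discarded and the third slot is filled with the next retrieved candidate, so every suggestion shown to the clerk maps to a real catalog row.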
| 1 | The three-way match is the standard accounts-payable internal control requiring agreement between the purchase order, the goods-receipt note, and the supplier invoice on item, quantity, and price before payment is authorized. |
| 2 | The three benchmarks are obtained from the official DeepMatcher dataset index at https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md (accessed on 25 May 2026) (Mudgal et al., 2018). Specifically, we use the Structured/Walmart-Amazon, Structured/Amazon-Google and Textual/Abt-Buy entries from that page. |
References
- Abderrahman, A. S. M., & Makarem, N. (2026). The future of external audit: A systematic literature review of emerging technologies and their impact on external audit practices. Journal of Risk and Financial Management, 19(3), 216.
- Adeyemi, M., Oladipo, A., Pradeep, R., & Lin, J. (2023). Zero-shot cross-lingual reranking with large language models for low-resource languages. arXiv.
- Althaf, A. M., Mohammed, M. A., Milanova, M., Talburt, J., & Cakmak, M. C. (2025). Multi-agent RAG framework for entity resolution: Advancing beyond single-LLM approaches with specialized agent coordination. Computers, 14(12), 525.
- Balsiger, D., Dimmler, H.-R., Egger-Horstmann, S., & Hanne, T. (2024). Assessing large language models used for extracting table information from annual financial reports. Computers, 13(10), 257.
- Bardelli, C., Rondinelli, A., Vecchio, R., & Figini, S. (2020). Automatic electronic invoice classification using machine learning models. Machine Learning and Knowledge Extraction, 2(4), 617–629.
- Bode, C., Burkhart, D., Schültken, R., & Vollmer, M. (2023). Future of procurement. In R. Merkert, & K. Hoberg (Eds.), Global logistics and supply chain strategies for the 2020s: Vital skills for the next generation (pp. 261–276). Springer International Publishing.
- Chen, L.-C., Weng, H.-T., Pardeshi, M. S., Chen, C.-M., Sheu, R.-K., & Pai, K.-C. (2025). Evaluation of prompt engineering on the performance of a large language model in document information extraction. Electronics, 14(11), 2145.
- Cohen, W. W., Ravikumar, P., & Fienberg, S. E. (2003). A comparison of string distance metrics for name-matching tasks. IIWeb, 3, 73–78. Available online: https://www.bibsonomy.org/bibtex/b918a22c0ac156bcd7114e8361377773 (accessed on 25 May 2026).
- Cristani, M., Bertolaso, A., Scannapieco, S., & Tomazzoli, C. (2018). Future paradigms of automated processing of business documents. International Journal of Information Management, 40, 67–75.
- Dang, Q.-V., Nguyen, N.-S.-A., & Vo, T.-B.-D. (2026). HierFinRAG—Hierarchical multimodal RAG for financial document understanding. Informatics, 13(2), 30.
- Flechsig, C., Anslinger, F., & Lasch, R. (2022). Robotic Process Automation in purchasing and supply management: A multiple case study on potentials, barriers, and implementation. Journal of Purchasing and Supply Management, 28(1), 100718.
- Gao, L., Ma, X., Lin, J., & Callan, J. (2023). Precise zero-shot dense retrieval without relevance labels. In Proceedings of the 61st annual meeting of the association for computational linguistics (Volume 1: Long papers) (pp. 1762–1777). Association for Computational Linguistics. Available online: https://aclanthology.org/2023.acl-long.99/ (accessed on 25 May 2026).
- Grabski, S. V., Leech, S. A., & Schmidt, P. J. (2011). A review of ERP research: A future agenda for accounting information systems. Journal of Information Systems, 25(1), 37–78. Available online: https://publications.aaahq.org/jis/article-abstract/25/1/37/1563 (accessed on 25 May 2026).
- Ha, H. T., & Horák, A. (2022). Information extraction from scanned invoice images using text analysis and layout features. Signal Processing: Image Communication, 102, 116601.
- Huang, F., & Vasarhelyi, M. A. (2019). Applying robotic process automation (RPA) in auditing: A framework. International Journal of Accounting Information Systems, 35, 100433.
- Huang, Y., Lv, T., Cui, L., Lu, Y., & Wei, F. (2022). LayoutLMv3: Pre-training for document AI with unified text and image masking. arXiv.
- Iaroshev, I., Pillai, R., Vaglietti, L., & Hanne, T. (2024). Evaluating retrieval-augmented generation models for financial report question and answering. Applied Sciences, 14(20), 9318.
- James, A., Trovati, M., & Bolton, S. (2025). Retrieval-augmented generation to generate knowledge assets and creation of action drivers. Applied Sciences, 15(11), 6247.
- Jeong, Y.-B., Seo, H., Kim, Y.-H., & Kim, W.-Y. (2025). Retrieval-augmented visual parcel invoice understanding transformer for address correction. Engineering Applications of Artificial Intelligence, 158, 111542.
- Kim, G., Hong, T., Yim, M., Nam, J., Park, J., Yim, J., Hwang, W., Yun, S., Han, D., & Park, S. (2022). OCR-free document understanding transformer. arXiv.
- Kim, J.-I., & Shunk, D. L. (2004). Matching indirect procurement process with different B2B e-procurement systems. Computers in Industry, 53(2), 153–164.
- Köpcke, H., Thor, A., & Rahm, E. (2010). Evaluation of entity resolution approaches on real-world match problems. Proceedings of the VLDB Endowment, 3(1–2), 484–493.
- Krieger, F., Drews, P., & Funk, B. (2023). Automated invoice processing: Machine learning-based information extraction for long tail suppliers. Intelligent Systems with Applications, 20, 200285.
- Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th international conference on neural information processing systems, NIPS ’20 (pp. 9459–9474). ACM.
- Li, Y., Li, J., Suhara, Y., Doan, A., & Tan, W.-C. (2020). Deep entity matching with pre-trained language models. Proceedings of the VLDB Endowment, 14(1), 50–60.
- Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual instruction tuning. arXiv.
- Luo, C., Shen, Y., Zhu, Z., Zheng, Q., Yu, Z., & Yao, C. (2024). LayoutLLM: Layout instruction tuning with large language models for document understanding. arXiv.
- Luo, S., & Yu, J. (2024). SGFNet: A semantic graph-based multimodal network for financial invoice information extraction. Expert Systems with Applications, 258, 125156.
- Ma, X., Gong, Y., He, P., Zhao, H., & Duan, N. (2023). Query rewriting in retrieval-augmented large language models. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 conference on empirical methods in natural language processing (pp. 5303–5315). Association for Computational Linguistics.
- Maurya, C. K., Gantayat, N., Dechu, S., & Horvath, T. (2020). Online similarity learning with feedback for invoice line item matching. arXiv.
- Mehrbod, A., Zutshi, A., Grilo, A., & Jardim-Gonsalves, R. (2018). Application of a semantic product matching mechanism in open tendering e-marketplaces. Journal of Public Procurement, 18(1), 14–30. Available online: https://www.emerald.com/insight/content/doi/10.1108/jopp-03-2018-002/full/html (accessed on 25 May 2026).
- Mistiawan, A., & Suhartono, D. (2024). Product matching with two-branch neural network embedding. Journal Européen Des Systèmes Automatisés, 57(4), 1207.
- Mudgal, S., Li, H., Rekatsinas, T., Doan, A., Park, Y., Krishnan, G., Deep, R., Arcaute, E., & Raghavendra, V. (2018). Deep learning for entity matching: A design space exploration. In Proceedings of the 2018 international conference on management of data, SIGMOD ’18 (pp. 19–34). ACM.
- Ng, K. K. H., Chen, C.-H., Lee, C. K. M., Jiao, J., & Yang, Z.-X. (2021). A systematic literature review on intelligent automation: Aligning concepts from theory, practice, and future perspectives. Advanced Engineering Informatics, 47, 101246.
- Nigam, P., Song, Y., Mohan, V., Lakshman, V., Ding, W., Shingavi, A., Teo, C. H., Gu, H., & Yin, B. (2019). Semantic product search. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2876–2885). ACM.
- O’Leary, D. E. (2000). Enterprise resource planning systems: Systems, life cycle, electronic commerce, and risk. Cambridge University Press. Available online: https://books.google.com/books?hl=en&lr=&id=7fzMFG-tCmkC&oi=fnd&pg=PP11&dq=Enterprise+Resource+Planning+Systems+O%27Leary,+D.+E.+(2000)&ots=9a4Vlr0Y9P&sig=dRJS6XswJReudxUtffBskikKTNA (accessed on 25 May 2026).
- OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., … Zoph, B. (2024). GPT-4 technical report. arXiv.
- Palm, R. B., Winther, O., & Laws, F. (2017). CloudScan—A configuration-free invoice analysis system using recurrent neural networks. arXiv.
- Peeters, R., & Bizer, C. (2022). Supervised contrastive learning for product matching. In Companion proceedings of the web conference 2022, WWW ’22 (pp. 248–251). ACM.
- Peeters, R., Steiner, A., & Bizer, C. (2024). Entity matching using large language models. arXiv.
- Raina, V., & Gales, M. (2024, May 20). Question-based retrieval using atomic units for enterprise RAG. arXiv.
- Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M. M., & Gatford, M. (1995). Okapi at TREC-3. Nist Special Publication Sp, 109, 109. Available online: https://books.google.com/books?hl=en&lr=&id=j-NeLkWNpMoC&oi=fnd&pg=PA109&dq=Okapi+at+TREC-3&ots=YkE6HhAsME&sig=kDgCD0Ysml73EXihKaq8_229ZBQ (accessed on 25 May 2026).
- Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
- Schlegel, D., Fundanovic, O., & Kraus, P. (2024). Rating risks in robotic process automation (RPA) projects: An expert assessment using an impact-uncontrollability matrix. Procedia Computer Science, 239, 185–192.
- Strohmer, M. F., Easton, S., Eisenhut, M., Epstein, E., Kromoser, R., Peterson, E. R., & Rizzon, E. (2020). Digital in procurement. In M. F. Strohmer, S. Easton, M. Eisenhut, E. Epstein, R. Kromoser, E. R. Peterson, & E. Rizzon (Eds.), Disruptive procurement: Winning in a digital world (pp. 49–76). Springer International Publishing.
- Šimsa, Š., Šulc, M., Uřičář, M., Patel, Y., Hamdi, A., Kocián, M., Skalický, M., Matas, J., Doucet, A., Coustaty, M., & Karatzas, D. (2023). DocILE benchmark for document information localization and extraction. arXiv.
- Tang, G., Xie, L., Jin, L., Wang, J., Chen, J., Xu, Z., Wang, Q., Wu, Y., & Li, H. (2021). MatchVIE: Exploiting match relevancy between entities for visual information extraction. arXiv.
- Tater, T., Gantayat, N., Dechu, S., Jagirdar, H., Rawat, H., Guptha, M., Gupta, S., Strak, L., Kiran, S., & Narayanan, S. (2022). AI driven accounts payable transformation. Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12405–12413.
- Tiwari, A. K., Marak, Z. R., Paul, J., & Deshpande, A. P. (2023). Determinants of electronic invoicing technology adoption: Toward managing business information system transformation. Journal of Innovation & Knowledge, 8(3), 100366.
- Wagner, R. A., & Fischer, M. J. (1974). The string-to-string correction problem. Journal of the ACM, 21(1), 168–173.
- Wang, T., Chen, X., Lin, H., Chen, X., Han, X., Wang, H., Zeng, Z., & Sun, L. (2024). Match, compare, or select? An investigation of large language models for entity matching. arXiv.
- Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020). LayoutLM: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 1192–1200). ACM.
- Zeakis, A., Papadakis, G., Skoutas, D., & Koubarakis, M. (2023). Pre-trained embeddings for entity resolution: An experimental analysis. Proceedings of the VLDB Endowment, 16(9), 2225–2238.
- Zhang, J., Fang, J., Zhang, C., Zhang, W., Ren, H., & Xu, L. (2025). Geographic named entity matching and evaluation recommendation using multi-objective tasks: A study integrating a large language model (LLM) and retrieval-augmented generation (RAG). ISPRS International Journal of Geo-Information, 14(3), 95.


| Dimension | Edit-Distance (Levenshtein/Jaro-Winkler) | TF-IDF/BM25 | Dense Embeddings | Hybrid (Dense + Sparse) | Proposed: Dual-Augmentation RAG |
|---|---|---|---|---|---|
| OCR/character noise | Handles minor typos only | Weak unless fuzzy/character analyzers are used | Variable | Moderate | Strong if rewrite validation and identifier protection are used |
| Synonyms | No | No natively; possible with expansion | Generally strong | Strong | Strong, with catalog and query augmentation |
| Abbreviations | Limited | Requires dictionary | Limited/domain-dependent | Moderate | Strong if expansions are validated |
| Cross-lingual/mixed script | No | Requires multilingual analyzer | Encoder-dependent | Encoder-dependent | Strong on a multilingual retrieval stack |
| SKU/model-code preservation | Strong | Strong | Weak | Strong | Strong with SKU locking/validation |
| Query-time cost | Very low, but poor scalability | Low | Medium | Medium-high | Highest if rewrite/reranking are online |
| Auditability | High | High | Low | Medium | Medium-high with full logging |
| Main failure mode | Synonyms/abbreviations | Vocabulary mismatch | Semantic drift/code loss | Mixed failure modes | Hallucinated expansions, code corruption, reranker bias |
| Best suited for | Small clean catalogs | Code/keyword-heavy catalogs | Clean descriptive catalogs | General production baseline | Noisy multilingual AP/product data |
| Dataset | Domain | Catalog Size (Rows) | Evaluated Queries | Catalog Construction Strategy (Preprocessing) |
|---|---|---|---|---|
| Amazon-Google | Software and Tech | 3226 | 1167 | Composite Field Construction: We concatenated the Title and Manufacturer fields. The logic explicitly checks if the manufacturer is already present in the title; if not, it is appended to ensure the embedding captures the brand identity, which is critical for matching software licenses. |
| Abt-Buy | Consumer Electronics | 1092 | 1028 | Single Field Indexing: This dataset contained high-quality, descriptive Names. We indexed the Name field directly as it contained sufficient signal (Brand + Model + Spec) to distinguish SKUs without additional concatenation. |
| Walmart-Amazon | General Retail | 22,074 | 962 | Conditional Attribute Injection: This was the most heterogeneous dataset. We implemented a conditional logic that analyzed the Title. If key attributes like Brand or Model Number were missing from the title string, they were injected from their respective columns. This mirrors the “data cleansing” phase often required in ERP migrations. |
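The composite-field logic described for Amazon-Google can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and the case-insensitive substring test is an assumed way of checking whether the manufacturer is "already present in the title".

```python
def build_catalog_text(title, manufacturer):
    """Append the manufacturer to the title unless the title already contains it."""
    title = (title or "").strip()
    manufacturer = (manufacturer or "").strip()
    if manufacturer and manufacturer.lower() not in title.lower():
        return f"{title} {manufacturer}".strip()
    return title


# Hypothetical catalog rows.
appended = build_catalog_text("Photoshop Elements 6", "Adobe")
unchanged = build_catalog_text("Adobe Photoshop Elements 6", "adobe")
no_brand = build_catalog_text("Office 2007", None)
print(appended, unchanged, no_brand, sep=" | ")
```

The same pattern extends to the conditional attribute injection used for Walmart-Amazon, where brand or model number would be appended only when missing from the title string.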
| Dataset | Method | Metric | Raw Catalog | Enriched Catalog | Proposed System | Proposed (Dup-Aware) | Gap |
|---|---|---|---|---|---|---|---|
| Amazon-Google | Dense | R@1 | 68.38% | 70.01% | | | |
| Amazon-Google | Sparse | R@1 | 65.98% | 63.58% | | | |
| Amazon-Google | Hybrid | R@1 | 68.98% | 68.04% | 71.89% [69.32%, 74.30%] | 73.78% [71.12%, 76.26%] | +1.89% |
| Amazon-Google | Dense | R@3 | 90.75% | 92.63% | | | |
| Amazon-Google | Sparse | R@3 | 86.46% | 86.29% | | | |
| Amazon-Google | Hybrid | R@3 | 90.83% | 92.12% | 91.60% [89.97%, 93.14%] | 94.34% [92.97%, 95.72%] | +2.74% |
| Abt-Buy | Dense | R@1 | 86.67% | 85.90% | | | |
| Abt-Buy | Sparse | R@1 | 68.39% | 70.14% | | | |
| Abt-Buy | Hybrid | R@1 | 80.64% | 79.38% | 93.97% [92.41%, 95.43%] | 94.94% [93.58%, 96.30%] | +0.97% |
| Abt-Buy | Dense | R@3 | 96.01% | 95.91% | | | |
| Abt-Buy | Sparse | R@3 | 84.24% | 86.58% | | | |
| Abt-Buy | Hybrid | R@3 | 92.41% | 95.33% | 97.96% [96.98%, 98.74%] | 98.93% [98.25%, 99.51%] | +0.97% |
| Walmart-Amazon | Dense | R@1 | 77.44% | 75.47% | | | |
| Walmart-Amazon | Sparse | R@1 | 77.03% | 74.84% | | | |
| Walmart-Amazon | Hybrid | R@1 | 80.67% | 75.68% | 83.47% [81.19%, 85.86%] | 84.10% [81.60%, 86.17%] | +0.63% |
| Walmart-Amazon | Dense | R@3 | 94.38% | 93.76% | | | |
| Walmart-Amazon | Sparse | R@3 | 94.59% | 93.76% | | | |
| Walmart-Amazon | Hybrid | R@3 | 96.57% | 95.74% | 97.30% [96.15%, 98.23%] | 97.92% [96.99%, 98.75%] | +0.62% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Dadopoulos, M.; Moschidis, S. Beyond Fuzzy Matching: A Dual-Augmentation RAG System for Robust Product Reconciliation in Accounting. J. Risk Financial Manag. 2026, 19, 402. https://doi.org/10.3390/jrfm19060402

