Electronic invoicing has been mandatory for Italian companies since January 2019. All invoices follow a predefined XML template, which facilitates information extraction. The main aim of this paper is to exploit the information contained in electronic invoices to build an intelligent system that simplifies accountants’ work. More precisely, this contribution shows how part of the accounting process can be automated: all the invoices of a company are classified into specific codes that represent the economic nature of the financial transactions. To accomplish this task, a multiclass classification algorithm is proposed to predict two target variables, the account code and the VAT code, which are part of the general ledger entry. To apply this model to real datasets, a multi-step procedure is proposed: first, a matching algorithm reconstructs the training set; then, the input data are processed and prepared for the training phase; finally, a classification algorithm is trained. Different classification algorithms, including ensemble models and neural networks, are compared in terms of prediction accuracy. The models under comparison show very good results in predicting the target variables, meaning that machine learning classifiers succeed in translating the complex rules of the accounting process into an automated model. A final study suggests that the best performance can be achieved by exploiting the hierarchical structure of the account codes, splitting the classification task into smaller sub-problems.
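To make the hierarchical idea concrete, the following minimal sketch splits the prediction of an account code into two smaller sub-problems: first the top-level group, then the specific code within that group. It is only an illustration under stated assumptions. The `group.leaf` code format, the sample invoice descriptions, and the class names are hypothetical, and the toy token-voting classifier merely stands in for the ensemble models and neural networks compared in the paper.

```python
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Toy stand-in classifier: each token votes for the labels it was seen with."""

    def fit(self, texts, labels):
        self.votes = defaultdict(Counter)
        # Fallback label for inputs that share no token with the training set.
        self.fallback = Counter(labels).most_common(1)[0][0]
        for text, label in zip(texts, labels):
            for token in set(text.lower().split()):
                self.votes[token][label] += 1
        return self

    def predict_one(self, text):
        tally = Counter()
        for token in set(text.lower().split()):
            tally.update(self.votes.get(token, {}))
        return tally.most_common(1)[0][0] if tally else self.fallback


class HierarchicalAccountClassifier:
    """Predicts the account group first, then the code inside that group."""

    def fit(self, texts, codes):
        # Assumption: codes look like "60.01", where "60" is the parent group.
        groups = [code.split(".")[0] for code in codes]
        self.group_model = TokenVoteClassifier().fit(texts, groups)
        self.leaf_models = {}
        for group in set(groups):
            idx = [i for i, g in enumerate(groups) if g == group]
            # One smaller sub-problem per group, trained only on its own invoices.
            self.leaf_models[group] = TokenVoteClassifier().fit(
                [texts[i] for i in idx], [codes[i] for i in idx]
            )
        return self

    def predict_one(self, text):
        group = self.group_model.predict_one(text)
        return self.leaf_models[group].predict_one(text)


# Hypothetical invoice line descriptions and account codes.
train_texts = [
    "electricity bill office",
    "gas bill warehouse",
    "laptop purchase",
    "office desk purchase",
]
train_codes = ["60.01", "60.02", "20.01", "20.02"]

model = HierarchicalAccountClassifier().fit(train_texts, train_codes)
print(model.predict_one("electricity bill march"))
print(model.predict_one("desk purchase"))
```

Each leaf model only has to separate the codes of a single group, which is the sense in which the task is split into smaller sub-problems. In practice any multiclass classifier could replace the toy one at either level.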
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.