# Tax Fraud Detection through Neural Networks: An Application Using a Sample of Personal Income Taxpayers

## Abstract

## 1. Introduction

## 2. Background and Methodological Framework

- $x=(1,{x}_{1},{x}_{2},\dots ,{x}_{r}{)}^{\prime}$ are the network inputs (independent variables), where the leading 1 plays the role of the bias (intercept) of a traditional model.
- ${\gamma}_{j}=({\gamma}_{j0},{\gamma}_{j1},\dots ,{\gamma}_{ji},\dots ,{\gamma}_{jr}{)}^{\prime}\in {\Re}^{r+1}$ are the weights connecting the input layer neurons to the $j$-th neuron of the intermediate (hidden) layer.
- ${\beta}_{j},j=0,\dots ,q$, are the connection strengths from the hidden units to the output unit ($j=0$ indexes the bias unit), and $q$ is the number of intermediate units, that is, the number of hidden layer nodes.
- $W$ is the vector that collects all the synaptic weights of the network, ${\gamma}_{j}$ and ${\beta}_{j}$, also called the connection pattern.
- $Y=\widehat{f}(x,W)$ is the network output (in our case, the fraud probability).
- $F:\Re \to \Re$ is the activation function of the output unit, while $G:\Re \to \Re$ is the activation function of the intermediate neurons. The choice of both was taken as the optimum provided by the software used. It is usual to take the sigmoid or logistic function $G(a)=1/(1+\exp (-a))$, which produces a smooth sigmoid response, although the hyperbolic tangent function can also be used. Setting $a={x}^{\prime}{\gamma}_{j}$ in the expression $\widehat{f}(x,W)$, the term $G({x}^{\prime}{\gamma}_{j})$ coincides with the binary logit model.
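
The definitions above can be sketched as a forward pass. The network equation itself does not appear in this text, so the sketch assumes the usual single-hidden-layer form $\widehat{f}(x,W)=F\left({\beta}_{0}+\sum_{j=1}^{q}{\beta}_{j}G({x}^{\prime}{\gamma}_{j})\right)$ with logistic $F$ and $G$. The function names and the numeric weights are hypothetical and chosen only for illustration.

```python
import math


def logistic(a):
    """Logistic function G(a) = 1 / (1 + exp(-a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def network_output(x, gammas, betas, G=logistic, F=logistic):
    """Output of a single-hidden-layer network (assumed functional form).

    x      -- the r inputs x_1..x_r (the leading 1 is added here)
    gammas -- q weight vectors gamma_j, each of length r + 1
    betas  -- q + 1 weights beta_0..beta_q (beta_0 is the bias)
    """
    x_aug = [1.0] + list(x)
    # Hidden activations G(x' gamma_j), one per hidden unit.
    hidden = [
        G(sum(g * xi for g, xi in zip(gamma_j, x_aug)))
        for gamma_j in gammas
    ]
    # Output activation F(beta_0 + sum_j beta_j * hidden_j).
    a_out = betas[0] + sum(b * h for b, h in zip(betas[1:], hidden))
    return F(a_out)


# Hypothetical weights: r = 2 inputs, q = 2 hidden units.
example_gammas = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]
example_betas = [0.05, 1.0, -1.5]
y = network_output([0.5, -1.0], example_gammas, example_betas)
```

With a single hidden unit, ${\beta}_{0}=0$, ${\beta}_{1}=1$ and an identity $F$, the function returns $G({x}^{\prime}{\gamma}_{1})$, which is the binary logit case mentioned in the last item.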

## 3. Tax Fraud Modeling with Neural Networks

#### 3.1. Data Matrix: IRPF Sample Provided by the IEF

#### 3.2. Conceptualization of the Model: Application of the Tax Fraud Detection Model to Income Tax Returns

#### 3.3. Dimension Adjustment: Reduction of the Dimension According to the Main Components

#### 3.3.1. Multilayer Perceptron Network Model Estimation and Diagnosis Phase

[…] (${\gamma}_{j}$), while the last two columns give the estimated synaptic weights connecting the hidden layer neurons to the output layer (${\beta}_{j}$).

#### 3.3.2. Generalization: Calculation of Taxpayer Fraud Probabilities

## 4. Conclusions and Future Directions

## Author Contributions

## Funding

## Conflicts of Interest


| Concept | Box |
|---|---|
| **Earnings** | |
| Gross work income (monetary) | par1 |
| Net work income | par15 |
| Capital gains gross income | par29 + par45 |
| Capital gains net income | par31 + par47 |
| Deductible net capital gains | par35 + par50 |
| Gross property income | par70 |
| Capital gains net income | par75 |
| Deductible net property income | par79 = par85 |
| Total deduction net income from economic activities under direct evaluation scheme | par140 |
| Net earnings from economic activities under objective evaluation scheme (except agricultural, livestock and forestry activities) | par170 |
| Net earnings from crop, livestock and forestry activities under objective evaluation scheme | par197 |
| Capital gains and losses positive net balance | par450 + par457 |

| Concept | Box |
|---|---|
| **Minimums and Bases** | |
| General taxable base | par455 |
| Savings taxable base | par465 |
| Personal and family minimum, part applied to the general base | par680 |
| Personal and family minimum, part applied to the savings base | par681 |
| General liquidable base subject to levy | par620 |
| Savings liquidable base subject to levy | par630 |
| **Quotas** | |
| Central government tax | par698 |
| Regional government tax | par699 |
| Central government net tax | par720 |
| Regional government net tax | par721 |
| Self-assessment tax liability | par741 |
| Tax payable | par755 |
| Tax return balance | par760 |

Parameter estimates. Columns H(1:1) to H(1:7) are the predicted units of hidden layer 1; columns [marca = 0] and [marca = 1] are the predicted units of the output layer.

| Layer | Predictor | H(1:1) | H(1:2) | H(1:3) | H(1:4) | H(1:5) | H(1:6) | H(1:7) | [marca = 0] | [marca = 1] |
|---|---|---|---|---|---|---|---|---|---|---|
| Input Layer | (Bias) | −0.274 | 1.639 | −0.167 | −0.122 | −0.834 | 0.954 | 0.306 | | |
| | FAC1_1 | 1.173 | 0.029 | −0.794 | −1.110 | −0.988 | −2.318 | 1.828 | | |
| | FAC2_1 | 0.187 | −0.319 | 0.649 | −0.104 | −0.278 | 0.508 | 0.430 | | |
| | FAC3_1 | −0.035 | 0.488 | −0.210 | −0.490 | −0.437 | 0.713 | −0.514 | | |
| | FAC4_1 | 1.496 | 0.536 | −1.207 | −1.885 | −0.466 | 0.200 | 1.700 | | |
| | FAC5_1 | −0.101 | 0.157 | −0.052 | 0.284 | −0.298 | −0.330 | −0.074 | | |
| | FAC6_1 | −0.098 | 0.467 | −0.083 | 0.416 | 0.671 | −0.892 | 0.898 | | |
| | FAC7_1 | 4.657 | −0.763 | 0.750 | 1.847 | −0.314 | 0.072 | 2.417 | | |
| | FAC8_1 | 0.289 | 0.889 | −0.300 | −0.342 | −0.429 | −0.245 | −0.697 | | |
| | FAC9_1 | 1.272 | 1.280 | 1.850 | −1.144 | 1.084 | 0.195 | −4.070 | | |
| | FAC10_1 | 0.401 | 0.148 | −0.022 | 0.181 | −0.437 | 0.436 | −0.833 | | |
| | FAC11_1 | −0.541 | 0.164 | −1.281 | 0.343 | 0.087 | 0.682 | 0.767 | | |
| Hidden Layer 1 | (Bias) | | | | | | | | −2.256 | 1.514 |
| | H(1:1) | | | | | | | | −2.131 | 2.104 |
| | H(1:2) | | | | | | | | −1.212 | 1.280 |
| | H(1:3) | | | | | | | | 0.633 | −0.874 |
| | H(1:4) | | | | | | | | 1.102 | 1.594 |
| | H(1:5) | | | | | | | | 0.911 | −0.773 |
| | H(1:6) | | | | | | | | 0.995 | −0.860 |
| | H(1:7) | | | | | | | | −1.894 | 1.579 |
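
To illustrate how the parameter estimates above turn eleven factor scores into class probabilities, the following sketch propagates an input through the 11-7-2 network. The text does not state which activation functions the software selected, so a hyperbolic tangent in the hidden layer and a softmax in the output layer are assumed here. The all-zero input vector is hypothetical, and reading [marca = 1] as the fraud class is also an assumption.

```python
import math

# Input-to-hidden weights copied from the table.
# Rows: (Bias), FAC1_1 ... FAC11_1. Columns: H(1:1) ... H(1:7).
INPUT_TO_HIDDEN = [
    [-0.274, 1.639, -0.167, -0.122, -0.834, 0.954, 0.306],
    [1.173, 0.029, -0.794, -1.110, -0.988, -2.318, 1.828],
    [0.187, -0.319, 0.649, -0.104, -0.278, 0.508, 0.430],
    [-0.035, 0.488, -0.210, -0.490, -0.437, 0.713, -0.514],
    [1.496, 0.536, -1.207, -1.885, -0.466, 0.200, 1.700],
    [-0.101, 0.157, -0.052, 0.284, -0.298, -0.330, -0.074],
    [-0.098, 0.467, -0.083, 0.416, 0.671, -0.892, 0.898],
    [4.657, -0.763, 0.750, 1.847, -0.314, 0.072, 2.417],
    [0.289, 0.889, -0.300, -0.342, -0.429, -0.245, -0.697],
    [1.272, 1.280, 1.850, -1.144, 1.084, 0.195, -4.070],
    [0.401, 0.148, -0.022, 0.181, -0.437, 0.436, -0.833],
    [-0.541, 0.164, -1.281, 0.343, 0.087, 0.682, 0.767],
]

# Hidden-to-output weights copied from the table.
# Rows: (Bias), H(1:1) ... H(1:7). Columns: [marca = 0], [marca = 1].
HIDDEN_TO_OUTPUT = [
    [-2.256, 1.514],
    [-2.131, 2.104],
    [-1.212, 1.280],
    [0.633, -0.874],
    [1.102, 1.594],
    [0.911, -0.773],
    [0.995, -0.860],
    [-1.894, 1.579],
]


def class_probabilities(factor_scores):
    """Return [P(marca = 0), P(marca = 1)] for 11 factor scores.

    Assumes tanh hidden units and a softmax output layer; the source
    text does not confirm these activation functions.
    """
    if len(factor_scores) != 11:
        raise ValueError("expected 11 factor scores (FAC1_1 ... FAC11_1)")
    x = [1.0] + list(factor_scores)  # prepend the bias input
    hidden = [
        math.tanh(sum(x[i] * INPUT_TO_HIDDEN[i][j] for i in range(12)))
        for j in range(7)
    ]
    h = [1.0] + hidden  # prepend the hidden-layer bias unit
    logits = [
        sum(h[j] * HIDDEN_TO_OUTPUT[j][k] for j in range(8))
        for k in range(2)
    ]
    # Softmax, shifted by the largest logit for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical taxpayer whose factor scores are all zero.
p_not_flagged, p_flagged = class_probabilities([0.0] * 11)
```

If the software actually used a different activation pair, for example logistic hidden units, only the two activation steps inside `class_probabilities` would change; the weight matrices stay as transcribed.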

Classification

| Sample | Observed | Predicted: 0 | Predicted: 1 | Percent Correct |
|---|---|---|---|---|
| Training | 0 | 441,525 | 63,016 | 87.5% |
| | 1 | 150,694 | 695,739 | 82.2% |
| | Overall Percent | 43.8% | 56.2% | 84.2% |
| Testing | 0 | 188,867 | 26,963 | 87.5% |
| | 1 | 64,279 | 297,411 | 82.2% |
| | Overall Percent | 43.8% | 56.2% | 84.2% |
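
The percentages in the classification table follow directly from the four cell counts of each sample. As a check, the short sketch below recomputes them; the function name and the dictionary keys are hypothetical, and the counts are those reported in the table.

```python
def classification_summary(n00, n01, n10, n11):
    """Summarise a 2x2 classification table.

    n00, n01 -- observed 0 predicted as 0 and as 1
    n10, n11 -- observed 1 predicted as 0 and as 1
    All results are percentages.
    """
    total = n00 + n01 + n10 + n11
    return {
        "correct_0": 100.0 * n00 / (n00 + n01),
        "correct_1": 100.0 * n11 / (n10 + n11),
        "overall_correct": 100.0 * (n00 + n11) / total,
        "predicted_0_share": 100.0 * (n00 + n10) / total,
        "predicted_1_share": 100.0 * (n01 + n11) / total,
    }


training = classification_summary(441525, 63016, 150694, 695739)
testing = classification_summary(188867, 26963, 64279, 297411)

for name, summary in (("Training", training), ("Testing", testing)):
    print(name, {key: round(value, 1) for key, value in summary.items()})
```

Rounded to one decimal, both samples give 87.5% correct for observed 0, 82.2% correct for observed 1 and 84.2% overall, matching the table.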

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pérez López, C.; Delgado Rodríguez, M.J.; de Lucas Santos, S.
Tax Fraud Detection through Neural Networks: An Application Using a Sample of Personal Income Taxpayers. *Future Internet* **2019**, *11*, 86.
https://doi.org/10.3390/fi11040086
