SMS Processing Using Optical Character Recognition for Smishing Detection †
Abstract
1. Introduction
2. Materials and Methods
- Morphological operations. Mathematical morphology analyzes the shape and structure of objects in images using set theory, random functions, and lattice algebra. Its basic operations are dilation, erosion, opening, and closing [11].
- Character recognition. Models based on neural networks can be used for character recognition. Tesseract’s OCR engine (v. 5.5.0.20241111) uses LSTM (Long Short-Term Memory) recurrent neural networks to recognize entire lines of text at a time [12].
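The four morphological operations named above can be illustrated on a small binary image. The following is a minimal sketch in plain Python, not the authors' implementation: it assumes a 3×3 square structuring element (the source does not state which one is used) and ignores cells that fall outside the image. The function names are illustrative. In OpenCV, the corresponding calls are ‘cv2.erode()’, ‘cv2.dilate()’, and ‘cv2.morphologyEx()’.

```python
from typing import List

# Binary image as nested lists: 1 = foreground (text), 0 = background.
Image = List[List[int]]


def _window(img: Image, r: int, c: int) -> List[int]:
    """Values in the 3x3 window centered on (r, c); out-of-bounds cells are skipped."""
    rows, cols = len(img), len(img[0])
    return [
        img[i][j]
        for i in range(max(r - 1, 0), min(r + 2, rows))
        for j in range(max(c - 1, 0), min(c + 2, cols))
    ]


def dilate(img: Image) -> Image:
    """A pixel becomes 1 if any pixel in its window is 1 (objects grow)."""
    return [[max(_window(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def erode(img: Image) -> Image:
    """A pixel stays 1 only if every pixel in its window is 1 (objects shrink)."""
    return [[min(_window(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def opening(img: Image) -> Image:
    """Erosion followed by dilation: removes specks smaller than the window."""
    return dilate(erode(img))


def closing(img: Image) -> Image:
    """Dilation followed by erosion: fills holes smaller than the window."""
    return erode(dilate(img))
```

In an OCR setting, opening is typically useful for removing isolated noise pixels around characters, while closing reconnects strokes that binarization has broken.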
3. Results
- Background detection. Screenshots of messages with a light background have high contrast between the text and the background, which facilitates binarization, so a fixed threshold is used for them. In images with a dark background, the contrast is lower, so the colors must first be inverted using the ‘cv2.bitwise_not()’ function. To identify whether a theme is light or dark, the average brightness of the image pixels is calculated with ‘np.mean()’ from the ‘NumPy’ library and compared against a threshold of 57. If the average brightness falls below this threshold, the background is identified as dark.
- Binarization. When a dark background is detected, an adaptive threshold is applied with ‘cv2.adaptiveThreshold()’ to the inverted grayscale image to obtain a binarized image. If a light background is detected, a fixed threshold of 127 is applied to the grayscale image (Figure 2C) using the ‘cv2.threshold()’ function, and the binarized image is obtained (Figure 2D).
- Extracting text from the binarized image. The function ‘pytesseract.image_to_string()’ is applied to the binarized image to extract visible text from the image and return it as a text string, as shown in Figure 3.
- Sanitization of extracted text. The text extracted from the image is sanitized for further analysis: it is converted to lowercase using ‘lower()’, and line breaks and extra spaces are removed using ‘replace()’ and ‘strip()’. The transcription of the text is shown in Figure 4.
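The four steps above can be sketched end to end as follows. This is an illustrative outline under stated assumptions, not the authors' code. It assumes that brightness is measured on the 0 to 255 grayscale, so a dark theme yields a low mean and is flagged when the mean falls below the threshold of 57. The adaptive-threshold block size (11) and constant (2) are placeholder values that the source does not specify, and the function names are hypothetical.

```python
def is_dark_background(mean_brightness: float, threshold: float = 57.0) -> bool:
    """Assumption: brightness is on the 0-255 scale, so a low mean means a dark theme."""
    return mean_brightness < threshold


def sanitize(text: str) -> str:
    """Lowercase the text and remove line breaks and extra spaces."""
    text = text.lower().replace("\r", " ").replace("\n", " ")
    while "  " in text:  # collapse runs of spaces left by the removed line breaks
        text = text.replace("  ", " ")
    return text.strip()


def extract_text(image_path: str) -> str:
    """Binarize a screenshot according to its theme and run OCR on the result."""
    # Third-party imports are kept local so the helpers above work without them.
    import cv2
    import numpy as np
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    if is_dark_background(float(np.mean(gray))):
        # Dark theme: invert so the text is dark on light, then threshold adaptively.
        inverted = cv2.bitwise_not(gray)
        # Block size 11 and constant 2 are illustrative values, not from the source.
        binary = cv2.adaptiveThreshold(
            inverted, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
        )
    else:
        # Light theme: the fixed threshold of 127 described in the text.
        _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

    return sanitize(pytesseract.image_to_string(binary))
```

For example, ‘sanitize("  Hello\nWORLD  \n")’ yields ‘hello world’. Keeping the theme decision and the sanitization in separate helpers makes both easy to check without an image or an OCR installation.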
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Klaviyo. Campaign SMS and MMS Benchmarks. 2024. Available online: https://help.klaviyo.com/hc/en-us/articles/360051110111 (accessed on 25 September 2024).
- Martínez Santander, C.J.; Cruz Gavilanes, Y.N.; Cruz Gavilanes, T.M.; Álvarez Lozano, M.I. Layered security to stop smishing attacks. Dominio Las Cienc. 2018, 4, 115–130.
- Statista. Mexico: Percentage of Users by Social Network 2023. 2024. Available online: https://es.statista.com/estadisticas/1035031/mexico-porcentaje-de-usuarios-por-red-social/ (accessed on 25 September 2024).
- National Commission for the Protection and Defense of Financial Services Users (CONDUSEF). Impersonating Financial Institutions to Commit Fraud. 2024. Available online: https://www.condusef.gob.mx/?p=contenido&idc=2534&idcat=1 (accessed on 25 September 2024).
- Fernández, S.; Javier, C.; Consuegra, V.S. Optical character recognition (OCR). Univ. Carlo 2008, 3, 2008.
- Rajmod, V.; Derkar, G.; Nagrale, P.; Awari, N.; Lokhande, M.P. Text Extraction from Image Using OCR. In Proceedings of the 2025 6th International Conference on Mobile Computing and Sustainable Informatics (ICMCSI), Goathgaun, Nepal, 7–8 January 2025; pp. 113–116.
- Medina, P.B.; Carofilis, A.; Fidalgo, E.; Alegre, E. Image preprocessing and OCR to improve smishing detection (Preprocesado de imagen y OCR para mejorar deteccion de smishing). Jorn. Automática 2024, 45, 10955. (In Spanish)
- Busa, R.; Shahira, K.C.; Lijiya, A. Small Text Extraction from Documents and Chart Images. In Proceedings of the 2022 IEEE 19th India Council International Conference (INDICON), Kochi, India, 24–26 November 2022; pp. 1–5.
- Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 3rd ed.; Prentice Hall: Hoboken, NJ, USA, 2008.
- Sezgin, M.; Sankur, B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 2004, 13, 146–165.
- Serra, J. Image Analysis and Mathematical Morphology; Academic Press: Cambridge, MA, USA, 1982.
- Smith, R. An overview of the Tesseract OCR engine. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Curitiba, Brazil, 23–26 September 2007; Volume 2, pp. 629–633.
- Archana, D.; Deepak, K.; Lokesh, D.K.S.; Sridharan, S.N.; Vasanth, G.; Sriram, N.S.; Prawin, B.K.S. Image Text Detection and Documentation Using OCR. In Proceedings of the 2024 International Conference on Smart Systems for Electrical, Electronics, Communication and Computer Engineering (ICSSEECC), Coimbatore, India, 28–29 June 2024; pp. 410–414.




© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Prudente-Tixteco, L.; Olivares-Mercado, J.; Toscano-Medina, L.K. SMS Processing Using Optical Character Recognition for Smishing Detection. Eng. Proc. 2026, 123, 12. https://doi.org/10.3390/engproc2026123012

