1. Introduction
In recent decades, mobile devices have outgrown their original purpose of wireless communication and become tools for other tasks, such as searching for information, socializing, carrying out economic transactions, and shopping. Applications installed on the device make these tasks easier for the user.
Several reasons explain the success and rapid expansion of mobile devices as a means of accessing services other than wireless communication itself [1]. One is the ability to use these services anywhere and at any time, a flexibility that is harder to achieve with devices such as laptops or desktop computers. Others are the immediacy of real-time access to information and the little prior knowledge needed to use this type of application [2].
A wide variety of services [3] can be accessed through mobile applications. One type of service that recurs in many areas is the search for a consumer item to purchase. For example, some apps allow the purchase of music, food, clothing, and other goods. In all of them, user interaction is similar [4]: the application offers a means to search for the item and, if it is found in the catalog, the possibility of purchasing it online. A common characteristic of this type of interaction is that users know what they are looking for [5]. For the purchase of books, for example, applications such as Quiero Libros [6] or My Library [7] allow buying and selling books, registering new books by scanning the barcode or ISBN [8], searching for books by title, author, or ISBN, sorting and classifying them, and creating personal libraries. More powerful platforms such as Google Books offer similar services and also allow more sophisticated searches with a greater number of filters. For each work, Google Books displays detailed information and, in many cases, a preview of its content or even the option of reading it in its entirety and downloading it, together with links to platforms where it can be purchased. It also provides an API that accepts requests with search criteria and returns the search results in JSON format.
A different situation arises when users do not know what they are looking for, for example, when they hear a song they would like to purchase but know neither the musician nor the name of the song. The same situation occurs with other items, such as clothing and food. For these cases, it would be useful to have a mechanism for reverse search, in which the content is known but its identifier (for example, a title or name) is not. Such a mechanism must recognize the content in order to find the identifier, and how it does so depends on the type of content: recognizing a song does not pose the same difficulty or the same technological requirements as recognizing a garment or a dish. Book recognition, for example, could be implemented in various ways. One way is through the textual content of the book [9]: a fragment of text is taken from it and searched for in the texts of books until a match is found. The main problem with this approach is that it is computationally very expensive compared with other types of search [10]. A more appropriate method for this case is to search using an image of the book cover [11]. The text in the image is recognized and extracted, and then used to search by title, author, and any other data that may help locate the book. Computationally, this form of search is more efficient than the previous one [12].
This article presents a mobile application that implements a value-added service for identifying books from an image of the cover. As a result, it returns the book that has been recognized and the stores in which it can be purchased.
An essential component of such an application is software capable of recognizing text in an image. Software that recognizes text in images is known as OCR (Optical Character Recognition). More generally, image recognition services can obtain data such as the chromatic characteristics of an image, relate an image to geographic data, determine which objects appear in it and the context it conveys, or detect whether text appears in it and determine what the text is, where it is, and what it means. Their implementation is based on machine learning algorithms that enable a program to recognize whether certain elements are present in an image. To do this, the algorithm is trained with a large number of images in which the elements to be recognized (objects, colors, or letters) are present, labeled with what they are and where they are. The algorithm learns to recognize these elements and can later detect them in new images. Currently, online applications offer this type of recognition service through APIs: an image is sent to the application through the API, and the recognized content is received in response. Some examples follow. The Google Cloud Vision API [13] detects objects and faces, reads printed and handwritten text, classifies images into predefined categories, and assigns metadata to images. Microsoft's Computer Vision is a set of three APIs that detect and extract handwritten or printed text that appears in images: the Read API detects the text content of an image and converts it into a machine-readable character sequence, and is optimized for images with large amounts of text; the OCR API works similarly but runs synchronously and is not optimized for large documents; and the Recognize Text API works similarly to the OCR API but runs asynchronously and uses updated recognition models. Amazon Rekognition [14] supports image and video analysis with deep learning, identifying objects, people, text, scenes, and activities in images and videos, detecting inappropriate content, and providing high-precision facial analysis and face search capabilities to detect, analyze, and compare faces. Finally, the open-source software Tesseract [15] supports various image formats and can read multipage documents. The possibilities offered by text recognition are varied and apply to different areas. In the medical field, it facilitates the transition from handwritten medical reports to electronic records, as in the Savana system [16]; in the circular economy, it is used to retrieve knowledge from patents on how to recycle and reuse waste [17]; in forensic analysis, to attribute the authorship of documents to people [18]; and in the digital humanities, to digitize analog books or documents [19]. In some cases, it is key to preserving analog knowledge, and in others it creates value by automating tasks.
Price comparators have objectives similar to those of the application proposed in this article. Google Shopping [20], for example, compares the prices of products similar to one provided by the user. The result is a set of products ordered by price, for each of which the description, photos, reviews, and prices from different sellers can be consulted. Other similar apps are Idealo and Kelkoo. Idealo [21] compares the prices of products from Amazon, eBay, and other well-known marketplaces; it takes the name of the desired product as input and returns a set of products together with their price, description, number of offers, and reviews. Kelkoo [22] is another price comparator that shows the evolution of prices over time and lets the user set alerts to purchase a product when its price drops. Finally, ASOS [23], an online search engine specialized in clothing, is also similar to the application described in this article. It allows searching for a product from its image and displays all the items that match the search, so that the customer can compare sellers and find the product that best suits their wishes.
The article is structured as follows. Section 2 presents the materials and methods of the application. Section 3 describes the results. Finally, Section 4 presents the conclusions and a set of lines of future work.
2. Materials and Methods
The main objective of this application is to offer a simple, fast, and intuitive book search service that provides the most relevant information about each book and offers a set of value-added services compared with traditional search engines. This objective breaks down into the following specific objectives:
Provide a simple search engine for books based on title and author.
Offer a search engine that takes the image of the book cover as input.
Show a list of possible results so that the user can choose the exact book intended.
See relevant information about the book at a glance, such as its title, author, description, publication date, publisher, number of pages, or ISBN.
Offer the possibility of reading a preview or the entire book online.
Download the book in digital format.
See book purchase links from the most common sellers.
Have a favorites section.
To this end, a mobile application is proposed that allows the user to photograph the cover of a book and automatically shows all the information related to that work, together with a preview, links to the main digital stores, and even the possibility of reading it in its entirety online or downloading it in PDF or EPUB format to read on an e-reader. Other functions include adding titles to a favorites list.
The application has been implemented using a client-server architecture (see Figure 1), in which the client is the multiplatform mobile application and the server provides the client with book data and search results, as well as the rest of the services.
The mobile application has been implemented in C# using the Xamarin platform [24]. The server, which handles all request processing, searches, and communication with the database, has been implemented in PHP on an Apache server.
Google Cloud Platform [25] provides access to numerous paid and free APIs for tasks related to computing, storage and databases, data analysis, machine learning, administration tools, and managed infrastructure. This project uses the Vision API, which offers machine learning models that classify images into predefined categories, detect objects and faces, identify printed and handwritten text, and assign metadata. To use the API, an image of the book being searched for is provided; the API processes the image and returns relevant information such as the text that appears in it, from which the author and title of the work are identified.
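As an illustration of this step, the following minimal sketch builds the JSON body of a text detection request for the Vision API REST endpoint and reads the detected text from a response. It is not the code of the application, whose server is written in PHP. The function names and the sample response are hypothetical, and how the application tells the title from the author among the detected lines is not described here, so the sketch only returns the lines of text.

```python
import base64
from typing import List

# REST endpoint of the Vision API; an API key or OAuth token is needed to call it.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_text_detection_request(image_bytes: bytes) -> dict:
    """Build the JSON body of a TEXT_DETECTION request for a single image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_cover_lines(response: dict) -> List[str]:
    """Return the non-empty lines of text detected in the image (hypothetical helper)."""
    responses = response.get("responses", [])
    if not responses:
        return []
    annotations = responses[0].get("textAnnotations", [])
    if not annotations:
        return []
    # The first annotation carries the whole detected text; the rest are single words.
    full_text = annotations[0].get("description", "")
    return [line.strip() for line in full_text.splitlines() if line.strip()]


# Made-up response with the shape returned by the API, for illustration only.
sample_response = {
    "responses": [
        {
            "textAnnotations": [
                {"description": "Example Title\nJane Doe\n"},
                {"description": "Example"},
                {"description": "Title"},
            ]
        }
    ]
}
cover_lines = extract_cover_lines(sample_response)
```

The returned lines could then be used as search terms for the title and author.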
Selenium [26] is an API for automating browser tasks such as processes, queries, and other actions. In this project, it is used to search for books in different stores. For each store, a request is made to the Google search engine with the name of the store, the title of the book, and its author as keywords. The URL of each result is then examined to check whether it really belongs to the web page on which the requested copy is advertised or whether the result is not the desired one. The verified results, which correspond to the web pages where the book is advertised, are transformed into a JSON object that is returned to the application so that the user can access those links.
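The verification and packaging of results can be sketched as follows. The sketch leaves out the browser automation and assumes that the result URLs for each store have already been collected. The store names, domains, helper names, and the check itself (matching the host of the URL against the domain of the store) are assumptions made for illustration, since the exact rule applied by the application is not detailed.

```python
import json
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical stores and domains, for illustration only.
STORE_DOMAINS = {"Store A": "store-a.example", "Store B": "store-b.example"}


def build_store_query(store: str, title: str, author: str) -> str:
    """Keywords sent to the search engine: store name, book title, and author."""
    return f"{store} {title} {author}"


def belongs_to_store(url: str, domain: str) -> bool:
    """Check that the host of the URL is the store domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def verified_links(results_by_store: Dict[str, List[str]]) -> str:
    """Keep the first result of each store that passes the check and return JSON."""
    links = []
    for store, urls in results_by_store.items():
        domain = STORE_DOMAINS.get(store)
        if domain is None:
            continue  # unknown store: nothing to verify against
        match = next((u for u in urls if belongs_to_store(u, domain)), None)
        if match is not None:
            links.append({"store": store, "url": match})
    return json.dumps({"links": links})
```

Comparing hosts rather than searching for the store name anywhere in the URL avoids accepting pages on unrelated sites that merely mention the store.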
Finally, the Google Books API [27] automates most of the operations that can be performed interactively on the Google Books website, such as performing full-text searches, retrieving information about books and about the visibility and availability of e-books, and managing personal bookshelves. In the project, the author and title extracted from the user-supplied image are sent to the API to find matching books. The API returns the matching books together with data such as the description, the publisher, the number of pages, and the link to an online reader where a fragment or, sometimes, the entire book can be read. This data is displayed in the application as the search result.
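A minimal sketch of this exchange is given below, assuming the public volumes endpoint of the Google Books API and its `intitle:` and `inauthor:` search keywords. The helper names and the selection of fields are illustrative and do not reproduce the server code of the application; the exact query that the application sends is also an assumption.

```python
from typing import List
from urllib.parse import urlencode

BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(title: str, author: str, max_results: int = 10) -> str:
    """Build a volumes search URL restricted by title and author."""
    query = f"intitle:{title} inauthor:{author}"
    return BOOKS_ENDPOINT + "?" + urlencode({"q": query, "maxResults": max_results})


def summarize_volume(item: dict) -> dict:
    """Pick the fields shown to the user from one entry of the 'items' array."""
    info = item.get("volumeInfo", {})
    access = item.get("accessInfo", {})
    isbn13 = next(
        (i.get("identifier") for i in info.get("industryIdentifiers", [])
         if i.get("type") == "ISBN_13"),
        None,
    )
    return {
        "title": info.get("title"),
        "authors": info.get("authors", []),
        "publisher": info.get("publisher"),
        "publishedDate": info.get("publishedDate"),
        "pageCount": info.get("pageCount"),
        "description": info.get("description"),
        "isbn13": isbn13,
        "webReaderLink": access.get("webReaderLink"),
    }


def summarize_results(response: dict) -> List[dict]:
    """Summarize every volume in a search response; an empty search has no 'items'."""
    return [summarize_volume(item) for item in response.get("items", [])]
```

Missing fields are returned as `None` or an empty list, since not every volume carries every field.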
This architecture offers the following advantages:
Server-centric administration: the client has minimal administration needs.
Centralization of the resources: the resources of all the users are in a single server, thus avoiding the inconsistency and redundancy of the databases.
Improved security: the chances of improper access are reduced, as there is a centralized authentication mechanism.
Scalability of the installation: the network and its operation are not affected by the incorporation or elimination of users.
4. Conclusions
This article has described a multiplatform mobile application that implements a value-added service by combining existing services to identify books and search for information about them using an image of the book cover. It also offers information about stores where the books can be bought. The application has been developed using Microsoft's Xamarin platform, the Vision API of Google Cloud Platform, which recognizes text in images of book covers, and the Google Books API, which searches for books. Purchase links are offered through the Selenium WebDriver API, which automates Google searches for the sale link of a specific book in the main online stores. The originality of the application does not lie in image recognition, a problem beyond the scope of the project, but in the value-added service that it implements by combining different functional components to obtain a service that goes beyond the originals. In addition, the architecture of the application makes it easy to extend with further services.
Regarding future work, the application can be improved with new functions such as a password reset in case the user forgets the password, a more attractive and modern user interface, and extended usage statistics. The history, where users can see the last books they have visited, could be expanded with other points of interest such as the most consulted author, a record of the books that have been downloaded, or the number of books consulted in a specified period of time. Likewise, the favorites section could be improved by allowing books to be organized in collections, for example by genre, author, length, or whether the user has already read them. The application could also recommend books similar to those being consulted, books visited by other users with similar tastes, or the most visited works. To do this, machine learning algorithms would be applied to information about which books each user visits, collected completely anonymously. Another possibility would be to apply bibliometric analyses [28] to the books that have been retrieved or most searched for, in order to create trend lists and use them to recommend books to users. Finally, no measurements of the performance of the application have been taken in the sense discussed in [29,30], since the recognition functionality is an external component that is simply combined with others, so its behavior cannot be modified or improved. However, once machine learning algorithms are used to create recommendations for users, a study of the performance of the application will be carried out. It is also proposed as future work to analyze the performance of the application with respect to the number of users it can support and its response speed.