This paper presents the processing steps needed in order to have a fully functional vertical search engine. Four actions are identified (i.e., retrieval, extraction, presentation, and delivery) and are required to crawl websites, get the product information from the retrieved webpages, process that data, and offer the end-user the possibility of looking for various products. The whole application flow is focused on low resource usage, and especially on the delivery action, which consists of a web application that uses cloud resources and is optimized for cost efficiency. Novel methods for representing the crawl and extraction template, for product index optimizations, and for deploying and storing data in the cloud database are identified and explained. In addition, key aspects are discussed regarding ethics and security in the proposed solution. A practical use-case scenario is also presented, where products are extracted from seven online board and card game retailers. Finally, the potential of the proposed solution is discussed in terms of researching new methods for improving various aspects of the proposed solution in order to increase cost efficiency and scalability.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited