Agentic Search Engine for Real-Time Internet of Things Data
Abstract
1. Introduction
2. Background and Related Work
2.1. SensorsConnect
SensorsConnect Architecture
- A real-time database that stores the most recent updates from IoT devices.
- A historical database that accumulates time-series data.
- A cache server that retains frequently accessed data to reduce retrieval latency and optimize resource usage.
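
The cache tier can be illustrated with a cache-aside read path in front of the real-time database. This is a sketch under assumptions, not the SensorsConnect implementation. It uses least-recently-used (LRU) eviction as a simple proxy for "frequently accessed". The names `LRUCache` and `read_latest`, and the dictionary standing in for the real-time database, are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Hashable, Optional


class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used one."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, dict]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[dict]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: dict) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def read_latest(device_id: str, cache: LRUCache, realtime_db: Dict[str, dict]) -> dict:
    """Cache-aside read: serve from the cache when possible, else load and cache."""
    cached = cache.get(device_id)
    if cached is not None:
        return cached
    reading = realtime_db[device_id]
    cache.put(device_id, reading)
    return reading


# Usage with a hypothetical real-time store keyed by device ID.
realtime_db = {"parking-17": {"occupied": False}, "parking-18": {"occupied": True}}
cache = LRUCache(capacity=1)
read_latest("parking-17", cache, realtime_db)  # miss: loaded from the database and cached
read_latest("parking-18", cache, realtime_db)  # miss: caching it evicts "parking-17"
```

Real-time readings change continuously, so a production cache would also need a time-to-live or explicit invalidation on each device update to avoid serving stale values. That logic is omitted here.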
- A master management console for device access across all layers,
- An analytics dashboard for device monitoring and vulnerability detection,
- A business interface for integrating third-party services (e.g., trade transactions or parking reservations), and
- A pricing tool that suggests optimal pricing schemes based on business requirements.
2.2. Related Work
2.2.1. Large Language Model
- Generating human-like responses that are not explicitly predefined,
- Solving general-purpose tasks such as code generation,
- Following user instructions for novel and complex tasks,
- Executing multi-step logical reasoning when required, and
- Learning from examples provided within the user prompt (few-shot learning).
- LLMs may hallucinate: Trained on a mix of accurate and inaccurate data, they can generate confidently incorrect responses;
- LLMs are memoryless: They retain no persistent state between calls. They can recall earlier user inputs in a session only if the application resends those inputs as part of the prompt;
- LLMs perform poorly on long-tail queries: While effective on common knowledge, their accuracy diminishes for niche or specialized topics;
- LLMs cannot access real-time or external data: They are trained on historical datasets and have no access to live sources, such as real-time data from IoT devices in SensorsConnect. Consequently, they are unaware of the current time or any data not present in their training corpus.
2.2.2. Retrieval-Augmented Generation (RAG)
- Retriever—identifies semantically relevant document chunks from an external database based on the user query.
- Generator—combines the retrieved documents with the user input to produce accurate, contextually informed, and knowledge-grounded outputs.
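
The hand-off between the two components can be sketched as a prompt-assembly step. The retriever's chunks are numbered and placed ahead of the user query, so the generator answers from them rather than from its training data alone. This is a minimal illustration under assumptions. The template wording, the `max_docs` cutoff and the function name are hypothetical, and the call to the generator LLM is omitted.

```python
from typing import List


def build_rag_prompt(query: str, documents: List[str], max_docs: int = 3) -> str:
    """Place the top retrieved chunks ahead of the user query as grounding context."""
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents[:max_docs], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Usage: chunks a retriever might return for a live-parking question (made-up values).
retrieved = ["Lot A: 12 free spaces (updated 10:02)", "Lot B: full (updated 10:01)"]
prompt = build_rag_prompt("Where can I park right now?", retrieved)
print(prompt)
```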
2.2.3. Agentic LLM System
2.2.4. Sensor Data Retrieval Based on LLMs
- A retriever built on heterogeneous IoT security datasets (e.g., vulnerability reports, threat feeds), which are preprocessed into vectorized, document-style chunks.
- An LLM (Llama 3, Llama 3.1, or GPT-4o) layered on top of this retriever and guided by domain-specific prompts and user context.
- DataKit, a toolkit that automates parsing and optimal chunking of diverse data formats.
- A deployed LLM that extracts sensor data presented in HTML format on web pages and converts it into a structured format such as CSV.
- A second LLM that generates word embedding representations of the extracted tabular data to support semantic search functionality.
- A real-time search engine for sensor data retrieval,
- A unified interface for IoT devices to mitigate heterogeneity, and
- An efficient data management approach that allows seamless storage by IoT devices and real-time retrieval by LLM agents.
3. System Architecture
- The Unified Data Model (Figure 2): designed to address the heterogeneity of IoT data.
- IoT-RAG-SE (Figure 3): responsible for processing IoT queries and retrieving real-time sensor data.
- GA-RAG Workflow (Figure 4): defines a systematic approach for constructing agentic RAG-based systems.
- IoT-ASE (Figure 5): an implementation of GA-RAG tailored for real-time IoT data environments.
3.1. Data Model
- Input—receives and processes sensor data,
- Output—manages actuations or device responses,
- Setting—configures operational parameters,
- Command—enables external control over devices, and
- State—communicates the device’s current status.
- Assessing the collaborative data-sharing needs of IoT devices, including those with limited computational resources, and
- Identifying the requirements of LLM agents, particularly their ability to query, interpret, and generate content based on structured IoT data.
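
The record below makes the five categories (Input, Output, Setting, Command and State) concrete by grouping one device's data under them. It is a hypothetical sketch. The class name, field names and types, and the example soccer-field device are illustrative assumptions and do not reproduce the schema shown in Figure 2.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class DeviceRecord:
    """Hypothetical record grouping one device's data by the five categories."""
    device_id: str
    inputs: Dict[str, Any] = field(default_factory=dict)    # Input: received/processed sensor data
    outputs: Dict[str, Any] = field(default_factory=dict)   # Output: actuations or device responses
    settings: Dict[str, Any] = field(default_factory=dict)  # Setting: operational parameters
    commands: List[str] = field(default_factory=list)       # Command: operations open to external control
    state: str = "unknown"                                  # State: the device's current status


soccer_field = DeviceRecord(
    device_id="field-03",
    inputs={"occupied": False},
    outputs={"floodlights": "off"},
    settings={"report_interval_s": 60},
    commands=["book", "release"],
    state="available",
)
# Serialize to JSON, e.g., before writing to the real-time database.
print(json.dumps(asdict(soccer_field)))
```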
3.2. IoT-Retrieval-Augmented Generation-Search Engine (IoT-RAG-SE)
- Embedding service descriptions into a high-dimensional vector space, and
- Conducting semantic search to match user queries with relevant IoT services.
3.2.1. Embedding Service Descriptions
- Tokenization: A pre-trained tokenizer segments the service description into tokens drawn from a predefined vocabulary and produces both token IDs and an attention mask for each sentence.
- Embedding: Using the generated token IDs and attention mask, an embedding model encodes the sentence into dense, per-token vector representations that capture the semantic meaning of the input text in context.
- Pooling: The resulting token embeddings are passed through a mean pooling operation, which averages the token vectors and uses the attention mask to exclude padding positions. This process yields a single embedding vector that encapsulates the overall semantic content of the service description, with every non-padding token contributing equally.
- Normalization: The resulting embedding vector is then L2-normalized to unit length to ensure consistency across all service embeddings. This step is critical for accurate similarity comparisons during the semantic search process, because the cosine similarity of two unit vectors reduces to their dot product.
- Storage in Vector Database: Finally, the normalized embedding vector, along with the corresponding service name, is stored and indexed in the vector database. This enables efficient retrieval through approximate nearest-neighbor search techniques.
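
The pooling, normalization and storage steps can be sketched in plain Python. The sketch assumes the tokenizer and embedding model (for example, a Sentence-BERT-style encoder) have already produced one vector per token. Here those vectors are hand-written two-dimensional toys, and a dictionary stands in for the indexed vector database. The function names and `vector_db` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """Average token vectors, counting only positions where the attention mask is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in summed]


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


# Hypothetical in-memory stand-in for the vector database: service name -> unit vector.
vector_db: Dict[str, List[float]] = {}


def index_service(name: str, token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> None:
    """Pool, normalize and store one service description's embedding."""
    vector_db[name] = l2_normalize(mean_pool(token_embeddings, attention_mask))


# The third position is padding (mask 0), so its [9.0, 9.0] is ignored:
# the pooled vector is [0.5, 0.5] before normalization.
index_service("dog park", [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])
```

In a deployment, the dictionary would be replaced by an approximate nearest-neighbor index such as an HNSW graph.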
3.2.2. Performing Semantic Search
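
Matching user queries to services can be illustrated with an exact, brute-force cosine search over unit-length service vectors. For clarity, this sketch uses exhaustive scoring in place of the approximate nearest-neighbor index used for storage. The toy three-dimensional vectors and the function names are assumptions. The default `p=3` was chosen to mirror the three candidates per query in the Top-P Services column of the evaluation table.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vector: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def top_p_services(
    query_vec: Sequence[float], index: Dict[str, List[float]], p: int = 3
) -> List[Tuple[str, float]]:
    """Score every service by cosine similarity to the query and keep the best p."""
    q = l2_normalize(query_vec)
    scored = [
        # Stored vectors are unit length, so the dot product equals cosine similarity.
        (name, sum(a * b for a, b in zip(q, vec)))
        for name, vec in index.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:p]


# Usage with toy embeddings (real ones would come from the steps in 3.2.1).
index = {
    "dog park": l2_normalize([0.9, 0.1, 0.0]),
    "dog trainer": l2_normalize([0.6, 0.4, 0.0]),
    "gym": l2_normalize([0.0, 0.2, 0.9]),
}
query = [1.0, 0.0, 0.0]  # stands in for the embedded user query
print([name for name, _ in top_p_services(query, index, p=2)])  # ['dog park', 'dog trainer']
```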
3.3. Generic Agentic RAG (GA-RAG)
3.4. Implementation Details
3.4.1. Classifier
3.4.2. Retriever
- IoT-RAG-SE Subnode: This component follows the architecture outlined in Figure 3 and accesses the Service and Node entities defined in the data model (Figure 2). It generates a vector database of service descriptions and returns the top-p nearest node documents that best match the user’s intent, providing relevant context to the Generator node. Additionally, IoT-RAG-SE integrates the OpenRouteService API [37] to calculate travel-time and distance matrices between the user’s location and the locations referenced in the retrieved documents.
- Google Maps Subnode: This subnode interfaces with Google APIs, such as the Text Search API, and is used when a query requests a service not available within IoT-RAG-SE or targets a region outside the SensorsConnect coverage area. It returns place-related documents containing the necessary details to formulate a meaningful response to the user’s query.
- Scraper Subnode: This component handles queries that are unrelated to IoT services and fall outside the LLM’s training knowledge, such as requests involving current events or news. It retrieves relevant documents by scraping online content with the Tavily API [38], which supports both search and content extraction.
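
The routing rule implied by the three subnode descriptions can be summarized as a small decision function. In IoT-ASE this decision belongs to the LLM-based Classifier. Here, its outputs are assumed to be available as plain values. The `Route` names and parameters are simplifying assumptions, as is the treatment of non-service queries: they are sent straight to the scraper, and queries the LLM could answer unaided are ignored.

```python
from enum import Enum
from typing import Set


class Route(Enum):
    IOT_RAG_SE = "iot_rag_se"
    GOOGLE_MAPS = "google_maps"
    SCRAPER = "scraper"


def choose_subnode(
    is_service_query: bool, service: str, in_coverage: bool, indexed_services: Set[str]
) -> Route:
    """Hypothetical routing that paraphrases the three subnode descriptions."""
    if not is_service_query:
        return Route.SCRAPER  # e.g., current events or news
    if service in indexed_services and in_coverage:
        return Route.IOT_RAG_SE  # indexed service inside SensorsConnect coverage
    return Route.GOOGLE_MAPS  # service not indexed, or location outside coverage


indexed = {"dog park", "gym"}
print(choose_subnode(True, "dog park", True, indexed))  # Route.IOT_RAG_SE
print(choose_subnode(True, "zoo", True, indexed))       # Route.GOOGLE_MAPS
print(choose_subnode(False, "", True, indexed))         # Route.SCRAPER
```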
3.4.3. Generator
3.4.4. Reviewer
4. Scenario Analysis and Evaluation
- End users, who access the system through the user interface layer to support real-time decision-making, and
- Integrated IoT systems, which use the framework to enable inter-device collaboration.
- A user may want to find a park where barbecuing is permitted and an unbooked soccer field is available, or
- A restaurant owner may need to locate wholesale traders offering the lowest prices for a monthly stock order to reduce expenses.
- Extraction of user preferences from prompts, personalized profiles, or contextual history,
- Access to structured IoT data that reflects those preferences, and
- Reasoning over complex, multi-criteria constraints.
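
The third requirement can be illustrated with the park example above. The request reduces to hard constraints over retrieved documents, followed by a ranking. In IoT-ASE this reasoning is performed by the LLM over the retrieved context, and the deterministic filter below is only a simplified stand-in. The `ParkDoc` fields, park names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParkDoc:
    """Hypothetical retrieved document: static attributes plus a live IoT reading."""
    name: str
    bbq_allowed: bool          # static attribute of the park
    free_soccer_fields: int    # real-time count from field-booking sensors
    travel_minutes: float      # e.g., from a routing service


def best_parks(docs: List[ParkDoc], limit: int = 3) -> List[str]:
    """Keep parks that satisfy every hard constraint, then rank by travel time."""
    eligible = [d for d in docs if d.bbq_allowed and d.free_soccer_fields > 0]
    eligible.sort(key=lambda d: d.travel_minutes)
    return [d.name for d in eligible[:limit]]


docs = [
    ParkDoc("Riverside", True, 0, 5.0),   # barbecue allowed, but no free field
    ParkDoc("Hillcrest", True, 2, 12.0),
    ParkDoc("Lakeview", False, 3, 4.0),   # free fields, but no barbecue
    ParkDoc("Maple", True, 1, 9.0),
]
print(best_parks(docs))  # ['Maple', 'Hillcrest']
```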
4.1. Datasets
4.2. Loading VectorDB and Real-Time IoT Database
4.3. Evaluation
- Understanding complex queries that embed user preferences within contextual language,
- Retrieving real-time IoT data documents necessary to identify the optimal service that aligns with user intent, and
- Generating human-like responses based on the retrieved context.
4.3.1. Semantic Search Evaluation
4.3.2. IoT-ASE vs Gemini’s Responses
4.3.3. Discussion
5. Limitations and Future Directions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Exploding Topics. Number of IoT Devices in 2024. 2024. Available online: https://explodingtopics.com/blog/number-of-iot-devices (accessed on 5 September 2025).
- Elgazzar, K.; Khalil, H.; Alghamdi, T.; Badr, A.; Abdelkader, G.; Elewah, A.; Buyya, R. Revisiting the internet of things: New trends, opportunities and grand challenges. Front. Internet Things 2022, 1, 1073780.
- Pattar, S.; Badiger, V.; Kangralkar, Y. Context-aware IoT search engine through fuzzy clustering: Search space restructuring and query resolution mechanisms. Internet Things 2025, 30, 101494.
- Hatcher, W.G.; Qian, C.; Liang, F.; Liao, W.; Blasch, E.P.; Yu, W. Secure IoT Search Engine: Survey, Challenges Issues, Case Study, and Future Research Direction. IEEE Internet Things J. 2022, 9, 16807–16823.
- Tzavaras, A.; Mainas, N.; Petrakis, E.G. OpenAPI framework for the Web of Things. Internet Things 2023, 21, 100675.
- Shodan. Shodan: Search Engine for the Internet of Everything. 2024. Available online: https://www.shodan.io/ (accessed on 5 September 2025).
- Mulero-Palencia, S.; Monzon Baeza, V. Detection of Vulnerabilities in Smart Buildings Using the Shodan Tool. Electronics 2023, 12, 4815.
- Liang, F.; Qian, C.; Hatcher, W.G.; Yu, W. Search engine for the internet of things: Lessons from web search, vision, and opportunities. IEEE Access 2019, 7, 104673–104691.
- Censys. The Censys Platform: The One Place to Understand Everything on the Internet. 2024. Available online: https://censys.com/ (accessed on 5 September 2025).
- Sentonas, M. CrowdStrike to Acquire Reposify to Reduce Risk Across the External Attack Surface and Fortify Customer Security Postures. 2022. Available online: https://www.crowdstrike.com/blog/crowdstrike-to-acquire-reposify-to-reduce-risk-across-the-external-attack-surface-and-fortify-customer-security-postures/ (accessed on 5 September 2025).
- Iggena, T.; Bin Ilyas, E.; Fischer, M.; Tönjes, R.; Elsaleh, T.; Rezvani, R.; Pourshahrokhi, N.; Bischof, S.; Fernbach, A.; Parreira, J.X.; et al. Iotcrawler: Challenges and solutions for searching the internet of things. Sensors 2021, 21, 1559.
- Elsaleh, T.; Enshaeifar, S.; Rezvani, R.; Acton, S.T.; Janeiko, V.; Bermudez-Edo, M. IoT-Stream: A lightweight ontology for internet of things data streams and its use with data analytics and event detection services. Sensors 2020, 20, 953.
- Casadei, R.; Fornari, F.; Mariani, S.; Savaglio, C. Programming IoT systems: A focused conceptual framework and survey of approaches. Internet Things 2025, 31, 101548.
- Elewah, A.; Elgazzar, K. SensorsConnect Framework: World-Wide Web for Internet of Things. IEEE Access 2024, 12, 168500–168516.
- Brin, S.; Page, L. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 1998, 30, 107–117.
- Spatharioti, S.E.; Rothschild, D.M.; Goldstein, D.G.; Hofman, J.M. Comparing traditional and llm-based search for consumer choice: A randomized experiment. arXiv 2023, arXiv:2307.03744.
- Minaee, S.; Mikolov, T.; Nikzad, N.; Chenaghlu, M.; Socher, R.; Amatriain, X.; Gao, J. Large language models: A survey. arXiv 2024, arXiv:2402.06196.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
- Berners-Lee, T.; Cailliau, R.; Luotonen, A.; Nielsen, H.F.; Secret, A. The world-wide web. Commun. ACM 1994, 37, 76–82.
- Elewah, A.; Ibrahim, W.M.; Rafıkl, A.; Elgazzar, K. ThingsDriver: A unified interoperable driver for IoT nodes. In Proceedings of the 2022 International Wireless Communications and Mobile Computing (IWCMC), Dubrovnik, Croatia, 30 May–3 June 2022; IEEE: New York, NY, USA, 2022; pp. 877–882.
- Mostafi, S.; Alghamdi, T.; Elgazzar, K. Interconnected Traffic Forecasting Using Time Distributed Encoder-Decoder Multivariate Multi-Step LSTM. In Proceedings of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju Island, Republic of Korea, 2–5 June 2024; pp. 2503–2508.
- Elgazzar, K.; Hassan, A.E.; Martin, P. Clustering wsdl documents to bootstrap the discovery of web services. In Proceedings of the 2010 IEEE international Conference on Web Services, Miami, FL, USA, 5–10 July 2010; IEEE: New York, NY, USA, 2010; pp. 147–154.
- Elgazzar, K.; Hassanein, H.S.; Martin, P. Daas: Cloud-based mobile web service discovery. Pervasive Mob. Comput. 2014, 13, 67–84.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998.
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774.
- Li, H.; Su, Y.; Cai, D.; Wang, Y.; Liu, L. A survey on retrieval-augmented text generation. arXiv 2022, arXiv:2202.01110.
- Zhao, P.; Zhang, H.; Yu, Q.; Wang, Z.; Geng, Y.; Fu, F.; Yang, L.; Zhang, W.; Cui, B. Retrieval-augmented generation for ai-generated content: A survey. arXiv 2024, arXiv:2402.1947.
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. React: Synergizing reasoning and acting in language models. arXiv 2022, arXiv:2210.03629.
- Venkatraman, S.; Tripto, N.I.; Lee, D. CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis. arXiv 2024, arXiv:2406.12665.
- Li, Y.; Yu, Y.; Li, H.; Chen, Z.; Khashanah, K. TradingGPT: Multi-agent system with layered memory and distinct characters for enhanced financial trading performance. arXiv 2023, arXiv:2309.03736.
- Dong, Y.; Aung, Y.L.; Chattopadhyay, S.; Zhou, J. Chatiot: Large language model-based security assistant for internet of things with retrieval-augmented generation. arXiv 2025, arXiv:2502.09896.
- Berenguer, A.; Morejón, A.; Tomás, D.; Mazón, J.N. Leveraging Large Language Models for Sensor Data Retrieval. Appl. Sci. 2024, 14, 2506.
- Ouda, H.; Elewah, A.; Elgazzar, K. A Comparative Analysis of Data Models for Heterogeneous Sensor Data Management. In Proceedings of the 2024 International Wireless Communications and Mobile Computing (IWCMC), Ayia Napa, Cyprus, 27–31 May 2024; pp. 1826–1833.
- Reimers, N. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
- Malkov, Y.A.; Yashunin, D.A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 824–836.
- LangChain. Balance Agent Control with Agency. Available online: https://www.langchain.com/langgraph (accessed on 5 September 2025).
- OpenRouteService. Openrouteservice API Services. Available online: https://openrouteservice.org/ (accessed on 5 September 2025).
- Tavily. Connect Your LLM to the Web. Available online: https://tavily.com/ (accessed on 5 September 2025).
- Gemini Team; Anil, R.; Borgeaud, S.; Wu, Y.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; et al. Gemini: A family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805.
- Google Developers. Places API Web Service Documentation. 2024. Available online: https://developers.google.com/maps/documentation/places/web-service (accessed on 5 September 2025).
- Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Yang, A.; Fan, A.; et al. The llama 3 herd of models. arXiv 2024, arXiv:2407.21783.
- Groq, Inc. Groq: Fast AI Inference. 2024. Available online: https://groq.com/ (accessed on 5 September 2025).
- WeatherAPI.com. Real Time, Forecasted, Future, Marine and Historical Weather. 2024. Available online: https://www.meteomatics.com/en/weather-api/?ppc_keyword=api (accessed on 5 September 2025).
- LangChain, Inc. Get Your LLM App from Prototype to Production. 2024. Available online: https://www.langchain.com/ (accessed on 5 September 2025).
- Medimap Team. Data collected from walk-in clinics across Canada to analyze average wait times by province. In The Walk-in Clinic Wait Time Index; Technical report; Medimap: Vancouver, BC, Canada, 2024.
- Environment Climate Change Canada. Taking Stock: Reducing Food Loss and Waste in Canada. 2024. Available online: https://www.canada.ca/en/environment-climate-change/services/managing-reducing-waste/food-loss-waste/taking-stock.html (accessed on 5 September 2025).
- Halili, E.H. Apache JMeter; PACKT: Birmingham, UK, 2008.
- Dehghankar, M.; Asudeh, A. Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks. arXiv 2024, arXiv:2412.00546.
| # | Intent | Query | Top-P Services |
---|---|---|---|
1 | Dog Park | I want to take my dog for walking and playing catch the ball, so I can unleash it | [’dog park’, ’dog walker’, ’dog trainer’] |
2 | Shawarma Restaurant | I’m missing my home country, mmmm! I’m hungry. Oh, I want to eat Shawarma. Can you suggest any nearby place that serves Shawarma? | [’shawarma restaurant’, ’middle eastern restaurant’, ’syrian restaurant’] |
3 | Moving and Storage Service | I’m planning to move to a new place. Do you know any moving agency close to me? | [’moving and storage service’, ’car rental agency’, ’travel agency’] |
4 | Gym | I have moved here recently, and I’m looking for a gym with a good reputation. | [’gym’, ’fitness center’, ’rock climbing gym’] |
5 | Car Rental Agency | I’m travelling tomorrow, and I want to rent a car. Do you know any car rental close to me? | [’car rental agency’, ’vehicle rental agency’, ’truck rental agency’] |
6 | Sports School | My son loves hockey sport, and I want him to start with professional practice playing it. Do you know any hockey school with a good reputation? | [’sports school’, ’hockey club’, ’ice skating rink’] |
7 | Zoo | My son loves animals, so I’d like to take him to a zoo | [’zoo’, ’wildlife park’, ’animal park’] |
8 | Chinese Restaurant | I have a conference at Toronto University next week, and I want to have dinner in a Chinese restaurant during my stay there. Can you suggest one with a good reputation? | [’chinese restaurant’, ’canadian restaurant’, ’chicken wings restaurant’] |
9 | Tire Shop | Winter is coming, and I need to install my winter tires. I’m looking for a place that offers discounts on this service. | [’auto parts store’, ’car detailing service’, ’tire shop’] |
10 | Gift Shop | My daughter’s birthday is next week. Can you suggest a store where I can have a variety of options for her gift? | [’gift shop’, ’toy store’, ’souvenir store’] |
11 | Cocktail Bar | It’s too hot, and I’m so thirsty. I really want to be hydrated with fresh juice. Do you have any suggestions? | [’cocktail bar’, ’brunch restaurant’, ’bar’] |
12 | Museum | I’m interested in learning more about the local history. Which museum do you recommend visiting? | [’memorial park’, ’tourist attraction’, ’museum’] |
13 | Hair Salon | I am planning to change my hairstyle and want to visit a top-notch hair salon. Can you suggest a hair salon with a good reputation? | [’hairdresser’, ’hair salon’, ’barber shop’] |
14 | Yoga Center | I’m stressed these days. Someone told me before that Yoga could help me relieve my stress. Do you have any recommendations for a Yoga center? | [’yoga center’, ’yoga instructor’, ’yoga studio’] |
15 | Furniture Store | We are redecorating our home and need to find a reliable furniture store with quality products. Can you recommend a furniture store with a good reputation? | [’antique furniture store’, ’furniture accessories’, ’home goods store’] |
16 | Dance School | My daughter loves dancing, and we are looking for a dance school where she can enhance her skills. Can you suggest a dance school with a good reputation? | [’dance school’, ’dance company’, ’music school’] |
17 | Martial Arts School | My son is very interested in learning self-defense, and we are looking for a reputable martial arts school. Do you know any martial arts school with a good reputation? | [’martial arts school’, ’karate school’, ’taekwondo school’] |
18 | Medical Spa | I am planning to treat myself to some relaxation and care, and I am looking for a medical spa with high standards. Do you know any medical spa with a good reputation? | [’medical spa’, ’massage spa’, ’massage therapist’] |
19 | Bakery | My daughter’s birthday is coming up, and she loves unique pastries. I’m looking for a bakery that can create a custom cake that’s both delicious and visually stunning. | [’bakery’, “children’s party service”, ’donut shop’] |
20 | Indian Restaurant | My family and I will visit the local area for a cultural festival. We’d love to experience authentic Indian cuisine while we’re there. Could you recommend one nearby with a good reputation? | [’indian restaurant’, ’modern indian restaurant’, ’middle eastern restaurant’] |
21 | Dentist | My daughter recently had a toothache, and we’re looking for a reliable dentist who is experienced with kids ages Could you recommend a good dentist nearby? | [’dentist’, ’dental clinic’, ’cosmetic dentist’] |
22 | Coffee Shop | I’m planning a casual meeting with a colleague next Tuesday morning. We’re looking for a quiet place to discuss some business ideas over coffee. Can you suggest a coffee shop that’s known for its serene environment? | [’coffee shop’, ’brunch restaurant’, ’lounge’] |
23 | Optometrist | My wife has been complaining about her vision while driving at night. We think it might be time for her to see an optometrist. Can you suggest a well-respected optometrist near us? | [’eye care center’, ’optometrist’, ’optician’] |
24 | Massage Therapist | I’ve been dealing with back pain due to long hours at my desk job. I heard that a good massage can help alleviate some of the pain. Do you know of a massage therapist nearby with excellent reviews? | [’massage therapist’, ’massage spa’, ’bank’] |
25 | Golf Club | It’s been a while since I didn’t enjoy playing golf. Do you know any nearby golf clubs with affordable membership subscription. | [’golf club’, ’golf shop ’, ’golf course’] |
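
Rows like those in the table above can be scored with a hit rate at p: the fraction of queries whose intent label appears among the first p returned services. The metric, the strict case-insensitive string match and the four-row subset below are illustrative assumptions, not the evaluation reported in Section 4.3.1. Strict matching gives no credit to near-synonyms such as ’antique furniture store’ for the Furniture Store intent.

```python
from typing import List, Tuple

# A few (intent, top-p services) rows copied from the table above.
ROWS: List[Tuple[str, List[str]]] = [
    ("Dog Park", ["dog park", "dog walker", "dog trainer"]),
    ("Tire Shop", ["auto parts store", "car detailing service", "tire shop"]),
    ("Museum", ["memorial park", "tourist attraction", "museum"]),
    ("Furniture Store", ["antique furniture store", "furniture accessories", "home goods store"]),
]


def hit_rate(rows: List[Tuple[str, List[str]]], k: int) -> float:
    """Fraction of queries whose intent label appears among the first k services."""
    hits = sum(
        1 for intent, services in rows
        if intent.lower() in (s.strip().lower() for s in services[:k])
    )
    return hits / len(rows)


print(hit_rate(ROWS, 1), hit_rate(ROWS, 3))  # 0.25 0.75
```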
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Elewah, A.; Elgazzar, K.; Elnaffar, S. Agentic Search Engine for Real-Time Internet of Things Data. Sensors 2025, 25, 5995. https://doi.org/10.3390/s25195995