Development of Raspberry Pi 4 B and 3 B+ Micro-Kubernetes Cluster and IoT System for Mosquito Research Applications

Abstract: Detecting infected female mosquitoes can be vital when they transmit harmful diseases such as dengue, malaria, and others. Infected mosquitoes can lay hundreds of eggs in breeding locations, and newborns can transmit diseases to more victims. Hence, gathering and monitoring climate data and environmental conditions for mosquito research can be valuable in preventing mosquitoes from spreading diseases. To obtain microclimate data, users such as mosquito researchers may need weather stations in various locations and an inexpensive, effective IoT system for monitoring multiple specific locations. We can achieve this in each location by sending microclimate data from wireless sensor end-node devices to a nearby middle-node aggregator. Each location's aggregator can send the data to a cluster, such as a customized Raspberry Pi-based cluster with Micro-Kubernetes as its distributed operating system. The applications, such as the database and web server, can be packaged in Docker containers and deployed as containerized applications on the cluster. This cluster can store the data, which can then be accessed via Android and web applications. The results of this work show that the measurement data of the specific locations are more accurate than those from nearby third-party weather stations. The IoT cluster system proposed in this paper can be used to provide accurate microclimate data for the selected locations.


Introduction
Mosquito research applications have been developed and used worldwide because they are approaches for understanding and analyzing mosquitoes' behaviors. Mosquitoes feed on other organisms' blood, mainly by biting their targets with their proboscis during an organism's moments of inattention. While ingesting their target's blood, mosquitoes can potentially transfer illnesses they carry into their target's body. Therefore, the mosquito is one of the world's most dangerous creatures because it can easily infect mammals, birds, and humans with harmful diseases, such as dengue, malaria, West Nile, and Zika [1][2][3]. Moreover, due to changes in temperature, humidity, wind, and other microclimate variables, viruses and other diseases can quickly spread further and mutate during viral transmission among different targets, including infected ones [4,5]. With recent advancements in the microcomputer cluster and Internet of Things (IoT) systems, mosquito research applications that use these systems to gather and track microclimate data in different locations have become indispensable for predicting, detecting, controlling, and preventing mosquito breeding and disease spreading [6,7].
Due to the invention of microcomputers such as Raspberry Pi (RPi), mosquito researchers can use an RPi-based distributed computing cluster system instead of a simple single computing system. The single computing system refers to a cloud, typically offered as a cloud computing service by companies such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform. The RPi-based cluster refers to multiple RPis grouped over a network that functions as a single computer. Each RPi is considered a node in the cluster, and with Micro-Kubernetes (MicroK8s), load balancing can distribute tasks, which are workloads, to each node.
There are advantages to using the RPi-based MicroK8s cluster system instead of a single computing system. The cluster can be scalable and flexible because new nodes can be connected to the cluster to solve the problem of an overloaded workload without changing or stopping any ongoing applications. When one node or application is down, the cluster will provide highly available and fault-tolerant services, and the other nodes will take on the additional workload of that failed node or application. Regarding cloud computing, the third-party cloud provider may share the tenant data with business partners, law enforcement agents, or governments, especially when a law enforcement agency subpoenas them for tenant data. When using a cloud, the long-term availability and dependability are unpredictable because the cloud provider or service may not exist tomorrow. Additionally, the cloud service architecture is less flexible and less secure in terms of performance [8].
Computer clusters can focus on Docker containerized applications, such as databases and web services, and reduce the workload of storing data and hosting web services by using Kubernetes (K8s) [9]. Since 2015, K8s has been designed by Google to be an open-source platform that can automate containerized applications' deployment, management, and scaling across the nodes of the cluster [10]. The RPi-based cluster can use different container orchestration tools as the distribution system for load balancing, but MicroK8s was chosen in this study. MicroK8s is a lightweight version of K8s, an open-source container orchestration tool that can manage containerized applications, workloads, and services. By using the cluster in the IoT system, IoT devices can connect with the cluster so that it can store the microclimate data from different locations and mosquito researchers can view them.
In this paper, a low-cost customized service solution is presented: a Raspberry Pi Micro-Kubernetes cluster consisting of one RPi 4 B and three RPi 3 B+ boards. It uses one master node and multiple worker nodes. We used only four RPis because more RPis can be added to the cluster later and because of the supply chain problems and chip shortage in 2021–2022 [11]. In our scenario, we deploy our cluster in an IoT system based on our customized IoT network and IoT edge devices, such as an aggregator. The aggregator, such as the sensor hub and gateway, allows microclimate data from end nodes to be sent across the internet to our cluster [12,13]. This IoT network technique combines older work with our cluster to form a data acquisition system that allows any individual to study, predict, and prevent the spreading of mosquito-borne diseases. The work presented in this paper is part of the first author's master's thesis work in 2022, and as of this publication, the thesis is not publicly accessible as it is in the embargo period.
This paper is organized into three parts. First, we start with the proposed method. After that, this study's measurements are shown, and the advantages of the proposed method are described. Lastly, we close our paper with a conclusion and some perspectives for our proposed works.

IoT System Architecture
In this section, the proposed IoT system is described, and it is the architecture of an efficient end-to-end process for individuals to monitor the microclimate data from any wireless sensor end-node devices. Figure 1 represents an IoT system operated by a cluster and four aggregators (remote data stations) in an IoT network.
The remote data stations (on the right side of Figure 1) are the middle nodes (between the cluster and the end nodes) for gathering and transferring the temperature, humidity, pressure, light, and GPS (Global Positioning System) coordinates in the mosquito breeding locations, such as standing water and containers that hold water. In our case, the sensors are physically connected to the remote data stations, although they were originally intended to be connected to the wireless sensor end-node devices, which are not included in this study's scope. Collecting environmental conditions and climate data can be challenging due to the devices' complex setup and the harsh environment. The local microclimate data can be obtained from third-party sites, but they may not accurately represent the data at a specific location [16,17]. Therefore, our remote data stations are located near mosquito breeding sites. Each station is an aggregator for periodically collecting microclimate data from various sensor nodes in the potential mosquito breeding sites. Additionally, a single RPi board is an adequate solution for each station because it can relieve the network transmission load and periodically upload the microclimate data to our cluster. In other research, an RPi has also been used in a fully controlled street lighting isle and a human-powered vehicle [18,19].
Due to the efficiency of the IoT system, the IoT system based on cloud services has been used for structural health monitoring to increase human safety and reduce maintenance costs of the system [20]. Additionally, RPi has been used as the IoT edge device to gather climate data and upload it to the cloud services [21]. Moreover, in 2016, Keijo Heljanko designed and developed a Kubernetes cluster consisting of five RPis to act as the IoT sensor node to send pictures and temperature values to a cloud platform that used the Apache Kafka framework [22]. However, comparing our IoT system and cluster to the work done by others, our IoT system can increase the number of aggregators for monitoring microclimate variables on more mosquito breeding sites. Additionally, because of the aggregator's abilities, we can reduce latency and bandwidth costs when data are transitioning from the wireless sensor end-node devices to the cluster.

MicroK8s Cluster Architecture
In Figure 2, the architecture of the Micro-Kubernetes cluster is shown. Micro-Kubernetes (MicroK8s), in its 1.23 stable version, is a lightweight and fully compliant Kubernetes (K8s) distribution for our cluster. It can provide and simplify the usage of K8s for each RPi in our cluster. Each RPi has Ubuntu 18.04 LTS (Long Term Support) installed as the operating system because Canonical supports and features its version of MicroK8s [23]. The primary reasons for using MicroK8s are the redundancy, scalability, and reliability factors. The redundancy of MicroK8s provides uninterrupted services, such as the database and web service, to the users. By using MicroK8s, users can add more services by adding more RPis (worker nodes) to the cluster. With the redundancy and scalability of MicroK8s, users can depend on the cluster to collect microclimate data from each location's remote data station.
The cluster consists of one master node and three worker nodes. Each node has components that will be described. The master node (on the left) controls and manages the worker nodes (on the right) and distributes the workloads, such as hosting web services, to them. It is also responsible for orchestrating containers containing the web application and other applications on the worker nodes. The master node has a few more add-ons, including a Kube API (Application Programming Interface) server, controller, scheduler, and etcd.
Figure 2. The MicroK8s architecture for the distributed system in this study [24,25].
With the use of these add-ons, users can automate the management of the cluster. The Kube API server, the front end of the cluster, allows the users, management devices, and all external communication to interact with the cluster. The controller is the orchestration's brain, which makes decisions to bring up new containers when nodes, containers, or other endpoints go down. The scheduler is responsible for distributing the workloads or containers to different nodes, including the master node. The etcd component is the key-value store used by MicroK8s to store all data used in the cluster in a distributed manner.
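The division of labor between the scheduler and the controller described above can be illustrated with a toy model. The sketch below is not the Kubernetes scheduling algorithm; it assumes a simple least-loaded placement rule, and the pod names and the master node's name are hypothetical (the worker names follow the ones used later in the ping measurements).

```python
# Toy model of pod placement and rescheduling; NOT the real Kubernetes scheduler.
# The master node name and all pod names are hypothetical placeholders.
NODES = [
    "rpi4b-master-1",
    "rpi3b-plus-worker-1",
    "rpi3b-plus-worker-2",
    "rpi3b-plus-worker-3",
]


def place_pods(pods, nodes):
    """Scheduler role: assign each pod to the node currently holding the fewest pods."""
    load = {node: 0 for node in nodes}
    placement = {}
    for pod in pods:
        # min() returns the first node among ties, so the result is deterministic.
        target = min(nodes, key=lambda n: load[n])
        placement[pod] = target
        load[target] += 1
    return placement


def reschedule(placement, failed_node, nodes):
    """Controller role: bring pods from a failed node back up on healthy nodes."""
    healthy = [n for n in nodes if n != failed_node]
    load = {n: 0 for n in healthy}
    for node in placement.values():
        if node != failed_node:
            load[node] += 1
    new_placement = dict(placement)
    for pod, node in placement.items():
        if node == failed_node:
            target = min(healthy, key=lambda n: load[n])
            new_placement[pod] = target
            load[target] += 1
    return new_placement


pods = ["mysql-0", "mqtt-broker-0", "flask-api-0", "flask-api-1", "web-0"]
placement = place_pods(pods, NODES)
after_failure = reschedule(placement, "rpi3b-plus-worker-2", NODES)
for pod in pods:
    print(pod, placement[pod], "->", after_failure[pod])
```

In this model, only the pods that were running on the failed worker move; the remaining pods keep their nodes, which mirrors the fault-tolerant behavior the cluster is expected to provide.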
Besides the significant components from the master node, all nodes have pod(s) and a Docker daemon. A pod is a group of one or more containers, such as Docker containers, and also a single instance of an application, such as a database. Nodes can increase the number of pods across nodes to share the workloads when more users are accessing the application. The Docker daemon manages the objects, such as the Docker image, container, and network. Web and other applications can be built and run in the Docker container.
Besides the master node, there are three worker nodes. Each of them contains the Kubelet, Kube proxy, container runtime, and optional add-on(s). The Kubelet is the node agent of the worker node. It can register the worker with the Kube API server to allow itself to communicate with the master node. The Kube proxy is used as the network proxy, which maintains the network rules on the worker node. The container runtime runs the containers on the worker node and manages their life cycle. If the worker nodes have spare memory, they can add optional add-ons, such as CoreDNS, Calico, and Flannel, which extend the functionality of MicroK8s. Figure 3 shows the front and top view of the RPi MicroK8s cluster. The enclosure is made of metal and can house four RPi boards.
Each RPi is firmly installed on the enclosure. The RPi on the left side of Figure 3 is an RPi 4 B board. The rest of them are RPi 3 B+ boards. Each RPi is connected to an Ethernet switch, which provides internet access and communication between the RPi boards. The RPi 4 B board has an external USB flash drive to store the database file. All of the database data and configuration files are stored in the USB flash drive.
The dimensions of our cluster are 20.6 × 13.6 × 16.2 cm. The back view of the cluster shows the case fans and the port that allows power cables to connect to the RPis and the Ethernet switch. Two case fans are used, and a device fan is installed on the RPi 4 B board.
Figure 4 shows a remote data station (aggregator). In our scenario, we used four of these and placed them in four different mosquito breeding locations. Each station has a 3D-printed enclosure to protect the device. Additionally, it is an embedded system with a Raspberry Pi 3 B board and sensors. The sensors are supposed to be connected to the wireless sensor end-node devices, which are not included in this study's scope. If the end nodes are included, they can transmit microclimate data to the aggregator. In mosquito research, the microclimate data and GPS coordinates are some of the important factors for determining the method of mosquito population control. Therefore, this remote data station can currently measure temperature, humidity, pressure, and light, as well as GPS coordinates. Moreover, this system has user buttons and an RGB (Red, Green, and Blue) color indication LED (Light-Emitting Diode). Users can transfer the measured microclimate data from these remote data stations to the cluster using the MQTT (MQ Telemetry Transport) protocol.

MicroK8s Cluster and IoT System Implementation
Our cluster is formed by connecting four RPis to a network switch. Users can place it in a room with an internet connection and good airflow conditions for the operations of fans. The benefit is that users can position the cluster anywhere with those conditions. In our case, the cluster was installed and placed in College Station, TX, USA. It allows users to access the microclimate data via our customized Android and web applications. Those data are available on the database when they are sent from the remote data stations.
The flowchart in Figure 5 shows the sub-process of the measurement, which runs repeatedly in the remote data station. Each remote data station can collect data from various sensors about every 2–3 s. Next, it can publish the data to the MQTT (MQ Telemetry Transport) broker on the cluster. Because each station has a unique device ID (identification) number with an MQTT topic equivalent to that ID, the MQTT broker can filter each message for each connected station based on the ID. For example, if the remote data station's ID is 101, then the topic is 101. After that, the database can store the data from the MQTT broker, as shown in Figure 6, but the exact GPS locations are hidden in the figure. The MySQL database, version 8.0, is used in our scenario because the stability of the database is an essential factor in the IoT system. The database needs to store a large amount of data coming from the remote data stations, and when MySQL is compared to MongoDB and other databases, MySQL is more stable for storing the data [26,27].
The flowchart in Figure 7 shows the data access process using an Android application or web browser. Using an API (Application Programming Interface) URL (Uniform Resource Locator) from the Flask API, the Android application or web browser can use an HTTP (Hypertext Transfer Protocol) GET request to receive a response from the cluster. Once the Flask API receives the request, the program can fetch microclimate data from the MySQL database. The microclimate data can be formatted into JSON (JavaScript Object Notation) form and transferred to the Android application or web browser. Once the Android application or web browser receives the data, it will decode the JSON and display each parameter's value.
Figure 9 shows screenshots of an Android application developed using Flutter and Android Studio. We can access the same data as the website with a few selections. The first screenshot on the left shows the welcome page.
After researchers select the begin button, the second page will show up, and the locations of the remote data stations can be displayed on the map. This map is obtained using the Google Maps API. Next, users can select icons of remote stations. After that, a message window will pop up. The users can choose the message window, and the application will direct them to the next page, where the list of the available microclimate data is displayed on the screen. Lastly, the users can select the parameter and display the corresponding real-time data graph. For example, selecting the humidity parameter can display its real-time data graph. The users can return to the previous page by selecting the top left return icon.
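The data path described in this section (station ID as MQTT topic, JSON-encoded readings, and a JSON response to an HTTP GET request) can be sketched without any network code. The following is a minimal sketch using only the Python standard library; the field names, units, and sample values are assumptions made for illustration, and the MQTT client, MySQL storage, and Flask routing used in the actual system are deliberately left out.

```python
import json


def topic_for(device_id):
    """Each station publishes on a topic equal to its device ID (ID 101 -> topic "101")."""
    return str(device_id)


def encode_reading(device_id, temperature_c, humidity_pct, pressure_hpa,
                   light_lux, latitude, longitude):
    """Station side: serialize one reading. Field names and units are hypothetical."""
    return json.dumps({
        "device_id": device_id,
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
        "pressure_hpa": pressure_hpa,
        "light_lux": light_lux,
        "latitude": latitude,
        "longitude": longitude,
    })


def decode_reading(topic, payload):
    """Cluster side: parse a payload and check it arrived on its own station's topic."""
    reading = json.loads(payload)
    if topic_for(reading["device_id"]) != topic:
        raise ValueError("payload device ID does not match topic")
    return reading


def api_response(rows):
    """API side: shape stored rows into the JSON body returned for a GET request."""
    return json.dumps({"count": len(rows), "readings": rows})


# Hypothetical reading from station 101; coordinates are zeroed placeholders.
topic = topic_for(101)
payload = encode_reading(101, 27.5, 64.0, 1009.2, 320.0, 0.0, 0.0)
stored = [decode_reading(topic, payload)]  # stands in for the database insert
body = api_response(stored)

# Client side: decode the JSON and display each parameter's value.
for reading in json.loads(body)["readings"]:
    print(reading["device_id"], reading["temperature_c"], reading["humidity_pct"])
```

Keeping the topic derivation in one function means the station and the cluster cannot disagree on the naming rule, which is what lets the broker separate messages per station.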

Microclimate Data Measurements
The microclimate measurements are collected from remote data stations in different locations. Figure 10 shows measurements taken on 28 August 2022 over a 12 h period, from 9:30 a.m. to 9:30 p.m. Two sets of data from remote data stations are selected and presented as an example for comparison. The set of measurements in (a), (b), and (c) shows the data from the remote data station located in College Station, Texas, USA. Parts of the measurement are missing because the remote station was out of battery at that time. The other set of data in (a), (b), and (c) in Figure 10 is the publicly available weather data collected from Weather Underground (WU). The WU data points that are closest to the test location were selected. The remote data station and the selected WU station are separated by about 1.6 miles. The light information is not available from Weather Underground, but it was measured using remote data stations as shown in (c) and (f) in Figure 10. The set of measurements in (d), (e), and (f) shows the data from the Remote Data Station (RDS) located in Houston, Texas, USA. In this case, the RDS and the selected WU station are separated by about 15.8 miles. Data collected using the remote data station are drawn in blue, and the data from Weather Underground are drawn in orange [28,29].
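Selecting the closest WU data point and reporting its separation from a remote data station can be done with a great-circle distance. The sketch below uses the haversine formula, which is an assumption (the distance method used for the figures above is not stated), and all station names and coordinates are hypothetical because the exact GPS locations are not published.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_station(target, candidates):
    """Return (name, distance_miles) for the candidate closest to target.

    target is a (lat, lon) pair; candidates maps a station name to a (lat, lon) pair.
    """
    name = min(candidates, key=lambda n: haversine_miles(*target, *candidates[n]))
    return name, haversine_miles(*target, *candidates[name])


# Hypothetical coordinates for illustration only.
rds = (30.60, -96.30)
wu_stations = {
    "WU-A": (30.62, -96.32),
    "WU-B": (30.70, -96.50),
}
name, distance = nearest_station(rds, wu_stations)
print(name, round(distance, 2), "miles")
```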
Users can compare the ambient temperature and humidity data from the Remote Data Station (RDS) with those from Weather Underground (WU). The percentage error between the measurements from the RDS and WU is shown in Table 1. It was calculated using the formula below, where the observed value is the data from the RDS and the WU value is the data from Weather Underground.
$$\%\ \text{Error} = 100 \times \frac{\left|\text{Observed Value} - \text{WU Value}\right|}{\text{WU Value}}$$
In Table 1, excluding the table's subtitle, the first and second rows show the percentage error for the corresponding chart in row (a) of Figure 10. The third and fourth rows show the percentage error for the corresponding graph in row (b) of Figure 10. In College Station, the RDS and WU station are close to each other, so the percentage errors for ambient temperature and humidity are smaller than the ones from Houston. In Houston, the RDS is far from the WU station, so the percentage error is higher than those from College Station. The average percentage error for College Station's temperature is 5.32%, which is equivalent to the minimum percentage error for Houston's temperature. Additionally, for temperature, the error caused by the device can be considered as ±2.50 °C. This error factor would not be regarded as a significant contributing factor to the difference between the data from the RDS and WU. These results indicate that the proposed microclimate IoT system may provide more accurate data for the test spots at selected potential breeding locations. Additionally, more accurate and precise data collection can significantly affect research quality in that area.
Instead of showing microclimate data for 12 h, Table 2 shows an extended period of daily records for two of the recorded variables, ambient temperature and humidity. These data were collected from another station, 102, in College Station. The table is divided into two main columns, College Station's station 102 and WU. Under the College Station column, the average and standard deviation of ambient temperature and humidity are shown from 7 September to 14 September 2022. Because WU does not provide the standard deviation, only the average temperature and humidity are shown under the second main column, WU. Station 102 and the selected data point from WU are separated by about 1.14 miles.
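A daily average and standard deviation of the kind reported in Table 2 can be derived from timestamped readings by grouping them per calendar day. The sketch below is one possible way to do this; the sample readings are hypothetical, and the use of the sample standard deviation is an assumption because the estimator used for Table 2 is not stated.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def daily_summary(readings):
    """Group (ISO timestamp, value) pairs by day and return {date: (mean, std_dev)}.

    The sample standard deviation is used (an assumption); a day with a single
    reading gets a deviation of 0.0 because stdev() needs at least two values.
    """
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[datetime.fromisoformat(timestamp).date()].append(value)
    return {
        day: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for day, values in sorted(by_day.items())
    }


# Hypothetical temperature readings in degrees Celsius.
readings = [
    ("2022-09-07T09:00:00", 26.0),
    ("2022-09-07T13:00:00", 28.0),
    ("2022-09-07T17:00:00", 30.0),
    ("2022-09-08T09:00:00", 27.0),
    ("2022-09-08T13:00:00", 29.0),
]
summary = daily_summary(readings)
for day, (average, deviation) in summary.items():
    print(day, round(average, 2), round(deviation, 2))
```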
On September 7th, the average temperature for station 102 in College Station was 27.57 °C, and the average temperature for the nearby Weather Underground station was 26.94 °C. The percentage error between those two data points is 2.34%, close to the minimum percentage error, 2.70%, for College Station's ambient temperature in Table 1. Therefore, mosquito researchers could use these accurate microclimate data for the area of interest to generate more accurate analysis. In the following subsection, additional ping measurements for the cluster are described for more information.
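The percentage error used in this section can be checked with a few lines of code. The sketch below implements the formula as defined for Table 1, with the WU value as the reference, and reproduces the September 7th comparison; the function name is our own choice.

```python
def percent_error(observed, reference):
    """Percentage error of an observed value relative to a reference (WU) value."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return 100 * abs(observed - reference) / reference


# September 7th averages: station 102 (observed) versus Weather Underground (reference).
error = percent_error(27.57, 26.94)
print(round(error, 2))  # 2.34
```

Because the reference is in the denominator, swapping the two arguments gives a slightly different percentage, so the WU value should always be passed second.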

Ping Measurements between the Master Node and the Worker Nodes in the Cluster
This section compares the ping time between the master node and the worker nodes. The measurement was taken while four remote data stations were simultaneously sending microclimate data to the cluster. In this measurement, a 32-byte data packet is used to measure the ping time in milliseconds (ms). Table 3 shows the ping time between the master and worker nodes of the cluster, including the minimum time, average time, maximum time, and standard deviation.
Table 3 is divided into three main columns. The first column is the master node of the cluster from Figure 2. The second column is the worker nodes, rpi3b-plus-worker-1, rpi3b-plus-worker-2, and rpi3b-plus-worker-3, from Figure 2. The last column is the ping time, which is divided into four sub-columns: minimum time, average time, maximum time, and standard deviation. The average time between the RPi 4 B master node 1 and the RPi 3 B+ worker node 1 is 0.759 ms. The standard deviation between them is 0.067 ms. By using the average time and standard deviation, a standard deviation graph can be made for a graphical representation of the data.
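The four ping statistics in Table 3 correspond to the fields of the summary line that the Linux iputils `ping` command prints. The sketch below parses such a line, assuming the measurement was taken with that tool (for example with `ping -c 10 -s 32 <host>`), which is not stated in the text. In the sample line, the average and deviation are the values reported for worker node 1, while the minimum and maximum are hypothetical placeholders.

```python
import re

# Summary line format printed by Linux iputils ping.
RTT_PATTERN = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(output):
    """Extract min, avg, max, and deviation (all in ms) from ping's summary line."""
    match = RTT_PATTERN.search(output)
    if match is None:
        raise ValueError("no rtt summary line found")
    minimum, average, maximum, deviation = (float(group) for group in match.groups())
    return {"min": minimum, "avg": average, "max": maximum, "dev": deviation}


# avg and mdev match the values for worker node 1; min and max are hypothetical.
sample = "rtt min/avg/max/mdev = 0.650/0.759/0.910/0.067 ms"
stats = parse_ping_summary(sample)
print(stats["avg"], "+/-", stats["dev"], "ms")
```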

Conclusions
In this paper, we proposed a microclimate IoT system for mosquito research applications that may collect and monitor accurate microclimate data from selected locations. Mosquito researchers may use the microclimate data for the test spots of interest for analysis and control. The system offers several ways of accessing the data stored in the Raspberry Pi Micro-Kubernetes cluster, namely an Android application and a website. The measurements were shown, and the data were compared with the publicly available weather data. The results show that this customized IoT network, remote data stations, and low-cost RPi MicroK8s cluster may provide more accurate data for the test spots.
For future work, we will continue to develop various RPi MicroK8s clusters. We plan to perform investigations using the microclimate data from this IoT system. We plan to check if there is a substantial limitation on the accuracy and precision of microclimate measurements. Additionally, we plan to carry out more granular monitoring of microclimates to check the difference between the microclimate and macroclimate in mosquito breeding sites. There can be multiple opportunities for future work that can build upon the proposed cluster in this study. More types of measurements at multiple locations can be performed and processed using the proposed clusters. We plan to improve the cluster by using a newer version of RPi for all nodes in the cluster. We plan to perform more experimentation by increasing the number of worker nodes by adding several more Raspberry Pis to the cluster. In the remote data station of the IoT system architecture, we see the potential to make the maintenance engineer's work easier by making wireless sensor end-node devices need less maintenance and minimizing their energy use.
Data Availability Statement: Data for a few specific days are contained within the article. The data for other days are not publicly available due to the funding source's ownership of the data.

RGB: Red, Green, and Blue
RPi: Raspberry Pi
Std. Dev.: Standard Deviation
URL: Uniform Resource Locator
WU: Weather Underground