LPAI—A Complete AIoT Framework Based on LPWAN Applicable to Acoustic Scene Classification Scenarios

Deploying artificial intelligence on edge nodes of Low-Power Wide Area Networks can significantly reduce network transmission volumes, event response latency, and overall network power consumption. However, the edge nodes in LPWAN bear limited computing power and storage space, and researchers have found it challenging to improve the recognition capability of the nodes using sensor data from the environment. In particular, the domain-shift problem in LPWAN is challenging to overcome. In this paper, a complete AIoT system framework referred to as LPAI is presented. It is the first generic framework for implementing AIoT technology based on LPWAN applicable to acoustic scene classification scenarios. LPAI overcomes the domain-shift problem, which enables resource-constrained edge nodes to continuously improve their performance using real data to become more adaptive to the environment. For efficient use of limited resources, the edge nodes independently select representative data and transmit it back to the cloud. Moreover, the model is iteratively retrained on the cloud using the few-shot uploaded data. Finally, the feasibility of LPAI is analyzed, and simulation experiments on the public ASC dataset provide validation that our proposed framework can improve the recognition accuracy by as little as 5% using 85 actual sensor data points.


Introduction
Low-Power Wide Area Network (LPWAN) is a wireless network that is an essential part of wireless communication for the Internet of Things (IoT) [1] and has received extensive attention in recent years [2]. LPWAN is designed to allow low-power and long-range transmission, which makes LPWAN have the following features. First, it has strict transmission rate and payload length limitations. Second, the edge nodes in LPWAN are required to operate for long periods without battery replacement, and are usually ultra-low-power embedded devices with about a few megabytes of memory resources, thus they have extremely limited computing resources compared to mainstream lightweight networks with millions of parameters [3]. Moreover, low-power microcontrollers use MHz-level frequencies, while mobile devices employ GHz-level frequencies, so their computational resources are also scarce. Among all LPWAN technologies, LoRa is the most widely used [4]. Unlike other solutions, LoRa is deployed in unlicensed bands so that users can build networks independently. As a result, anyone can have complete ownership and control of the LoRa network at a low operational cost.
LPWAN has been used in medical treatment [5,6] and traffic monitoring [7], especially in acoustic scene classification (ASC) scenarios. ASC refers to recognizing different indoor and outdoor scenarios via acoustic signals recorded by sensors. Many studies utilize AIoT technologies to optimize the performance of ASC tasks [8]. Zualkernan et al. [9] have designed a system for monitoring bat species based on echolocation audio. For scenarios such as wildlife monitoring, which requires the long-term deployment and challenging maintenance, LPWAN is the most effective network option. Implementing AIoT technology in these acoustic scenarios allows offloading computational tasks to nodes located at the edge of the network, thus effectively reducing the amount of data transmission in the medium and extending the lifetime of the edge nodes, which aligns with the needs of LPWAN. Subsequently, this brings several benefits [10,11], including ultra-low latency [12], reduced power consumption [13,14] and improved network reliability and data security [14,15].
However, due to the transmission characteristics of LPWAN, there remain challenges to the implementation of AIoT technologies in LPWAN. In particular, it is difficult for the cloud to utilize data from the sensors for model updates after deploying intelligent models, resulting in still poor recognition performance of edge nodes. This occurs due to several reasons.

•
The domain shift problem exists between the environment and the data available in the cloud. Domain shift is caused by sensor type changes and deployment locations, resulting in differences between source and target domains, thus weakening the intelligence capabilities of the edge nodes. • It is costly for LPWAN to upload a single packet of raw data back to the server. For example, the LoRa payload comprises only about a hundred levels of bytes. Capturing a 1-second voice recording at 16Khz can produce 64K bytes of raw data. Therefore, hundreds of LoRa packets would need to be sent to deliver them all to the cloud, which creates a huge overhead. • From a data volume perspective, existing mobile-network-based frameworks support the backhaul of all target domain data, which is almost impossible for LPWAN. Moreover, even if only a small amount of data is uploaded, it is ineffective to retrain based on these few-shot samples using traditional methods.
There has been much research on AIoT frameworks in recent years. Huawei Technologies Co. et al. [16][17][18] proposed mobile device frameworks designed for high-dimensional data such as images, which are not suitable for the more restrictive LPWAN. Other frameworks, including Chiu et al. [19], Zualkernan et al. [9] and Chang et al. [20] are based on LPWAN; however, these studies do not consider updating the intelligence model according to the environment. They have still not implemented edge intelligence for LPWAN in this sense.
To increase the suitaility of LPWAN edge nodes for the environment, we propose LPAI, the first generic framework for implementing AIoT technology based on LPWAN applicable to ASC scenarios. LPAI not only realizes the intelligence of edge nodes in LPWAN, but more importantly, it allows resource-constrained edge nodes to continuously improve their performance using raw data, thus becoming more adaptive to the environment. The main contributions can be summarized as follows: • LPAI enables the intelligence of edge nodes and reduces bias between the source and the target data. This enables edge nodes to be applied in real-world scenarios and continuously utilize unlabeled target data from the environment to improve their recognition performance. • A data screening mechanism suitable for LPWAN is designed to improve the recognition performance of edge nodes. In this mechanism, edge nodes make independent decisions and only upload compressed features that help improve performance. Furthermore, iterative retraining is performed in the cloud based on few-shot compressed features returned by the nodes. • We evaluate multiple datasets in ASC scenarios, proving that LPAI can become a general AIoT framework of LPWAN applicable to the ASC scenarios.

Domain Adaptation
In computer vision, domain shift problems are solved using domain adaptation (DA) techniques. With DA techniques, the bias caused by the target data collected by actual sensors can be better mitigated, forcing the data distributions of the two domains to be similar [21][22][23][24][25][26][27][28][29]. Researchers have gradually veered toward wearables [30] and mobile devices [31,32], but rarely toward the more restrictive LPWAN.

AIoT Frameworks
We summarize the research on AIoT frameworks for mobile devices and LPWAN, as shown in Table 1. Huawei Technologies Co. [16] proposed an open-source deep learning training and inference framework called MindSpore. E. Raj et al. [17] implemented a deep learning model for human pose estimation and tracking based on MindSpore in computer vision. Rong et al. [18] designed a collaborative computing platform between edge devices and the cloud to support continuous model evolution and system updates. The foregoing studies are based on mobile networks. They support the uploading of all target domain data regardless of resources. Chiu et al. [19] propose an AIoT precision feeding management system based on LoRa network to improve the existing automatic feeding system in the mark. Chang et al. [20] proposed an intelligent assistive system based on wearable smart glasses and smart cane. However, LPWAN is only responsible for transmitting GPS information of where a fallen person is located. The wearable smart glasses implement the intelligence task.  [16] An open-source deep learning training and inference framework × × [17] A deep learning model for human pose estimation and tracking × × [18] A collaborative computing platform between edge devices and the cloud × [9] A system to monitor bat species based on echolocation audio.
× × [19] An AIoT precision feeding management system to measure water surface fluctuations in areas of fish pellet application × × [20] An intelligent-assistance system for visually impaired people to achieve the goals of aerial obstacle avoidance and fall detection.

Overview
LPAI is a general AIoT framework based on LPWAN for acoustic scene classification scenarios. Like the traditional AIoT framework, LPAI contains four layers from top to bottom: the application, platform, transport, and perception. The overall framework diagram of LPAI is in Figure 1. The application layer contains urban acoustic scenarios. Acoustic sensors are deployed at the entrance of construction sites to monitor incoming and outgoing traffic flow during unconventional hours, effectively discouraging illegal site operations. In this scenario, the input is the wave data collected by the edge nodes through acoustic sensors. After model inference, the output is the scene which the acoustic data belong to, such as traffic or pedestrian. Subsequently, the edge node determines whether the current data need to be uploaded by our proposed data screening algorithm. When a certain amount of data from the target domain is collected in the cloud, it is retrained with our cloud-based retraining algorithm. A new model that is more applicable to the environment is eventually obtained, thus continuously improving the recognition performance of nodes in real-world applications.
In LPAI, the platform layer comprises the cloud platform and operating system that handles complex intelligent tasks. The transport layer is mainly responsible for uplink and downlink data transmission. Furthermore, the perceptron layer is responsible not only for sensing, execution, and control, but also for executing small AI computational tasks and making independent decisions.  Figure 1. The overall framework of LPAI. It is divided into four layers, namely the application layer, platform layer, transport layer and perception layer.

The Transport Layer
The transport layer is first introduced to describe the data flow of the framework. The transport layer is based on the standard LPWAN protocol to ensure generality of the LPAI. The transmission task is divided into uplink and downlink, as summarized below.

•
The uplink uploads from the edge side to the cloud. To save resources, it mainly uploads compressed features of raw data from the environment and the statistical events of edge nodes. • The downlink includes the model to be updated and the spatial distribution of the source domain, providing the basis for nodes to filter the data independently.
In LPAI, the complete data flow process after implementing edge intelligence is as follows. An edge node periodically uploads the statistical information to the cloud to notify when a vehicle is detected. At the same time, it only uploads the compressed features that are more helpful for model updates. After a certain number of compressed features are uploaded, a new model is generated in the cloud. The newly produced model will then be sent down to the edge node via the downlink of the LPWAN protocol. Thus, a closed-loop complete framework is constructed. The complete data flow process is shown in Figure 2.  In LPAI, the transport layer is mainly responsible for uplink and downlink data transmission. The arrows in the figure represent the data flow. The solid black line represents the initialization model that needs to be deployed locally on the node before the system operates, and the dashed line represents the data transmitted through the LPWAN protocol after deployment. Among them, the compressed features processed by the edge node and the response events are uploaded in the uplink, and the model to be updated is transmitted in the downlink.

The Platform Layer
In LPAI, the platform layer provides complex computing tasks in the cloud. Its tasks are listed below, and are also shown in Figure   Tasks on the cloud. The task is divided into two steps, the first step is to train an initial model S init to implement edge intelligence. The second step is to retrain a teacher model with the target domain features, and then distilling a student model. Finally, the student model is redistributed to the nodes.

Training an Initial Model
Initially, a portion of the labeled source features is on the cloud. This part of the source features needs supervised training to get an initial model M init . Unlike traditional deep learning, this model performs simple classification tasks, usually two- [35], three-or four-class problems. In general, the more complex the classification task, the higher the training cost. Subsequently, M init can then be compressed and prepared for deployment on resource-constrained devices.
According to Hinton's knowledge distillation method [36], a deep and complex network can transfer to a small and shallow network, also known as the teacher-student model. Inspired by the recent model distillation technique, we design a soft target loss in order to transfer novel knowledge. Considering an example x t i ∈ D, the teacher network can produce class probabilities by a softmax layer that converts its logits z i = {z 1 i , . . . , z K i } to an output of probability y i . The temperature T is put into the standard softmax function as shown in Equation (1), where T is a parameter denoting temperature, set to 1 in a standard softmax. It converts the logit values to pseudo-probabilites. The teacher model's knowledge is transferred to the student model through Equation (4), using cross-entropy as the loss function, where L so f t in Equation (2) refers to the cross entropy between the soft labels of the teacher p T j and the student q T j when the temperature is T. The higher the temperature, the greater the attention paid to negative labels. Since the teacher's judgment is not entirely correct, L hard in Equation (3) is introduced to prevent overfitting. It represents cross-entropy between the student's soft and hard labels at temperature T = 1. When the teacher's judgment is incorrect, the student learns by referring to the correct label. Finally, the student model S init is distributed to the nodes.

Cloud-Based Retraining Algorithm
Because the number of newly uploaded target features is much smaller than the existing source domain, a feature-specific augmentation method is required to prevent poor generalization performance. The classical methods used for the augmentation of acoustic are usually specific to the raw data [37,38], not the features. Generative Adversarial Networks (GAN) are a popular research area in computer vision. GAN can expand target features by creating new features using its generator. Moreover, they can also achieve excellent performance in the case of few-shot learning. The structure of GAN is displayed in Figure 4. In the retraining phase, the training and test sets are the source domain f s i and the expanded target features f new_t i .
A pseudo-label is added to each unlabeled target feature in each training epoch. Because these pseudo-labels are still unreliable, features with confidence greater than γ server are selected iteratively and fed into the neural network. As the epoch increases, the model performance will improve, and the number of target features involved in training will increase, further enhancing the overall recognition performance. In a previous work, many loss functions were designed to decrease the distance between the source and target domains, such as MMD [39]. In LPAI, a simple coral loss [40] is used based on the two-norm distance of the covariance matrix between the two domains. As in Equation (5), d represents the batch size of each round of training. The source and target domain covariance matrices calculate as Equations (6) and (7), where 1 represents a column vector. This part is shown in Algorithm 1.
Algorithm 1 Cloud-Based Retraining Algorithm 1: Given a classification task with C categories. Input: Source features f s i and some uploaded features f t i . 2: Perform data augmentation on f t i and obtain f new_t i . Use a portion of f s i and f new_t i as the training set X retrain to train a teacher model. 3: for each epoch do 3: Obtain samples x t with a confidence level higher than γ server .  The teacher model is transferred to a new student model M t after retraining. The identical model compression method mentioned in Equation (4) is applied here. Beyer et al. [41] showed that knowledge distillation is more effective when the inputs to the teacher and student models are consistent, so X retrain is still used as the input to the student for distillation. The complete compression process is in Algorithm 2.

Algorithm 2 Compression Algorithm
Input: Use X retrain as the training set and a teacher model M s . 1: for each epoch do 1: Calculate the soft loss L so f t . 1: Calculate the hard loss L hard . 1: Calculate the total loss in Equation (4). 2: end for 3: Output: A new student model M t .

Data Distribution
The cloud requires computation of source domain features prior to being sent to edge nodes in order to provide a basis for them to determine whether to submit the data.
However, storing all the source domain features is far beyond the memory capacity of an edge node in LPWAN. The cloud only calculates the normalized average vector d s i of each category in the source domain to conserve memory overhead. As for how the nodes use d s i , this will be continued in Section 3.4.3.

The Perception Layer
Unlike traditional AIoT, the perception layer in LPAI handles more additional tasks. Therefore, edge nodes are supposed to schedule the tasks rationally within the limited resources. Figure 5 shows the task flow diagram of the edge node. The tasks are divided into three main parts: • The edge node collects real-world data and performs inference. • During each inference cycle, the edge node determines whether the current features after feature extraction needs to be sent back to the server. • The edge node receives a new model and loads it into memory.  Figure 5. Node State Diagram. The leftmost boxes show regular tasks of the node, i.e., constantly collecting data for inference. The middle section displays the intelligent decision-making function specific to LPAI. Furthermore, the right section displays the data interaction process with the upper layer network.

Edge Intelligence
In LPAI, a neural network model is trained in the cloud using the Keras framework and then converted into hexadecimal data suitable for embedded devices with the help of Tensorflow Lite. This tool is used for deploying deep learning models on mobile and embedded devices, consisting of a converter and an interpreter. It also accomplishes network optimizations, such as quantization during the conversion process. The interpreter is responsible for transforming a .tflite file into a format that can be deployed on mobile devices and embedded microcontrollers. The hexadecimal model is then loaded into the device with the help of an open-source inference framework provided by Edge impulse to implement model inference.

Feature Extraction
To significantly reduce the amount of data returned, important information needs to be extracted from the raw data. The traditional spectrogram method divides the window into multiple overlapping frames and then computes the FFT for each frame [42]. The size and number of frames can be adjusted with the parameters Frame length and Frame stride. For example, with a window of 1 s, frame length of 0.02 s and stride of 0.01 s, it will create 99 time frames. An FFT is then calculated for each frame. Finally, the noise floor value is applied to the power spectrum. In edge impulse, to adapt to non-voice data, two additional steps are added. After computing the spectrogram, a triangular filter is applied on the Mel scale to extract frequency bands. The main idea is to extract more features in low and less in high frequencies. The final step is to perform local mean normalization of the signal, applying the noise floor value to the power spectrum.

Data Screening Algorithm
The edge nodes of LPAI can screen representative features autonomously. As mentioned above, the average vector d s i is pre-stored in the node as a baseline for the source domain data. Y. Kim et al. [43] mentioned that the pseudo-labels are more accurate for data closer to the source domain, while the opposite is true for data farther away from the source domain, as shown in Figure 6. The algorithm of Yang et al. and Y. Kim et al. [31,43] cannot be directly applied to LPWAN because of the limitations of embedded devices. Hence, a more ingenious implementation is designed.
In each round of inference, the edge node collects raw acoustic data through sensors, acquires f t j after feature extraction, and then infers the classification result by forward propagation. The node decides whether to upload the feature if the confidence level is higher than γ nodes . γ nodes can be interpreted as the probability that the node has γ nodes to trust its judgment. It should be set lower than γ server because the features that confused the nodes should be selected and retrained.
Real world data Street unlabeled airport target unlabeled street target Airport Figure 6. Taking binary classification as an example, the labeled source domain data can be classified by the class boundary, but in the target domain, the data closer to the class boundary has a higher probability of being correctly classified, while the data farther away from the class boundary is difficult to distinguish. Assigning pseudo-labels to all target data generates numerous errors.
Next, the distance d f j between f t j and each d s i is calculated. However, an edge node cannot record each f t j and sort them based on the distance as Y. Kim et al. did. Even storing the reduced-dimensional features is still impossible for low-power devices. Therefore, a more straightforward approach is taken to find the target features that confuse the nodes. It is to design a temporary variable d temp , which is the average of d f j . It represents the average distance of all the target domain data from the source domain. Whenever a edge node obtains a f t j , it caculates the distance d f j and update d temp . To ensure the correctness of the pseudo-labeling while electing features confused the nodes, the final features to be uploaded are those near d temp , which is to satisfy Equation (9). Because the number of newly data is as few as possible, η is as close to 0 as possible. The judgment process is shown in Algorithm 3. Its computational complexity is O(n), where n is the total number of data points in the target domain.
Algorithm 3 Data Screening Algorithm On the Node 1: Given d s i from the server, i = 1 . . . C, where C is the number of categories. Input: A new target data: 2: Obtain f t j after feature extraction. 3: Perform inference. 4: if Its confidence level is greater than γ nodes then 5: for each f t j do 5: Calculate the distance between each f t j and the d s i of the corresponding pseudolabel category.

5:
Update the average distance d temp once. 6: if Equation (9) is satisfied then 7: Output: upload f t j .

Evaluations and Results
In this section, we conduct extensive experiments to demonstrate the ability of LPAI to enable edge nodes to enhance their capabilities in an environment based on few-shot data. We validate it with three acoustic scene classification datasets using LoRa, a typical network in LPWAN. Experiments are based on a four-classification scenario. We have deployed LoRa nodes at a school intersection to monitor the operation of shuttles. Its hardware diagram is shown in Figure 7. The evaluation is divided into three parts: the first part evaluates the effectiveness of the cloud-based retraining phase. It is assumed that all target domain features have been uploaded, so the model is trained with the source domain data and a portion of the target domain data and also tested on the remaining target domain data. Moreover, the second part evaluates the effectiveness of Algorithm 3 for screening the dataset. The results of uploading all target data and the random screening method are used as benchmarks. Suppose the model is retrained after collecting 85 points of data to mimic the operation pattern of nodes in the environment. In the last section, we combine experimental and theoretical analysis to explore the impact on LoRa network performance.

Datasets
• TAU Urban Acoustic Scenes 2020 Mobile, development dataset [44]. It contains recordings from 12 European cities in 10 different acoustic scenes using 4 different devices. Because only four-classification questions are discussed, four scenarios commonly used in LPWAN are chosen: metro, metro station, street pedestrian and traffic. However, changes in the environment can lead to differences between domains, and the evaluation is divided into five domains according to the collection location, namely A1 (Barcelona, Helsinki), B1 (Lisbon, London), C1 (Milan, Lyon), D1 (Paris, Prague), E1 (Stock, Vienna). This dataset is divided into several source-domain and target-domain datasets according to different locations. The baseline system for the 2020 DCASE Challenge provides an accuracy of 54.1%. • Urban sound 8K [45]. The dataset contains 8732 labeled sound excerpts from 10 categories of urban sounds. Take four of them: marked as A2. Refer to paper [45] for baseline results. • Dataset AOB [46]. This dataset uses a convolutional neural network to collect and manually edit an audio dataset for urban sound event classification for a master's thesis. Take four of them: marked as A3.
After experiments, we discovered that an edge node could recognize a dataset up to 2 s in length in our scheme. Considering power consumption, recognition performance and other factors, the audio length was determined to be 1 s. Therefore, the above three datasets were downsampled to 16 kHz and cut into 1-second fragments before training. After feature extraction, the original dataset with a length of 1 s and a size of 64K bytes were compressed into feature data with 4K bytes. This means that the quantity of data transmitted over the network is compressed by a factor of 16.

Evaluating Cloud-Based Retraining Algorithms
In this section, we evaluate the effectiveness of the retraining phase. We first train an initial teacher model with the source domain data to evaluate its performance on the target domain, providing a baseline result. The structure of the teacher model is shown in Figure 8. We use 80% of the source domain as the training set and test the remaining 20%. The reason for designing this experiment is to demonstrate that the domain-shift problem exists and to provide a baseline for subsequent experiments to prove the validity of our approach. This structure comprises multiple 2D convolutional layers, pooling layers, and dropout layers. The convolutional layer represents the core part of recognition and is used to extract different features from the data. The pooling layer is used to reduce information redundancy and model computation, reduce the difficulty of network optimization, and prevent network overfitting.
Next, we evaluate the effectiveness of Algorithm 1. We assume that all unlabeled target domain data are uploaded to the cloud. Unlike the traditional approach, we use 80% of the unlabeled data as the training set and 20% as the test set. The parameters are set as follows, γ nodes is 0.5, and γ server is 0.7. Adam is employed as the optimizer. Its learning rate is 0.0001, β 1 is 0.9, β 2 is 0.99, and the iteration epoch is 20.
Subsequently, we evaluate the effectiveness of the cloud-based retraining phase by assuming all target domain feature data are uploaded to the cloud. The reason for this design is firstly to evaluate that our method can reduce the bias between different domains, and secondly to provide a control for subsequent evaluation of the data screening algorithm. Therefore, the inputs to this part of the algorithm are all labeled source domain features and all unlabeled target domain features. Unlike the traditional approach, we use 80% of the unlabeled data as the training set and 20% as the test set. The parameters are set as follows: γ nodes is 0.5, and γ server is 0.7. Adam is used as the optimizer with a learning rate of 0.0001, β 1 is 0.9, β 2 is 0.99, and the iteration period is 20. The output of the algorithm is a teacher model with optimal performance. Finally, the performance after compression was evaluated. In this part, the input of the student model is the same as in the previous step, i.e., the input of the teacher model, and after iterative training, an optimal student model is obtained. In addition, the student model is finally updated to the edge node. The structure of the student model is shown in Figure 9. The results of the first part of the experiment are presented in Table 2. All results are measured according to two criteria: accuracy and Kappa coefficient. Accuracy is the most common metric for classification problems and is used to measure the model's accuracy on the test set. The Kappa coefficient is commonly used for multi-classification issues and is employed to measure consistency. Consistency refers to whether the model predicts the same results as the actual classification results. It usually lies between 0 and 1, and a larger Kappa coefficient indicates greater consistency. The experiments were conducted on three data sets, and group A1-B1 means that A1 is the labeled source domain and B1 is the unlabeled target domain data. The first two columns record the model performance trained with the source domain data on the source test set and the target domain. The performance of the source model on different target domains is recorded to facilitate comparison with the results of subsequent experiments. The last three columns show the performance after our cloud-based retraining algorithm. The performance of the model in the source domain, the target domain, and the student model in the target domain after retraining is recorded, respectively. The last column records the performance of the model deployed on the edge nodes after compression. Taking group A1-B1 as an example, we can see from S (before DA) and T (before DA) that, as expected, the performance of the model trained on the source domain data degrades on the target domain data, which indicates that the domain-shift problem arises from changes in the environment. After retraining, the data of S (after DA) are generally better than S (before DA), which means that the performance of the model does not degrade, or indeed even slightly increases on the source domain data. Furthermore, the performance of the target domain data is also greatly improved. That is, our approach makes the recognition performance of edge nodes better. Finally, the recognition accuracy remains the same before and after compression from the last column. Most importantly, a model of this size can be easily deployed on embedded devices. Table 3 shows the time and space complexity. The time complexity of the model is measured by floating point operations (FLOPs), and the space complexity is measured by the number of model parameters. The compression ratio is approximately 0.4. The effectiveness of the algorithm for screening the datasets is then evaluated. To highlight that our algorithm can select data that are more useful for model updates, we designed a random screening method and Algorithm 3 for comparison, where n data were randomly screened on the target domain data. We screened three times and took the average value with the same amount of data to ensure fairness and randomness. The results are shown in Figure 10 for A1-D1 group of the TAU dataset. We also evaluated the effect of the data augmentation method. In Figure 10, the blue line represents the performance of our algorithm. By default, 85 pieces of data are filtered by adjusting η in Equation (9), and then the target domain is expanded to the corresponding n using the data multiplication method. In addition, to emphasize the ability of our algorithm to render the nodes more applicable to the environment within the constraints of LPWAN, three control groups were set up to evaluate the effectiveness of Algorithm 3. The first one is the accuracy of the source domain model in all target domain data before retraining, the second one is the accuracy of the new model in the only 85 pieces of uploaded target features after implementing Algorithm 3, and the third one is the accuracy after uploading all target domain features. Figures 10 and 11 show the performance of the data screening algorithm. In Figure 10, the accuracy steadily increases as n increases, implying that our algorithm performs better compared to random screening and can sort out higher quality data with the same amount of data. The effectiveness of the data augmentation method can still be seen from the figure, where the accuracy increases gradually as n increases after 85 s data are uploaded. In Figure 11, the effectiveness of our method can be clearly seen from the three control groups. On the one hand, uploading only 85 s of data collected from one node can significantly improve the recognition performance compared to not performing updates (before retraining in Figure 11). On the other hand, similar results are achieved in both evaluation criteria of accuracy and Kappa coefficient compared to uploading tens of thousands of data points. In particular, in some groups, such as A3-A2, the accuracy is improved by 5% and the Kappa coefficient by 0.17 compared to uploading all data. While there are still gaps with regard to uploading all the target features, these gaps are insignificant compared to the consumption of resources such as power. This is also in line with the usage scenario of low-power IoT. Retraining with few-shot data Figure 11. Data screening algorithm evaluation results. The bar chart indicates the accuracy before and after training, and the scatter plot indicates the Kappa before and after training. Each evaluation criterion is compared with three control groups, the first one is the accuracy of the model in the target domain without retraining, the second one is the result of uploading only 85 data points from the Algorithm 3, and the last one is the result of uploading all target domain features.

Evaluating Network Performance
Finally, we evaluate the network performance of the LPAI framework and its impact on the network in the standard LoRaWAN protocol through experiments and theoretical calculations, including the response time and the time required for model updates.
LPAI's network response time. Our nodes support two application modes of edge intelligence.
The first is the one-shot recognition mode. Namely, edge nodes perform inference and data screening only once. The process includes 1. Data acquisition: From collecting raw data to encoding using digital audio sensors. 2. Model inference: After extracting the raw data by MFE features, the classification result is then propagated by a neural network. 3. Data screening. The data screening algorithm determines whether these data need to be sent to the cloud. We use an Analog Discovery 2 oscilloscope to calculate the time required for each stage by flipping the I/O pin level, and the result is shown in Figure 12. The result is divided into three parts, where the first part is the process of obtaining raw data collected by the sensor. For continuous acquisition, the microcontroller acquires voice data in DMA dual mode, each buffer holds 0.25 s of audio data, and the double buffer pointers are exchanged each time one of the buffers fills up. The 0.25 s of data collected is appended to the MFE's operation window, and subsequent inference operations are performed when the window is filled with 1 s of data. The second part is to perform model inference after feature extraction from the original data. The third part judges whether to upload the compression feature through the data screening algorithm. The experimental results show that for one second of acoustic data, the total execution time is about 2 s. Δx1 = 206.9ms The second is the periodic recognition mode: the edge nodes are continuously recognized at a fixed sampling interval over a long period. That is, the edge nodes count events continually within a day. Take the scenario of identifying a vehicle at an intersection as an example. Each event is expressed as a 1-bit Boolean value. Assuming that in the most extreme case: uploading statistics for every minute within one day requires 1440 bits of data. Therefore, the total payload uploaded per day is about 180 bytes. Using class C of the LoRaWAN protocol for theoretical calculation, the time T tx required to upload 180 bytes in the CN470 frequency band is shown in Figure 13 and calculated by Equation (10) [47]. T tx is composed of two parts: the preamble and the payload, calculated as Equations (11) and (12), where SF denotes the spreading factor, BW is the bandwidth, N pre is the preamble length, N PHY is the payload length, CR denotes the coding rate which can take values from 1 to 4, and PL is the payload rate that indicates the physical payload length in bytes. It can be seen from the figure that as the spreading factor increases, the data rate decreases, and the transmission time of each data packet increases. When the spreading factor is 11, the response time is 7733 ms, which requires approximately 13 min with a 1% duty cycle.
T preamble = 2SF BW × (N pre + 4.25) (11) Time to back propagate compressed features and model updates. LPAI only needs to upload 85 compressed features to ensure 5% accuracy improvement, and each compressed feature occupies 4k bytes. Since LoRaWAN has a constraint on the payload length of a single packet, in the worst case, where the SF is 11, the maximum payload length is 223 bytes, so a compressed feature requires 18 packets to be sent out. According to Equation (10), the time to backhaul a single 4KB dataset is about 10 min at a maximum duty cycle of 1%. Furthermore, for the model update part, consider the worst case, which is 30k bytes at a time, where the time required at an SF of 7 is 83 min. In these calculations, we assumed continuous packet transmission. In a real-world scenario, LoRa nodes only upload a few pieces of data per day. Therefore, complete data are only sent down after weeks or even months, and that amount matches the LPWAN scenario.

Conclusions
In this paper, we propose LPAI, the first generic LPWAN-based AIoT framework for acoustic scene classification. We demonstrate experimentally in LoRa networks that the recognition accuracy of LoRa nodes can be improved by 5% by uploading only 85 compressed data points. In addition, we analytically prove the feasibility of LPAI, which bears important practical implications. LPAI advances LPWAN to a new frontier. In future research, we will further improve LPAI. On one hand, we will evaluate other standard networks for LPWAN, such as sigfox, with longer transmission distances and more stringent conditions. On the other hand, LPAI should also be applied to the imaging-based fields to promote LPAI as a more general AIoT system.