The proposed approach is intended to be used in modern cities as an effective mechanism to support the creation of more efficient wireless sensor networks, which is valuable when multiple concurrent WSN applications execute in urban scenarios. As it works on a complex scenario that exploits tweets for event detection, the validation of TwitterSensing is not straightforward. Nevertheless, we assessed the performance of this approach from different perspectives, as presented in the next subsections.
5.1. Event Detection and Classification
Initially, we wanted to verify the effectiveness of TwitterSensing concerning the detection and classification of events of interest. For that, a considerably large tweet database was used as input for a series of tests. After that, some reference optimizations were defined, demonstrating the practical usage of sensing prioritization.
The experiments were performed over a dataset obtained using the Twitter Realtime Filter. The obtained dataset consists of 1.74 million geo-tagged tweets sent from New York, one of the most active cities on Twitter, from 7 February 2017 to 19 May 2017. Each tweet contains the textual information (message), the location (latitude, longitude) from which the message was sent, the time stamp when the message was sent, and identifiers of the author of the tweet. Besides the spatial window, no other constraint was applied when collecting the data.
The Event Detector was executed over the entire dataset, obtaining 20,046 events distributed across the city of New York. To perform the event severity classification, we built two disjoint sets: the first four weeks of detected events compose the training partition, while the remaining 10 weeks, from Week 0 to Week 9 and starting on 6 March 2017, compose the evaluation partition considered in the studies of this section. While building our model, we first considered a baseline Multinomial Naive Bayes (NB) classifier with no optimizations or preprocessing steps over the input data and gradually expanded it by including preprocessing tasks to increase the model hit rate. By the end of this process, after including all previously mentioned NLP techniques, we achieved an average score of 71.02% correctly classified events, over the 64.69% of the raw NB performance. When compared to another baseline algorithm using Support Vector Machines, a common approach for text classification, the average baseline rate drops to 42.43%. In all mentioned steps, we employed a 10-fold cross-validation procedure. After including our best classifier in Algorithm 1, the priority of each detected event was calculated. The output of this process is discussed in this subsection.
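The Multinomial Naive Bayes baseline described above can be sketched in a few lines of plain Python. The toy corpus and labels below are hypothetical stand-ins for the real training partition, and the sketch omits the paper's full NLP preprocessing chain and the 10-fold cross-validation; it only illustrates the classification step itself.

```python
# Minimal pure-Python sketch of a Multinomial Naive Bayes classifier for
# event-severity classes. Toy data only; the real pipeline uses the tweet
# dataset, NLP preprocessing, and 10-fold cross-validation.
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (token_list, label). Returns class counts, word counts, vocab."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def classify_nb(tokens, class_docs, word_counts, vocab):
    total_docs = sum(class_docs.values())
    best, best_lp = None, float("-inf")
    for label, n_docs in class_docs.items():
        lp = math.log(n_docs / total_docs)  # class prior
        n_words = sum(word_counts[label].values())
        for tok in tokens:
            # Laplace smoothing so unseen words do not zero the probability
            lp += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train = [
    ("heavy traffic jam on the bridge".split(), "mobility"),
    ("subway delays and road closures".split(), "mobility"),
    ("snow storm and strong winds tonight".split(), "weather"),
    ("flood warning heavy rain expected".split(), "weather"),
    ("free concert in the park today".split(), "social"),
]
model = train_nb(train)
print(classify_nb("traffic jam near the subway".split(), *model))  # -> mobility
```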
Figure 4 shows the number of detected events per week on a logarithmic scale. Ranging from 4076 detected events on Week 0 to 436 on Week 9, this number is highly correlated with the amount and quality of the input data. Since the detection process is composed of a geographical and a semantic analysis, a lower number of detected events is related to the lack of events themselves, to a lack of spontaneous reports from users about currently happening events, or to highly divergent messages about the same event jeopardizing the semantic analysis. In our dataset, there is a notable drop in the rate of detected events starting in Week 5 and stabilizing in the following weeks. The inflection point, Week 5, corresponds to the last week of April, where a wide combination of factors (e.g., the ending of vacations) may play a role.
Besides the number of detected events, the percentage of tweets according to their classification was also assessed. Figure 5 shows the percentage, on a logarithmic scale, of events per week grouped by their severity. Mobility-related events are dominant over all weeks, ranging from 83.63% on Week 0 to 90.44% on Week 8. On the other hand, unbounded events are the least frequent, with some weeks having no occurrences of them (Week 9). Despite some similarities, each week has a distinct event signature characterized by notably small changes in the event severity distribution, which directly reflects the city itinerary during the respective period and is essential for performing WSN optimizations. For example, events such as “shows” and “protests” of Women’s History Month [76] triggered the high level of unbounded events on Week 0.
Figure 6 shows the average event severity distribution per weekday on a logarithmic scale, clarifying some general properties of Figure 5. Foremost, Mobility- and Weather-related events are the most frequent on any weekday. On the other hand, unbounded events are dominant over Social events on Wednesdays and Thursdays, indicating a clear concentration of impacting events during these workdays, which may strongly affect the economy, traffic, and the behavior of essential services, depending on the event location and scope.
The results presented in Figure 5 and Figure 6 depict an interesting behavior of modern cities (at least of New York), indicating the types of events that will happen over time. This kind of verification can provide static information to support the better design of sensor-based monitoring systems.
For the considered dataset, the average event priority per week was also computed by the Priority Computing Unit, as depicted in Figure 7, according to the proposed formulations for calculating a priority value for every detected event. In the presented results, the maximum average value corresponds to Week 8. Comparing with Figure 5 and Figure 6, we perceived that, despite the slight variation in the severity distribution, which results in a small standard deviation (1.44), the weekly priority has a clear central tendency due to the scope influence, which reinforces our initial hypothesis of a weekly city routine.
This relation between severity and scope is studied in Figure 8. While mobility events cover almost the entire scope spectrum, weather events are concentrated on very small scope values, with more than 90% of them having a scope value below 20. Social events are widely distributed, with 37.45% of them having scope values between 30 and 70. Unbounded events have a similar distribution, with 48.19% of them concentrated in the same scope interval.
These characteristics show the essential role of the scope: events that would otherwise be unfairly associated with very similar priorities, even with very different risks, are now weighted by this factor.
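The scope-weighting idea can be illustrated with a minimal sketch. The exact formulation of the Priority Computing Unit is defined earlier in the paper; the severity weights and the combination rule below are illustrative assumptions only, chosen so that priorities stay within the 0 to 100 range used in the tests.

```python
# Hedged sketch of a scope-weighted priority. The severity weights and the
# multiplicative combination are assumptions for illustration; they are NOT
# the paper's actual formulation.
SEVERITY_WEIGHT = {"mobility": 0.25, "weather": 0.50,
                   "social": 0.75, "unbounded": 1.00}  # assumed ordering

def priority(severity: str, scope: float) -> float:
    """Combine a severity weight with a 0-100 scope value into a 0-100 priority."""
    return SEVERITY_WEIGHT[severity] * scope

# Two weather events with the same severity but different areas of influence
# no longer share the same priority once the scope factor is applied.
print(priority("weather", 15))  # small-scope weather event -> 7.5
print(priority("weather", 80))  # rare large-scope weather event -> 40.0
```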
From a different perspective, the positions of the detected events were checked for the city of New York. Figure 9 and Figure 10 present heat maps of the detected events weighted by their respective priorities in different weeks, plotted using the Google Maps APIs [77]. Figure 9 depicts the events of Weeks 0 and 1, with a large number of unbounded events across Manhattan and a few scattered events in Brooklyn, showing that specific regions (e.g., the Battery Park neighborhood) may have more monitoring demand than momentarily quiet regions, according to the events happening near them. The city of New York has a well-known uneven population distribution, where Brooklyn and Manhattan are the most populated regions. It is interesting to notice that New York’s detected event distribution follows a similar pattern, with Manhattan and Brooklyn naturally generating more relevant events. In short, the Priority Computing Unit yields very interesting results for densely populated regions, being directly linked to the population’s activity on social media.
Overall, the tests showed two different uses of Twitter-based detection of events of interest. First, the detection and classification of events of interest is an effective mechanism for priority computing and assignment for sensors in WSN applications, as proposed by the TwitterSensing approach. Second, if large tweet datasets are considered, statistical information can be used to support the decision of where to deploy sensors. By doing so, regions with a historical occurrence of more relevant events may indicate that more sensors should be deployed there, or even that sensors with more resources (processing capability and energy supply) should be considered for deployment in that region.
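The second use, deployment planning, amounts to aggregating historical event priorities over a spatial grid so that cells with higher accumulated relevance can receive more (or better-provisioned) sensors. The events, cell size, and coordinates below are hypothetical.

```python
# Sketch of aggregating historical event priorities into a coarse grid to
# guide sensor deployment. All values are illustrative.
from collections import defaultdict

CELL_DEG = 0.01  # roughly 1 km grid cells near New York's latitude (approximation)

def grid_relevance(events):
    """events: [(lat, lon, priority)] -> {(cell_lat, cell_lon): accumulated priority}."""
    cells = defaultdict(float)
    for lat, lon, prio in events:
        key = (round(lat // CELL_DEG), round(lon // CELL_DEG))
        cells[key] += prio
    return cells

events = [
    (40.703, -74.016, 90.0),  # Battery Park area, high-priority event
    (40.704, -74.015, 60.0),  # same cell, another event
    (40.678, -73.944, 20.0),  # Brooklyn, lower-priority event
]
cells = grid_relevance(events)
busiest = max(cells, key=cells.get)
print(busiest, cells[busiest])  # the cell with the highest accumulated relevance
```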
After performing the tests, we could see that event detection exploiting geo-tagged tweets is feasible for Smart Cities, mainly when there are many Twitter users in the considered region. The detection and classification of events using the proposed approach may then be used to assign priorities to sensor nodes, opening new possibilities of optimizations for WSN applications.
Although the practical use of Twitter has been validated to provide information about events in a city, as well as their priority for sensing monitoring applications, the considered dataset was processed offline (batch processing). However, online processing of tweets can be easily performed, which is indeed required for quick and dynamic assignment of priorities to sensor nodes. As the performed tests were concerned with the practical detection and classification of events, large offline datasets were considered, but online processing of tweets is straightforward using the proposed Priority Computing Unit and the public Twitter API.
5.2. Exploiting Priority for Optimizations
In general, the events that were detected and classified by the Priority Computing Unit will be somehow exploited by one or more wireless sensor networks, since the expected outcome is the enhancement of the overall monitoring performance. For that, priority indexes will be associated with sensor nodes (by the Priority Assignment Unit) and broadcast in Priority Messages. After proper assignment, sensors may operate in different ways, depending on the application monitoring requirements, and thus how they will perform is out of the scope of this work. However, some priority-based optimizations may be defined as a reference, providing some hints about how sensors’ priorities may be exploited.
In general, global QoS parameters such as event-based prioritization may be exploited in many ways and during distinct stages of the network lifetime. Different results may be achieved depending on the design of the proposed optimization approach, and thus it is not easy to say that some particular optimization will always be better for the network performance. Nevertheless, for this particular verification, a reference approach was defined: SmartCitySensing. This optimization approach is a simple mechanism that exemplifies how sensing priorities may be exploited.
The SmartCitySensing approach was defined to allow sensors to adjust their sensing behavior in a Smart City scenario according to the assigned priority. As long as applications define a monitoring profile, the sensing behavior of each sensor node may be a function of the computed priority level. Table 3 presents the mapping for the SmartCitySensing approach, associating priorities with transmission patterns.
The definitions in Table 3 are just references, since any configuration may be defined. In fact, the idea is to provide higher monitoring quality for sensors with higher priority, since an increase in visual data quality is usually achieved when more information is transmitted (depending on the application monitoring requirements). Although the reduction in the transmission flow for less relevant sensor nodes may be obvious from the definitions in Table 3, we computed the amount of transmitted bytes for a single sensor node over time, considering the transmission of image packets only, as presented in Figure 11. It was assumed that every pixel is represented by 16 bits and that image packets were transmitted for 60 s.
As can be seen in Figure 11, higher priorities result in the transmission of images with higher resolution and higher frequency, which may also be thought of as an increase in visual monitoring quality for higher priority values. In this situation, optimization is achieved when sensors are differentiated according to the detected events, since the expected quality of visual information should be a function of the relevance and area of influence of the detected events of interest. Overall, transmission bandwidth usage is reduced and energy is saved, since it is not necessary to apply the same transmission pattern (high visual monitoring quality) to all sensors, which would be required for networks where all sensors have the same relevance.
Actually, many different event-based optimizations may be designed, and there are some examples in the literature [3]. Whatever the chosen optimization, the most critical issue is the proper computation and assignment of sensing priorities to source nodes, and this relevant part can be efficiently performed employing the proposed approach.
5.3. Employing TwitterSensing in a WSN
The previous subsection presented a simple but enlightening approach that demonstrates how sensing priorities can be exploited to optimize the network operation. Actually, as sensors will acquire different transmission behaviors due to the established priorities, the network resources are used in a more efficient way, optimizing the network as a whole. The conducted validation is only a first step in this direction, leading us to perform more complex tests to assess the effectiveness of TwitterSensing.
Considering the dataset retrieved from New York City, already processed for event detection and classification, the flow of the computed priority values was fed into a simulated wireless sensor network, further supporting the validation of the proposed approach. A subregion of Manhattan with an area of 6 km², where 1000 sensors were randomly deployed, was considered, as presented in Figure 12. For this simulation, the tool developed in [17] was adapted to process a continuous flow of priority indexes, used to establish the sensors’ priorities.
The simulated sensors are configured with a communication range of 100 m, and thus some sensors are offline after deployment, as can be seen in Figure 12. The communication paths are established based on this communication range and a simple graph-based shortest-path algorithm (lines in Figure 12), and all packets are transmitted toward the sink, which is located at the center of the considered region.
Based on the scenario presented in Figure 12, and considering the input data provided by the developed TwitterSensing approach, different results could be achieved. As an initial validation, the impact of detected events on the number of sensors that receive a new priority index was assessed, as presented in Figure 13. For this verification, 14 days starting on 7 February 2017 were considered, a subset of the processed New York tweet data from Section 5.1.
As can be seen in Figure 13, on average, more than 50 events were detected and classified every day, and many sensors were affected by those events, meaning that they received a new priority index from the proposed TwitterSensing approach. For the considered Twitter dataset, there is a daily occurrence of events that could be exploited to adjust the priority of the sensors.
Besides the number of events, the average computed priority for the same period was also assessed, as presented in Figure 14. The computed average priority is presented with its margin of error, allowing us to better notice how priorities were computed for the considered period. Actually, the average computed priorities for the two weeks were almost the same, but there are significant differences when considering the highest computed values (as on Day 6 in Figure 14). Obviously, the priorities of the computed events depend on the considered day and the configurations of the Priority Computing Unit, but as the priority range for the tests is set from 0 to 100, one can see the dynamic computation pattern for the considered city and period of time.
Finally, the performed verifications provided important insight into the applicability of the proposed approach in a real scenario, reinforcing the expected benefits of employing the TwitterSensing solution for Smart City optimizations.