Towards Recognising Individual Behaviours from Pervasive Mobile Datasets in Urban Spaces

Abstract: Mobile phone network data, routinely collected by its providers, possess very valuable encoded information about human behaviors. Intensive tourist activities in urban spaces bring smartness via mobile phone fingerprints into the understanding of an urban ecosystem. Due to the diverse processes that govern mobile communication, mining the geolocations of individuals seems to be non-trivial, tedious, and even irregular, which can lead to an incomplete trajectory. Enriching trajectories with infrastructural facilities is another challenge. We provide a unified approach, comprising both informal and formal elements, to obtain a common framework, which maps pervasive datasets into a collection of individual patterns in urban spaces, to obtain context-enhanced trajectory reconstructions. Through the algorithmization of the approach, we acquire a study that provides new insights on individual and anonymized tourist behaviors. In order to obtain individual behaviors, it is necessary to carry out an arduous extraction process. We propose a multi-agent system architecture and predefined message streams, which are transported on a message-broker platform. We also propose all of the basic algorithms that compose the prototype of the entire multi-agent system. All algorithms were formally analyzed with respect to termination and time complexity. System evaluation, together with a few basic experiments, was also carried out. The performance evaluation results confirm system feasibility, credibility, and vitality. Those factors prove its effectiveness and the possibility to build the target system, whilst supporting every urban ecosystem. The system would also strongly influence municipal services to understand urban context and operate more effectively in order to support tourist activities to become safer and more comfortable.


Related Works
There are many works that have considered behavior recognition in ubiquitous computing and their relevant subsets, which focus primarily on pervasive datasets stored in BTSs as a subject of these research interests. Mobile phone datasets have gained both in importance and popularity (see the review paper by Şahin and Zhen [1]), both for empirical and theoretical research purposes concerning medicine [2,3], engineering, education, and even ethnic segregation [4]. The CDR datasets are the subject of intensive pre-handling and cleaning to obtain big datasets for further studies that are concerned with behaviors and activities [5]. CDRs are intensively analyzed to calculate different factors, coefficients, correlation matrices, etc. [6]. CDRs can also be the subject of intensive processing "wishes". The article by Al Ridhawi et al. [27] also discussed cloud computing concerning a probabilistic learning technique in the cloud. The problem of optimizing the resource in cloud computing to manage the cloud provider's resources was discussed in the paper by Al Ayyoub et al. [28]. Although cloud computing is important, in our work, we have only algorithmized our problem; locating the computation in the cloud can be the subject of further research activities, so we will return to it in the future. The article by Baker et al. [29] is another work referring to cloud computing; however, it also proposed a workflow model for an autonomic service composition. This work can be used in further studies, since workflows play an important role in the system design, and obtaining their logical specifications (see [30]) enables formal analysis of behavioral models in a logical style.
The problem of agentification of the Internet of Things (IoT) was discussed in some articles; see for example the articles by Maamar et al. [31], and by Kwan et al. [32]. Our approach partially covers the proposed methodology, containing the definition of an ecosystem, agentification of things, as well as implementing a case study.
Summarizing these works, the dominant observation is that none of them focuses exclusively on individual behaviors within mobile phone datasets. However, the works influence this research by showing a challenging research direction, as well as considering some patterns of behaviors. The comprehensive survey by Blondel et al. [33] discussed mobile phone dataset analytics and patterns. It showed many important aspects of social networks, geographical partitioning, and urban planning. The availability of mobile phone datasets builds a potential that could benefit urban ecosystems. However, we are interested in collecting more individual behaviors of a selected group of people, namely visitors, and not only foreign ones. Their behavior patterns created during a city visit are specific and unique.
This work follows up on paper [34]. However, in the current work, the changes are rather far-reaching. Firstly, the multi-agent system architecture was fully modified: it is now much more realistic, it better reflects the specified aims of the system, and its tasks are well-divided and dispersed. Even greater changes are related to the proposed algorithms, which were fully re-designed. This resulted from the fact that in the previous version, the algorithms had many simplifications, and as a consequence, they could not be successfully realized. In the current version, the algorithms are much more refined and take into consideration the new architecture of the multi-agent system. The current work also differs from the previous one in that it includes experiments that exercise the proposed system.

Preliminaries
Systems for mobile communications (e.g., GSM or UMTS) are well established. There are many works that have introduced the world of data communication procedures; see [35].
The most obvious part of the mobile phone network is a base station. A Base Transceiver Station (BTS) is a piece of equipment that enables wireless communication between a user and a network. Currently, cities and regions are covered with a relatively dense network of BTSs; see for example Figure 1. However, outside the cities, networks are less dense. In each case, they gather and store important and interesting information about different types of user activities.
A Call Detail Record (CDR) contains data recorded and produced by telecommunication equipment. CDRs, as collections of information, have a special format [36]. A sample fragment of a CDR text, decoded from binary format, is given below. The first row must be a header row, which contains the field names:

    "Call Type","Call Cause","Customer Identifier","Telephone Num Dialled",
    "Call Date","Call Time","Duration","Bytes Transmitted","Bytes Received",
    "Descript","Chargecode","Time Band","Salesprice","Salesprice (pre-bundle)",
    "Extension","DDI","Grouping ID","Call Class","Carrier","Recording",
    "VAT","Country of Origin","Network","Tariff code","Remote Network",
    "APN","Diverted Number","Ring time","RecordID","Currency"

The meaning of these columns is not analyzed here, since they are intuitive, and a very detailed discussion exceeds the scope of this paper. Location information is extracted as part of the interaction data. These location observations, i.e., the moment of the phone's entry into the area of a station (log in) and the moment it leaves that area (log out), are of fundamental importance to the considerations given in the following sections of the paper.
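As an illustration of how such a decoded CDR text could be read, the following minimal Python sketch parses the header row quoted above with the standard csv module. The data row is hypothetical (only three fields are filled in with invented values), and the helper names parse_cdr and make_sample_cdr are assumptions introduced for this example; real provider formats may differ.

```python
import csv
import io

# Header row as quoted in the text above.
CDR_HEADER = (
    '"Call Type","Call Cause","Customer Identifier","Telephone Num Dialled",'
    '"Call Date","Call Time","Duration","Bytes Transmitted","Bytes Received",'
    '"Descript","Chargecode","Time Band","Salesprice","Salesprice (pre-bundle)",'
    '"Extension","DDI","Grouping ID","Call Class","Carrier","Recording",'
    '"VAT","Country of Origin","Network","Tariff code","Remote Network",'
    '"APN","Diverted Number","Ring time","RecordID","Currency"'
)

# Field names obtained by parsing the header as one CSV row.
FIELD_NAMES = next(csv.reader([CDR_HEADER]))


def parse_cdr(text):
    """Return one dict per CDR data row, keyed by the header field names."""
    return list(csv.DictReader(io.StringIO(text)))


def make_sample_cdr():
    """Build a hypothetical CDR text: the header plus one invented data row."""
    row = dict.fromkeys(FIELD_NAMES, "")
    row.update({"Call Date": "2019-05-01", "Call Time": "10:15:00", "Duration": "42"})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELD_NAMES, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


records = parse_cdr(make_sample_cdr())
print(len(FIELD_NAMES), records[0]["Call Date"], records[0]["Duration"])
```

Keying each record by field name rather than by column position keeps the sketch robust to the provider-specific extra fields discussed next.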
CDRs serve a variety of functions. Mobile phone companies can also shape the form of records, for example introducing new fields, if necessary, to establish the whereabouts of an individual during their stay within the range of a station. Broadly speaking, the format of the CDR varies among providers; some programs also allow CDRs to be configured by the user. For the purpose of this work, the existence, or introduction, of certain fields allowing the identification of people as visitors (see Line 3 and the following in Algorithm 2 and see Line 17 and the following in Algorithm 1), as well as direct information about logging in or logging out of the BTS are assumed.
Information about the phone logging in and logging out, to and from a particular BTS, is extremely important for many reasons. Firstly, we know where the phone is located, since it can only be logged in to one station within the monitored area at a particular moment. Secondly, this base station can initiate certain actions, with the support of neighboring BTSs, which enable the geolocation of the mobile phone (see Line 34 in Algorithm 2), according to the methods described in [11]. It is commonly known that it gives a measurement precision level of around 150 m. The measurements are periodical and enable the building of a phone track, point by point, through the entire period of a phone's presence within the monitored area.
The ecosystem is a distributed, self-organized, and open system, which gathers knowledge about (selected aspects of) a smart city environment. It constitutes a community of digital devices and their environmental functioning, as a whole (hardware, software, services). This system might be extended to consider other aspects of a smart city, for example urban pollution, fire and emergency systems, water and sanitation, energy, etc., since tourist movement may strongly affect these aspects of city life. Thus, the diagram shown below could be much more elaborate, showing more details, taking many other aspects into account. However, it exceeds the main goals of this work and may be subject to other future research.
Thus, we will also show and build our system in a certain context created by an urban ecosystem; see Figure 2.
Some users (actors) of a smart, context-aware ecosystem are identified: BTSs, other supporting services, emergency services, and public transportation management. The BTSs make up the mobile telephony infrastructure, where CDR data are gathered and where the algorithms of our system are partially executed. Other supporting services consist of other external services that support our system; see Line 3 in Algorithm 5. An emergency service is an organization that ensures public safety and deals with emergencies if they occur (ambulance service, police, fire brigade, and others). Public transportation management consists of systematic processes, which collect and analyze information about conditions. These are required as inputs for the urban planning processes, in order to support decision-makers in choosing the appropriate strategies. The above actors operate in the context-aware urban system, which consists of the following sample use cases: visitor monitoring (UC1), manage transport (UC2), and manage crisis (UC3). Brief descriptions of the use case features are provided instead of a formal scenario.

Visitor monitoring (UC1) is our system, the prototype of which is presented in this work. The main objective of this system is to understand the behavior of a large group of objects, namely visitors and tourists, staying within the city area, whose behaviors can influence the entire city.
Manage transport (UC2) means supply chain management for transportation operations within the public area. When tourist activities in selected areas increase, the responses might comprise: increasing the frequency of buses/trams, introducing shuttle services, if necessary, activating additional bicycle rental systems, etc.
Manage crisis (UC3) means a process dealing with events that threaten the general public. When tourist activities in selected areas increase, responses might be comprised of: launching/establishing a special emergency call number, increasing the number of open/active/overnight pharmacies in selected areas, increasing the number of hospital emergency rooms, improving security and enforcement of regulations, etc.

Tourist Destination Questionnaires
A questionnaire is a form that contains a set of questions usually directed at statistically-important tourist activities. A tourist questionnaire is a typical way to gather information, which can be used for managing a context-aware urban ecosystem. A questionnaire for tourist movement in destinations is discussed now, to clarify how smart systems based on recognizing tourist activities work. In the work of Sirakaya et al. [37], not only current research trends for leisure, recreation, and tourism were surveyed, but there were also numerous questionnaires included.
Lisbon, the capital city of Portugal, as well as its surroundings, are considered and used as an example. Tourists/visitors stay in Lisbon and, probably day by day, visit its monuments and various tourist attractions. However, some tourists during their stay in the city may wish to visit its surroundings, e.g., Fátima (religious reasons) or Cascais (recreational reasons), as well as Sintra, which is known for historical and architectural monuments and is classified as a UNESCO World Heritage Site; see Figure 3. All of these places/sub-destinations, except Fátima, are located in the Grande Lisboa subregion (see http://en.wikipedia.org/wiki/Grande_Lisboa).

Figure 3. See also [34]. (The base map is from Google Maps.)

Sample common questions for visitors are shown in Table 1. There are also many other tourist questionnaires available, for example [38,39]. These questionnaires are distributed to visitors during their stay at a destination. They refer to many details of the visitors' trip and stay. Forms are usually designed by tourism organizations for people who are going to spend at least one night at the destination. Questionnaires are conducted anonymously.
One of the main objectives of the questionnaires is to know more about visitor characteristics for marketing purposes, as well as to identify the size of the tourism activity. Other characteristics cover types of visitors (foreign or home, business or leisure, overnight or day trip). They also allow us to identify where visitors go, if at all, outside the examined basic destination, and what the scale of sub-destination visits is.
The purpose of this paper is also to provide methods of gathering information about tourist movements automatically, that is, to replace manual surveys with a fully-automated process, and then use this information for a smart urban ecosystem. It should be noted that the typical granularity for a BTS is about 500 m in a city (urban areas) and about 1000 m outside a city. On the other hand, there are some advanced algorithms and models [11] that enable the estimation of a phone position between stations with an accuracy of about 150 m in urban areas. Let us also note that the Home Location Register (HLR) is maintained in mobile networks in order to provide information about subscribers who are registered in a core/local network. The Visitor Location Register (VLR) is the opposite, in that it provides information about network visitors (outside/country or foreign). These two records are important for the approach, since they allow us to find out who is a visitor and who is not. Although there are some exceptions, the probability of correct verification based on VLRs/HLRs is very high. We will use these methods, and will refer to them, in our system; see Line 17 in Algorithm 1. In case of any difficulties or doubts, the billing databases of mobile providers might additionally be examined.

Table 1. Sketch of a sample tourist questionnaire; see also [34].

The analysis of points/questions in Table 1 leads to the following taxonomy based on the information expected to be obtained from the BTS datasets, which constitutes an informally-expressed algorithm:

1. some answers are obviously easy to obtain, e.g., Point 1 or 3;
2. some answers are available through digging deeper, but direct analysis of the BTS data is still required, e.g., Point 2 and the VLR/HLR records;
3. a certain number of answers need a pattern analysis for individuals, e.g., the comparison of the locations during the day and night for Point 4, or less/limited mobility (business) and greater mobility (an active city exploration typical for tourists) for Point 6;
4. some answers require a pattern analysis for a group, if any, of visitors; in other words, a group of objects is examined to see if they are moving together, e.g., the city exploration with a group of mobile phones/visitors for Point 8, or with a local (cf. VLR/HLR records) mobile phone of a local guide for Point 9;
5. some points need additional (open) technologies to answer questions, e.g., OpenStreetMap (OSM) (see: http://en.wikipedia.org/wiki/OpenStreetMap) to locate/identify selected objects like airports or railway stations for Point 7, hotels/hostels for Point 5, museums/churches for Point 6, or suburban areas (close or distant) for Point 12;
6. some answers require historical data analysis, e.g., a previous presence in a destination for Point 10;
7. some answers require access to commercial/bank data, e.g., credit/debit cards used in the destination for Point 11;
8. several answers could be obtained while analyzing, for example, social networks, reservation systems, or web vendors, e.g., sources of information for Point 16;
9. some answers could be obtained when web forms are sent directly to mobile phones, after the visit in the destination is over, e.g., sources of information for Points 13-15, 20;
10. for some points, obtaining answers based on BTS datasets is impossible or problematic, e.g., Points 17-19;
11. last but not least, there is some information that could be extracted from the BTS data and that is not usually the subject of any questionnaire (thus, no points in Table 1 are indicated here), but it could be used to analyze other parameters of tourist activities, e.g., intensity of call/sms/mms/web transmissions during the entire visit, or in particular places, from which numerous valuable conclusions follow.
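The taxonomy can be captured as a simple lookup structure. The following Python sketch is only an illustration: the category labels are shortened paraphrases, the point numbers are exactly those named as examples in items 1-10 above, and the names TAXONOMY and categories_for are assumptions introduced here. Item 11 is omitted because it refers to no questionnaire point.

```python
# Categories 1-10 of the taxonomy, each paired with the Table 1 points that the
# text names as examples. Labels are paraphrased for brevity.
TAXONOMY = {
    1: ("directly available from BTS data", {1, 3}),
    2: ("deeper BTS analysis with VLR/HLR records", {2}),
    3: ("pattern analysis for an individual", {4, 6}),
    4: ("pattern analysis for a group of visitors", {8, 9}),
    5: ("additional open technologies such as OSM", {5, 6, 7, 12}),
    6: ("historical data analysis", {10}),
    7: ("commercial/bank data", {11}),
    8: ("social networks, reservation systems, web vendors", {16}),
    9: ("web forms sent to phones after the visit", {13, 14, 15, 20}),
    10: ("impossible or problematic from BTS data", {17, 18, 19}),
}


def categories_for(point):
    """Return the taxonomy categories whose examples mention the given point."""
    return [cat for cat, (_, points) in TAXONOMY.items() if point in points]


print(categories_for(6))   # Point 6 is named under two categories
print(categories_for(17))
```

A point may fall into more than one category (Point 6 needs both an individual mobility pattern and facility identification), which is why the lookup returns a list.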
The above classification is crucial and gives an idea of the foundations for the solutions and methods proposed in the paper, that is, how information gathered in CDRs is used as a basis for the pro-active decisions of an urban ecosystem. In other words, the above classification constitutes a basis for methods of building knowledge about tourist activities; see Line 8 in Algorithm 7. However, this would be a topic for a separate research project. The purpose of this article is to provide a vision of the prototype system that collects data, which will then be subjected to such in-depth analysis.

Multi-Agent System
A multi-agent system and its architecture are proposed in this section. The system is used to solve the problem of surveying the tourist movement in a destination, in the way described in the previous section.
The following taxonomy of agents is proposed:
• A - Angel-the-guard agent: an agent created for a new phone that appears within the entire monitored area, but only when the object is classified as a visitor. From that moment, the agent exists in the system until the object leaves the area. It stores the entire trace that refers to the particular visitor. From a data flow point of view, these data are collected by agent B and go to agent A through agent X: agent B sends out information about different visitors, and agent X redirects these data to the proper agent A. When the phone leaves the monitored destination, all information gathered is passed to agent Q, through agent P, and agent A is removed from the system;
• B - BTS agent: this agent is present in every BTS and gathers data related to all visitors monitored by the system (not all pieces of data gathered in a BTS belong to visitors). Data are sent to the particular agent A, through agent X. After a successful sending process, the data are deleted from the agent B repository;
• X - eXchange agent: this agent is only responsible for redirecting and further transferring the messages received from different B agents to the proper A agents. Redirecting is performed after partly decoding the message, including information about the phone number;
• F - Facility agent: an agent that exists in the system permanently, the purpose of which is to identify and process newly-detected tourist facilities. This kind of event means the geolocation of an infrastructural object within the monitored area. The basic facilities result from the needs of a particular questionnaire, and among them may be: airport, train station, hotel, office building, restaurant, cinema, theater, religious building, graveyard, etc. It is assumed that different services are used for identification purposes, for example OpenStreetMap, which enables the identification of particular facilities (if there is such a need, tourist facilities can even be manually edited by an administrator for agent F, which may force introducing/deleting particular tourist facilities). Summing up, agent F keeps a list of all analyzed facilities within a monitored area;
• P - event Processing agent: the agent analyzing all particular routes collected by the A agents and, on the basis of the tourist facilities identified by agent F, applying them to the routes gathered by particular A agents. In such a way, the obtained routes are enriched with information concerning infrastructural objects, if such objects exist, along the route of a particular visitor. It is carried out by comparing the geolocation of an infrastructural object and the visitor's route;
• Q - Questionnaire agent: an agent existing in the system, the purpose of which is to update a questionnaire, or questionnaires, built in this destination. The questionnaire is updated when an object leaves the entire destination and its agent A is to be removed;
• M - Managing agent: an agent existing in the system permanently, the purpose of which is to initiate global system variables, as well as to manage other agents.

The number of A agents in a system is equal to the number of visitors in a given destination. The number of B agents is equal to the number of BTSs within a monitored area. There is only one F agent in a destination, as well as only one P agent. Although only one X agent was presented, it is always possible to increase their number to improve the capacity of the system. A similar remark applies to agent P as well.
The more important variables in the system include:
• BTSList: list of all BTSs that cover the monitored area and enable the gathering of all data necessary to study the behaviors of visitors;
• facilityList: list of infrastructural objects within a monitored area. Those objects can be identified by different types of services, and the presence of phones in their surrounding area is registered as a meaningful event. After completion of all data, namely the whole route of visitors with facilities, an analysis of traces of tourists' presence, in terms of questionnaire questions, is performed;
• visitors: list of mobile phones within a monitored area that were selected for tracing, as they are identified as unknown, i.e., coming from, and registered in, another (far) area. It is worth mentioning that every phone from visitors has its agent A;
• BTSPhoneList: a variable defined locally in each BTS, which includes a list of phones classified to be monitored as visitors and remaining within range of a certain station. It is important to mention that the union of the BTSPhoneList variables from all BTSs is equal to variable visitors.
Let us pay attention to the sets visitors and BTSPhoneList, the elements of which are time-attributed, that is, each carries a time-assigned value. In practice, this means that for each phone in these sets, we can get the (last) value of the time at which the phone was observed. Thus, we have to revisit the well-known set operations of union ∪ and difference \. We redefine union ∪ in such a way that if there are several copies of the same element with different time attributes, the element with the largest time attribute is selected for the final set. Difference \ remains unchanged, that is, the indicated elements of the set are deleted regardless of their time attributes. We introduce another difference operation \ₜ, which works just like the classic one, except for situations when an element to be removed has a higher time attribute than the one indicated for deletion; in this case, the element with the higher time attribute is not removed. Let us consider some examples to clarify these informal definitions. Let t1, t2, t3, t4, t5, ... denote sequential time instances that are used to attribute elements of a set. Then, for instance, writing a(t) for element a with time attribute t, the definitions give: {a(t1), b(t2)} ∪ {a(t3)} = {a(t3), b(t2)}; {a(t3), b(t2)} \ {a(t1)} = {b(t2)}; {a(t3), b(t2)} \ₜ {a(t1)} = {a(t3), b(t2)}; and {a(t1), b(t2)} \ₜ {a(t3)} = {b(t2)}.

The structure of the most important messages in the system is shown in Table 2. (Symbols for the table: "≡" is defined as; "=" is equal to; "+" conjunction; "( )" is optional; "[ ]" the choice of one possibility; "|" separator of disjoint choice related to [ ]; "{ }" iterator of possibilities with a given minimal number of occurrences. The meaning of particular elements is intuitive, whereby "timestamp" means a time stamp, a certain moment in time, when measurements were performed, "geolocation" is the latitude and longitude of the object, "facility" is an observed object of infrastructure.)
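The three operations on time-attributed sets can be sketched in Python as follows. This is a minimal illustration under one assumption not stated in the text: a time-attributed set is represented as a dictionary mapping each phone to its time attribute. The function names union, difference, and difference_t are introduced here for the example only.

```python
def union(a, b):
    """Redefined union: for a duplicated element keep the largest time attribute."""
    result = dict(a)
    for phone, time in b.items():
        if phone not in result or time > result[phone]:
            result[phone] = time
    return result


def difference(a, b):
    """Classic difference: remove every element named in b, whatever its time."""
    return {phone: time for phone, time in a.items() if phone not in b}


def difference_t(a, b):
    """Time-aware difference: an element of a survives if its time attribute is
    higher than the one indicated for deletion in b."""
    return {phone: time for phone, time in a.items()
            if phone not in b or time > b[phone]}


s = {"a": 3, "b": 2}
print(union({"a": 1, "b": 2}, {"a": 3}))
print(difference(s, {"a": 1}))
print(difference_t(s, {"a": 1}))
print(difference_t({"a": 1, "b": 2}, {"a": 3}))
```

The time-aware difference protects newer observations: a stale request to delete a phone does not discard information that arrived later.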

Methods and Algorithms
Several algorithms for handling the entire system are proposed in this section. They refer to the classification of agents defined in the previous section. (Incidentally, we refrain from defining the agents' ports to which messages are sent, so as not to over-formalize. This does not result in ambiguity. We also use two well-known and equivalent notations for the substitution operation, that is v := v * z and v := * z, for two arguments and a sample operator *.)

Algorithm 1 (fragment)

        procedure LOGINOUT                                   ▷ inner and local procedure
     3:     if m.logType = login then                        ▷ in a BTS
     4:         critical visitors2 := \{m.phone} end;
     5:         visitors := ∪{m.phone}
     6:     else                                             ▷ not in a BTS
     7:         visitors := \{m.phone};
        ...
            if m = nil then delay(d1); continue              ▷ d1 established by admin
    14:     end if                                           ▷ and continue the loop from the beginning
    15:     if timeGtEq(m.phone, visitors ∪ visitors2) then  ▷ newer info
    16:         if m.visType = unknown then
    17:             m ...
        ...
            ... all agents B
    52: start process checking1, checking2;                  ▷ inner processes

The entire system is initiated by agent M, whose operations are illustrated as Algorithm 1. The agent starts and initiates important system variables; see Line 48 and the following lines. There is a pre-defined set of BTSs, BTSList, that covers the monitored destination. Variable visitors stores all phones that are located within a monitored area and are recognized as tourists, or more generally, visitors. Variable visitors2 consists of phones suspected of possibly leaving the destination, since it was observed that they logged out from a BTS. However, this logging out can be momentary, since a tourist will be picked up, immediately or after some time, by another BTS, and in such cases, the phone will be transferred back to variable visitors. In the following lines of the algorithm, all agents are initiated. In the last line, two concurrent internal processes are called. We have only one critical section in the algorithm since only one variable is protected, and we do not need to differentiate sections; otherwise, we should use the following sample syntax: critical <sec_name>: <instructions> end. Thus, "<sec_name>:" is omitted here.
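The interplay of visitors and visitors2 can be illustrated with a small Python sketch. It is not the authors' Algorithm 1: the lock guards visitors2 because the surviving fragment wraps the visitors2 update in the critical section, the handling of visitors2 in the logout branch is an assumption derived from the description above (that part of the listing is not available), and the names log_in_out and visitors2_lock are hypothetical. Sets are represented as dictionaries from phone to time attribute.

```python
import threading

visitors = {}    # phone -> last observed time (phones recognized as visitors)
visitors2 = {}   # phone -> last observed time (phones suspected of leaving)
visitors2_lock = threading.Lock()  # the single critical section guards visitors2


def log_in_out(phone, time, log_type):
    """Handle one login/logout message for a phone already known as a visitor."""
    if log_type == "login":
        with visitors2_lock:
            visitors2.pop(phone, None)            # visitors2 := \{m.phone}
        # visitors := ∪{m.phone}, keeping the larger time attribute
        visitors[phone] = max(time, visitors.get(phone, time))
    else:
        visitors.pop(phone, None)                 # visitors := \{m.phone}
        # Assumption: the phone is then recorded as a suspect in visitors2.
        with visitors2_lock:
            visitors2[phone] = max(time, visitors2.get(phone, time))


log_in_out("phone-1", 1, "login")
log_in_out("phone-1", 2, "logout")   # momentary logout: moved to visitors2
log_in_out("phone-1", 3, "login")    # picked up by another BTS: moved back
print(visitors, visitors2)
```

A single lock suffices here for the same reason the text gives for a single unnamed critical section: only one shared variable needs protection.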
Process checking1 is responsible for controlling the log in and log out operations, to and from particular BTSs, of all monitored phones, that is, objects recognized as visitors. Function timeGtEq in Line 15 checks if the phone (first argument) has a higher or equal time attribute than the corresponding element in the set (second argument). If the phone does not belong to the set, then timeGtEq returns true. In this way, we ensure that the newer information about a phone is processed when several pieces of information are received at about the same time. Procedure refreshTime in Line 25 updates the value of the time attribute if the phone belongs to the set and has a lower attribute there (we could also write the equivalent code: if (m.phone ∈ visitors) then visitors := ∪{m.phone} end if). The instruction in Line 17 is important because the called procedure determines, using VLR/HLR services, whether an object is registered outside the monitored area. If so, it will be recognized as a visitor. In procedure verifyVisitor, other actions can also be taken to enable object classification, for example establishing whether the object comes from, informally speaking, a short or a long distance from the monitored area, and thus whether it is only an apparent visitor (for instance, a close neighbor commuting to work) or a real visitor. Although it is an important issue, it will be omitted now, since it is of a rather technical nature and can be analyzed thoroughly in the future.
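The two helper routines can be sketched in Python as below. This is a hedged illustration, not the original listing: the set is represented as a dictionary from phone to time attribute, the phone's own time attribute is passed as an explicit argument (in the pseudocode it travels with m.phone), and the snake_case names are assumptions made for the example.

```python
def time_gt_eq(phone, time, phone_set):
    """Return True if the phone is absent from phone_set, or if its time
    attribute is higher than or equal to the one stored in phone_set."""
    return phone not in phone_set or time >= phone_set[phone]


def refresh_time(phone, time, phone_set):
    """Raise the stored time attribute, but only if the phone already belongs
    to phone_set with a lower attribute; absent phones are not added."""
    if phone in phone_set and phone_set[phone] < time:
        phone_set[phone] = time


known = {"phone-1": 5}
print(time_gt_eq("phone-1", 4, known))   # stale message about a known phone
print(time_gt_eq("phone-2", 0, known))   # unknown phone
refresh_time("phone-1", 7, known)
print(known)
```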
The work of agent B, located in each BTS, is presented as Algorithm 2. The algorithm starts from Line 39. Then, three other processes are initiated. Process BTSlogin is responsible for controlling the log in and log out operations of objects. In Line 3, an important procedure is called. It verifies newly-appearing CDR records, which have the values visitor or unknown in the field visType, or the values login or logout in the field logType. Later on, m1 is sent to agent M in order to verify and compare it with variable visitors. Process BTSvisiting records that a phone was recognized as a visitor. This information is then introduced into the CDRs; see Line 23. This follows from the requirement not to re-examine records in the BTS database that were already identified as belonging to a visitor. In other words, once identified, they do not change their status. Process phoneLocation determines the current geolocation of every tourist's phone that is within range of a BTS; see Line 34. The process of geolocation, where neighboring BTSs are included, is a separate procedure, which works according to the well-known rules [11].
The task of agent X (see Algorithm 3) is to redirect all messages from the B agents and then send them to the proper A agents.
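The redirection performed by agent X can be illustrated with the following in-memory Python sketch. It is an assumption-laden simplification: the actual system transports messages on a message-broker platform, whereas here each agent A is stood in for by a plain list acting as its mailbox, and the class ExchangeAgent, its methods, and the message keys are hypothetical names. Dropping a message for an unregistered phone is also an assumption, since the text does not specify that case.

```python
class ExchangeAgent:
    """Redirects messages from B agents to the A agent of the phone concerned."""

    def __init__(self):
        self.inboxes = {}  # phone -> list standing in for an agent A mailbox

    def register(self, phone):
        """Create (or return) the mailbox of the agent A guarding this phone."""
        return self.inboxes.setdefault(phone, [])

    def unregister(self, phone):
        """Forget the mailbox once the corresponding agent A is removed."""
        self.inboxes.pop(phone, None)

    def route(self, message):
        """Partly decode the message (read the phone number) and forward it.
        Returns False when no agent A exists for that phone."""
        inbox = self.inboxes.get(message["phone"])
        if inbox is None:
            return False
        inbox.append(message)
        return True


x = ExchangeAgent()
mailbox = x.register("phone-1")
x.route({"phone": "phone-1", "timestamp": 10, "geolocation": (38.7139, -9.1394)})
print(len(mailbox))
```

Because routing depends only on the phone number, several X agents could share the load by partitioning phone numbers among themselves, in line with the earlier remark on increasing their number.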

The process of agent A is visualized as Algorithm 4.

The first instruction is located in Line 4. The main task of the agent is to collect the registered geolocations, which will constitute the entire route of a particular phone. Data arrive from different B agents, via agent X. It is necessary to point out that agent A has its own destructor, whose task is to send all gathered data about the route to agent P, where the data are subjected to further processing. After the destructor is executed, the agent itself turns into a singular state, and all resources allocated to it are finally removed by the calling process, in this case agent M; see Line 42 in Algorithm 1.
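The life cycle of agent A can be sketched in Python as follows. This is only an illustration under stated assumptions: the destructor is modeled as an explicit destroy method rather than a language-level finalizer, the callback send_to_p stands in for the message sent to agent P, and sorting the route by timestamp is an assumption motivated by the fact that data arrive from different B agents and may be out of order. All names are hypothetical.

```python
class AngelAgent:
    """Collects the geolocations of one visitor's phone into a route."""

    def __init__(self, phone, send_to_p):
        self.phone = phone
        self.route = []              # list of (timestamp, latitude, longitude)
        self._send_to_p = send_to_p  # stand-in for the message sent to agent P

    def on_geolocation(self, timestamp, latitude, longitude):
        """Store one registered geolocation received via agent X."""
        self.route.append((timestamp, latitude, longitude))

    def destroy(self):
        """Destructor step: hand the whole route over to agent P, then release it."""
        self.route.sort()  # assumption: order points by timestamp before sending
        self._send_to_p(self.phone, list(self.route))
        self.route.clear()


received = []
agent = AngelAgent("phone-1", lambda phone, route: received.append((phone, route)))
agent.on_geolocation(20, 38.7149, -9.1394)
agent.on_geolocation(10, 38.7139, -9.1394)
agent.destroy()
print(received)
```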
Agent F (see Algorithm 5) is used to build a list of infrastructural objects that exist within the monitored area.

Algorithm 5 (fragment)

        ......                                   ▷ other services in the same way
     6:     if required facilityList then        ▷ if a requirement from agent P
     7:         send(P, facilityList)            ▷ to agent P
     8:     end if
     9: end loop

Those objects will later be taken into consideration when building questionnaires. The following two basic cases will be analyzed: a phone is located nearby such an object or this object was recorded directly in the tourist's route. As previously mentioned, the algorithm, that is agent F, is dealing with only one issue, which is updating and keeping the list of objects within the monitored area valid. It is true that infrastructural objects do not change very often; however, it seems that it is beneficial to have a separate and specialized agent responsible for this type of task.
Enrichment of the tourist's route with infrastructural objects located along the phone trail is performed by agent P; see Algorithm 6.

Algorithm 6
The P agent operations (P-operations).

In Line 5, the list of infrastructural objects is downloaded. This list is built and updated by agent F. Since some changes in the facility list may have been made after the previous inquiry, it is necessary to download it once again. In Line 6, there is a direct verification of the route and enrichment of its description with infrastructural objects located along the route or nearby. This problem itself is a separate topic, since it is a procedure that does not follow general rules. For instance, there is the question of how long a visitor has to stay nearby for his/her presence to be recorded, or, equivalently, for accidental situations to be omitted. However, this topic exceeds the main aim of this work.
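One possible way to compare the geolocation of facilities with a route is a distance threshold, sketched below in Python. This is not the procedure of Algorithm 6, whose details are left open in the text: the haversine great-circle distance, the 150 m default radius (chosen here only because it matches the geolocation accuracy quoted earlier), the coordinates, and the names haversine_m and enrich_route are all assumptions. The question of minimal dwell time is deliberately ignored.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def enrich_route(route, facilities, radius_m=150.0):
    """Attach to every route point the names of facilities within radius_m.

    route: list of (timestamp, latitude, longitude)
    facilities: list of (name, latitude, longitude)
    """
    enriched = []
    for timestamp, lat, lon in route:
        nearby = [name for name, flat, flon in facilities
                  if haversine_m(lat, lon, flat, flon) <= radius_m]
        enriched.append((timestamp, lat, lon, nearby))
    return enriched


# Hypothetical coordinates: one facility about 111 m away, one about 1.1 km away.
route = [(1, 38.7139, -9.1394)]
facilities = [("hotel", 38.7149, -9.1394), ("museum", 38.7239, -9.1394)]
print(enrich_route(route, facilities))
```

A production version would likely also require a minimal number of consecutive nearby points before recording a visit, which is the dwell-time question raised above.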
Agent Q, after gathering all necessary data about the tourist's route, updates all questionnaires on the basis of the tourist's data; see Algorithm 7.

Algorithm 7. The Q agent operations (Q-operations).
...
end if
11: end loop
It is assumed that, generally, many different questionnaires can be prepared; however, at least one is necessary. Line 8 shows the updating process for a questionnaire. This type of updating can be the topic of a separate project and research study, where numerous detailed questions connected with the procedure of filling in the questionnaire have to be answered. Some of those questions can be simple, but others are much more complex; see Section 4.
As a summary, see Figure 5, which shows the basic data and message flows in the system, that is, in all algorithms presented above. Let us note that the entire system is initiated with the following basic set of agents: M, Bs, X, F, P, and Q. When no visitor is recognized, no agent A is created.
The agents presented in this section, as well as their algorithms, deal with all important aspects of the subject. When presenting the agents, some attention was paid to a few minor problems; to avoid going too deep into unnecessary details, their solution is left for the future. Some other minor problems were deliberately omitted in order not to overcomplicate the content. An example is the situation in which an object classified as a visitor stays so long within the monitored area that it becomes, in fact, a resident. Such a situation may occur for different reasons, and it can be handled by introducing a maximal time limit that a person can stay within the monitored area.
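The maximal time limit mentioned above could be checked as in the following minimal sketch. The 14-day threshold and the names `MAX_VISITOR_STAY` and `classify` are assumptions made for illustration only; the work does not fix a concrete value.

```python
from datetime import datetime, timedelta

# Assumed threshold; the source does not specify a concrete limit.
MAX_VISITOR_STAY = timedelta(days=14)


def classify(first_seen, now, max_stay=MAX_VISITOR_STAY):
    """Return 'visitor' while the stay is within the limit, else 'resident'."""
    return "visitor" if now - first_seen <= max_stay else "resident"
```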

Evaluation and Experiments
We provide both theoretically and practically oriented considerations to characterize the proposed approach as a whole.

Evaluation
Determining time complexity, that is, the total time an algorithm requires to run, is important for any algorithm. The statements below establish termination and time complexity for each algorithm in turn.
Algorithm 1 always terminates. Proof: The algorithm does not contain recursions. All instructions are precisely defined. The infinite loop symbolizes the readiness for constant processing and can be changed into a loop with an exit condition or a break instruction when required. The "for" loop is performed strictly a prescribed number of times. There is only one critical section, which protects the variable visitors2. The lock mechanism occurs in different and separate lines in process checking1 and only once in process checking2. If requests appear from both processes at the same time, they will be queued. Thus, the algorithm is deadlock free. If there are no data, the reception operation does not stop processing; the random delay time (d1) is then set for the next reception; see Line 13. All data sending operations (see Lines 23, 31, and 33, as well as Lines 21 and 42) are asynchronous.
Algorithm 1 has Θ(m) complexity, where m is the total number of monitored phones. Proof: The algorithm has a dominant loop running through the m phones observed as visitors in the entire destination. The "for" loop is performed a limited number of times. The other instructions have fixed costs. Finally, the average value is Θ(m).
Algorithm 2 always terminates. Proof: The algorithm does not contain recursions. All instructions are precisely defined. The infinite loop symbolizes the readiness for constant processing. The "for" loop is performed a prescribed number of times. There is only one critical section for the variable BTSPhoneList. The lock mechanism occurs in processes BTSlogin and BTSvisiting. If requests appear from both processes at the same time, they will be queued. Thus, the algorithm is deadlock free. A lack of data does not stop processing; see Lines 4 and 20. The sending operations are asynchronous; see Lines 6 and 34. Algorithm 2 has Θ(b) complexity, where b is a number of monitored phones in a BTS. Proof: The algorithm has a dominant loop running through b phones observed as visitors in a BTS. The same holds for the loop "for". The other instructions have fixed costs. Finally, the average value is Θ(b).
Algorithm 3 always terminates. Proof: the algorithm does not contain recursions. All instructions are precisely defined. The infinite loop symbolizes the readiness for constant processing. A lack of data does not stop the reception operation. The sending operation is asynchronous. Algorithm 3 has Θ(1) complexity. Proof: The procedure only transfers data. All instructions have fixed costs.
Algorithm 4 always terminates. Proof: The algorithm does not contain recursions. All instructions are precisely defined. The infinite loop symbolizes the readiness for constant processing. A lack of data does not stop the reception operation. The sending operation is asynchronous. Algorithm 4 has Θ(t) complexity, where t is an average number of elements of which the typical track consists. Proof: The algorithm has a dominant loop running through t elements of an entire trace of a phone. Finally, the average value is Θ(t).
Algorithm 5 always terminates. Proof: The algorithm does not contain recursions. As previously mentioned, the instructions in Lines 3 and 4 only symbolize the concurrent gathering and updating of facilityList. In fact, data gathering should be organized as background processing. The infinite loop symbolizes the readiness for constant processing. The sending operation is asynchronous. Algorithm 5 has Θ(o) complexity, where o is the number of facilities registered in the monitored area. Proof: The algorithm processes all o facilities registered in the monitored area.
Algorithm 6 always terminates. Proof: The algorithm does not contain recursions. All instructions are precisely defined. The infinite loop symbolizes the readiness for constant processing. A lack of data does not stop the reception operation. The operation in Line 5 is a procedure call, but we assume that facilityList is always available. The sending operation is asynchronous. Algorithm 6 has Θ(t) complexity, where t is the average number of elements of which a track consists. Proof: The entire trace of a phone, which is browsed, consists of t elements.
Algorithm 7 always terminates. Proof: the algorithm does not contain recursions. All instructions are precisely defined. The "for" loop is performed a prescribed number of times. The infinite loop symbolizes the readiness for constant processing. A lack of data does not stop the reception operation. We assume that the update procedure (see Line 8) always terminates; on the other hand, it could be organized as a background process. Algorithm 7 has Θ(m · t) complexity, where m is a number of phones, and t is the average number of elements in a track, which are required for a questionnaire analysis. Proof: The algorithm has a dominant loop running through m phones, observed as visitors in the entire destination. We process each questionnaire with an average cost t, that is it depends on the number of elements in a track. Finally, the average value is Θ(m · t).
We are also interested in the whole system, understood as a group of cooperating agents and their algorithms. The entire system has Θ(m · t) complexity, where m is a number of phones and t is the average number of elements of a typical trace. Proof: We use the parenthesis structure to show the calling of algorithms/processes in the entire system: A1(. . . , A5, A6, A7, A3, A2 · · · A2, . . . , A4 · · · A4, . . .) where letter "A" is related to the particular algorithm and the bottom dots specify other instructions outside the calling point, while the middle dots specify multiple instances. Using the previous statements and proofs, we get the following results. A5, in practice, has fixed costs Θ(1), due to the fixed number of facilities. A6 provides Θ(t). A7 provides Θ(m · t). A3 has fixed costs Θ(1). All A2 provide Θ(b), due to concurrent performance. All A4 provide Θ(t), due to concurrent performance. Thus, we obtain A1(..., Θ(1), Θ(t), Θ(m · t), Θ(1), Θ(b), ..., Θ(t), ...). A1 provides Θ(m), and Θ(m · t) dominates in the parenthesis structure. Finally, the complexity of the proposed system is Θ(m · t).
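The dominance argument can also be written out as a single sum. This is a sketch under two additional assumptions that are not stated explicitly above: b ≤ m (a single BTS never serves more phones than the whole monitored area) and m, t ≥ 1.

```latex
\begin{align*}
T_{\text{system}}(m, t)
  &= \underbrace{\Theta(m)}_{A_1}
   + \underbrace{\Theta(1)}_{A_5}
   + \underbrace{\Theta(t)}_{A_6}
   + \underbrace{\Theta(m \cdot t)}_{A_7}
   + \underbrace{\Theta(1)}_{A_3}
   + \underbrace{\Theta(b)}_{A_2}
   + \underbrace{\Theta(t)}_{A_4} \\
  &= \Theta(m \cdot t),
  \qquad \text{since } b \le m \le m \cdot t \text{ and } t \le m \cdot t \text{ for } m, t \ge 1 .
\end{align*}
```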

Experiments
Programming experiments were also carried out, with the aim of verifying the proposed system. Their scope is limited: they cover 10,000 (ten thousand) visitors, or tourists, and they were carried out with random data, which were also used in the calculation of geolocation. About 100 different events (arrivals at facilities) were generated for each visitor. The experiments make it possible to verify the system's capabilities and to present its working vision and the future target version. Another important role is to illustrate the system's functions. Figure 4 presents the architecture of the multi-agent system; see also Table 2. In our experiments, messages were transported using the Kafka [40,41] platform, an efficient message-broker platform that enables different types of data streams to be sent. The experiments were performed within a chosen area of the system, namely the busiest one, which relates to BTSs. The experiments were limited to the following three types of messages:
1. messages initiating the creation of agent A;
2. messages connected with a mobile phone logging in to and out of a BTS;
3. messages connected with a phone location; particular attention needs to be paid to the large number of such messages, which results from the fact that every BTS regularly calculates the geolocation separately for each phone logged into the given station, and those messages are later sent to agents A.
Figure 6 presents the topics of the most important elements, that is, those in the busiest part of the system. Agent B, located in each station, can send all types of messages. Kafka once again proved to be an efficient means of transmission for huge message streams, as already mentioned in [42], Section VI.B, also when the particular system elements were dispersed in a computing cloud ([42], Figure 6), which mirrors real conditions very well.
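The three message types could be represented and serialized as in the following sketch. It deliberately stops short of any broker client call: the record layouts, the topic names in `TOPICS`, and the `encode` helper are hypothetical and are not taken from the experiments. Keying every message by phone number is a design assumption; in Kafka, records with the same key are normally routed to the same partition, which preserves their relative order.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CreateAgentA:
    """Type 1: request to create agent A for a newly recognized phone."""
    phone: int


@dataclass(frozen=True)
class BtsLogin:
    """Type 2: a phone logging in to (True) or out of (False) a BTS."""
    phone: int
    bts: str
    logged_in: bool


@dataclass(frozen=True)
class PhoneLocation:
    """Type 3: one geolocation calculated by a BTS for a phone."""
    phone: int
    date: str
    time: str
    x: float
    y: float


# Hypothetical topic names, one per message type.
TOPICS = {
    CreateAgentA: "agent-a-create",
    BtsLogin: "bts-login",
    PhoneLocation: "phone-location",
}


def encode(message):
    """Return (topic, key, value) in the byte form a broker client expects."""
    topic = TOPICS[type(message)]
    key = str(message.phone).encode("utf-8")
    value = json.dumps(asdict(message), sort_keys=True).encode("utf-8")
    return topic, key, value
```

The resulting triple can be handed to whatever producer client is in use; that call is omitted here on purpose.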
Data processing possibilities are also illustrated by analyzing particular questions of a questionnaire. Generally speaking, data analysis of that kind is a challenging task and should be addressed separately. For that reason, only a few simpler, but quite interesting, cases are presented, together with the way the particular problems are coded. First, consider Listing 1, which includes a fragment of the trace, or route, registered for a particular tourist. Because of the limited size of this work, the listing shows only parts of such a trace; in reality, a trace covers numerous positions together with registered events and, above all, the progress of the route within the city.

18.29, 9.5, 9.0;
phone=3249, 23.12.2018, 18.30, 9.5, 9.5, club, Karlik;
phone=3249, 23.12.2018, 18.31, 9.5, 9.5, club, Karlik;
.....
phone=3249, 24.12.2018, 08.01, 6.0, 5.0, airport, Balice Airport;

Listing 2 includes a fragment of code that analyzes the tourist's means of arrival. We assume by default that he/she arrives by car. We then investigate a certain number of initial events in the registered visitor's trace. If a facility indicating another means of transport appears, for instance an airport, a bus station, or a train station, that means of transport is assumed to be the correct one. It is, of course, debatable whether we should examine a limited number of initial events or rather concentrate on events within a certain time limit from the first login of the phone within the monitored area. We have chosen one solution, but other approaches are also possible. Listing 3 includes fragments of code that check whether a visitor visited museums while staying in the city. If a tourist visited two or more museums, this kind of behavior is interpreted as one of the goals of the trip.
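The arrival analysis described above can be sketched as follows. This is an illustration only and not the code of Listing 2: the record layout is inferred from the trace fragment, and the names `Event`, `parse_trace`, `ARRIVAL_KINDS`, `arrival_mode`, and the default `first_n=10` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    phone: int
    date: str
    time: str
    x: float
    y: float
    kind: Optional[str] = None  # facility type, e.g. "club" or "airport"
    name: Optional[str] = None  # facility name, e.g. "Karlik"


def parse_trace(text):
    """Parse ';'-separated records with ','-separated fields into events."""
    events = []
    for record in text.split(";"):
        fields = [field.strip() for field in record.split(",")]
        if len(fields) < 5 or not fields[0].startswith("phone"):
            continue  # skip empty or truncated records
        phone = int(fields[0].split("=")[1])
        kind = fields[5] if len(fields) > 5 else None
        name = fields[6] if len(fields) > 6 else None
        events.append(
            Event(phone, fields[1], fields[2], float(fields[3]), float(fields[4]), kind, name)
        )
    return events


# Assumed mapping from facility type to means of arrival.
ARRIVAL_KINDS = {"airport": "plane", "bus station": "bus", "train station": "train"}


def arrival_mode(events, first_n=10):
    """Default to 'car'; override if a transport hub is among the first events."""
    for event in events[:first_n]:
        if event.kind in ARRIVAL_KINDS:
            return ARRIVAL_KINDS[event.kind]
    return "car"
```

Replacing the `first_n` window with a time window counted from the first login would implement the alternative criterion mentioned above.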
There are, of course, different ways of analyzing this issue. For example, a tourist could arrive in the city and visit only one museum, maybe even to see one particular painting, and such cases also have to be taken into consideration. In such cases, a reasonable solution seems to be to check the length of the stay in the museum, which can be determined on the basis of the tourist's trace. This is an open topic; our aim here is to present the means of data processing and to demonstrate its feasibility and reachability. It seems that this goal was achieved. Figure 7 presents the results of the analysis, for the whole verified population of visitors, of the means of reaching the city. A question about this aspect of the stay is one of the first questions in the questionnaire presented in Section 4 and would most probably be present in every other analyzed questionnaire. Figure 8 illustrates the ways in which visitors spend their time during their stay in the city. The two criteria shown are important and not mutually exclusive: in practice, people who like visiting museums can also take part in the numerous club parties organized in the city. In the described experiment, a similar analysis was carried out for another potentially interesting question. The examination consists, in fact, of checking the gathered data related to each visitor. These data have the form of a data stream gathered by agent A and later enriched by agent P; see data stream P2Q in Table 2. Content reasoning can be carried out on the basis of this kind of material. The issue of precisely defining the investigated events always remains. An example is the following dilemma: should visiting museums be identified by the number of visited places, or should other criteria be taken into consideration?
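Both criteria from this dilemma can be combined in a short sketch. It is not the code of Listing 3: events are reduced to hypothetical `(kind, name)` pairs in time order, one registered event is assumed to correspond to roughly one minute, and the thresholds `min_museums` and `min_fixes_single` are illustrative assumptions.

```python
from collections import defaultdict


def museum_stays(events):
    """Count the registered events per museum; events are (kind, name) pairs."""
    stays = defaultdict(int)
    for kind, name in events:
        if kind == "museum":
            stays[name] += 1
    return dict(stays)


def museums_are_trip_goal(events, min_museums=2, min_fixes_single=60):
    """Apply the count criterion first, then the length-of-stay fallback."""
    stays = museum_stays(events)
    # Criterion 1: two or more distinct museums were visited.
    if len(stays) >= min_museums:
        return True
    # Criterion 2 (assumed): a single museum, but with a long stay.
    return any(count >= min_fixes_single for count in stays.values())
```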
It needs to be mentioned that these are problems of a different nature, and our goal was to design an IT system that can gather a huge amount of data for further proper analysis.

Figure 8. The way of spending time in the city or visiting museums and parties.

Others
Some of the questions related to questionnaires are easy to answer, while others require deep analysis; this was already discussed in this work; see the informal considerations in Section 4.

Conclusions
The paper presents a novel method for mining the individual behaviors of visitors in a destination from pervasive BTS-based datasets. The questionnaire behavior (see Table 1) gives an informal idea of how our system works. The system is validated by the introduced multi-agent architecture, the proposed algorithms, and the conducted experiments.
The presented system opens up real possibilities of the implementation of proper software, which would be essential in the process of tourist traffic evaluation within the monitored area, understanding its specifics, advantages and disadvantages, supporting municipal services, etc. Some difficulties related to the system were discussed in this work; however, these are rather minor problems of a technical nature. After making proper decisions, they can all be easily solved.
The proposed system enables the gathering of huge amounts of data; however, as previously mentioned, the main issue is to fill in questionnaires on the basis of possessed data collections. This problem can be solved as well, as presented in the work, but it should be treated as a separate project.
Another issue is access to data gathered in BTSs. There may be legal issues related to privacy and the sensitivity of personal data. However, these data can be anonymized, since our main interest is collective behaviors, for example the behaviors of tourists visiting a given area, each considered as an anonymous individual, and not behaviors that carry information about a particular person's preferences and privacy. Data gathered in BTS networks are the most widespread type of data, due to the prevalence of mobile phones, and cannot be replaced by data from any other widely available phone applications. Last, but not least, let us note the ruling of the U.S. Supreme Court according to which records of dialed numbers are not protected by the Constitution; in practice, there is limited protection, delegated by the law-making power to acts of lower bodies (see https://en.wikipedia.org/wiki/Smith_v._Maryland). The problems presented above illustrate the directions of future work very well; it should include more detailed algorithms and the other procedures mentioned. It is also worth noting that our approach could be extended to hybrid data sources, that is, to combining CDR data and sensor data; see [42].

Acknowledgments: I thank my students Tomasz Borowicz and Krzysztof Świder (AGH University of Science and Technology) for their help with the experimental part of this research.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: