Analyzing the Behavior and Financial Status of Soccer Fans from a Mobile Phone Network Perspective: Euro 2016, a Case Study

In this study, Call Detail Records (CDRs) covering Budapest for the month of June 2016 have been analyzed. During this observation period, the 2016 UEFA European Football Championship took place, which significantly affected the habits of the residents, even though not a single match was played in the city. We evaluated the fans' behavior in Budapest during and after the Hungarian matches and found that mobile phone network activity reflects the football fans' behavior, demonstrating the potential of mobile phone network data within a social sensing system. The Call Detail Records are enriched with mobile phone properties to analyze the subscribers' devices. Using the device information (Type Allocation Code) from the activity records, Subscriber Identity Modules (SIMs) that do not operate in cell phones are omitted from the mobility analyses, allowing us to focus on people. The mobile phone price is proposed and evaluated as a socioeconomic indicator, and a correlation between the phone price and mobility customs has been found. We also found that, besides the cell phone price, the subscriber age and the subscription type also affect mobility. On the other hand, these factors do not seem to affect interest in football.


Introduction
Football is one of the most popular sports worldwide, and the European and World Championships, especially their finals, are among the most watched sporting events. The Euro 2016 Final was watched by more than 20 million people in France [19], and the Germany vs. France semifinal by almost 30 million people in Germany [19]. But what about Hungary?
In [36] and [38], mass protests are analyzed via mobile phone network data. In [41,24,46] and [16], the authors examined the areas of the stadiums where football matches took place. Traag et al. [41] and Hiir et al. [16] also found that the mobile phone activity of the attendees decreased significantly. In [41], the z-score is also used to express the deviation of activity during the social event from the average. Xavier et al. compared the reported numbers of attendees of these events with the detected ones. Furletti et al. also analyzed sociopolitical events, football matches and concerts in Rome [11]. This paper also focuses on football matches; however, these took place in a remote country (France), and the fans' activity is studied in Budapest.
Mobility indicators, such as the Radius of Gyration or Entropy, are often calculated [33,47] to describe and classify the subscribers' mobility customs. Furthermore, using mobility to infer Socioeconomic Status (SES) is a current direction of mobility analysis [47,5,1,34]. Cottineau et al. [5] explored the relationship between mobile phone data and traditional socioeconomic information from the national census in French cities. Barbosa et al. found significant differences in the average travel distance between low- and high-income groups in Brazil [1]. Xu et al. [47] found opposite travel tendencies in the mobility of Singapore and Boston. In our previous work [34], we showed that the real estate prices of the home and work locations characterize mobility, and we validated our results with census data. In this paper, the price and the age of the subscribers' mobile phones are proposed as a source of socioeconomic indicators. While Blumenstock et al. used call history as a factor of socioeconomic status [2], Sultan et al. [40] applied mobile phone prices as a socioeconomic indicator and identified areas where more expensive phones appear more often; however, they used only manually collected market prices.
Mobile phone network data has also been used to analyze human mobility during the COVID-19 pandemic and the effectiveness of the restrictions. Willberg et al. identified a significant decrease in population presence in the largest cities of Finland after the lockdown compared to a usual week [45]. Bushman et al. analyzed compliance with social distancing in the US using mobile phone data [3]. Gao et al. found a negative correlation between stay-at-home distancing and the COVID-19 growth rate [12]. Still, such analyses might not be common enough. Oliver et al. asked: 'Why is the use of mobile phone data not widespread, or a standard, in tackling epidemics?' [30]. This question, however, is not within the scope of this paper.
In this study, we analyzed the mobile phone network activity before, during and after the matches of the Hungarian national football team. The Call Detail Records (CDRs) analyzed in this study were recorded in Budapest, whereas the matches took place in France. We present another example of social sensing using CDRs, in both an indirect and a direct way. Indirectly, as the mobile phone activity of sports fans residing in Budapest is studied during matches played in France. Directly, as the spontaneous festival on the streets of Budapest after the third match and the welcome event at Heroes' Square are presented from a data perspective.
The Call Detail Records are filtered by Type Allocation Code (TAC) to remove the Subscriber Identity Module (SIM) cards that do not operate in mobile phones and thus are not used by actual people. The price and age of the cell phones are also analyzed in relation to the subscribers' age and mobility customs.
The contributions of this paper are briefly summarized as follows:
1. Fusing a CDR data set with mobile phone prices and release dates.
2. Filtering out SIM cards that do not operate in mobile phones.
3. Demonstrating the connection between phone price and mobility customs.
4. Proposing mobile phone price as an SES indicator.
5. Comparing the attendees of large social events to the rest of the subscribers based on their mobility and SES.
The rest of this paper is organized as follows. The data used is described in Section 2, the applied methodology is summarized in Section 3, and the results of this study are presented in Section 4. Finally, Section 5 summarizes and concludes the findings of the paper.

Materials
Vodafone Hungary, one of the three mobile phone operators providing services in Hungary, provided anonymized CDR data for this study. Figure 1 shows the distribution of activity between the activity categories of the SIM cards (one record, fewer than 10 records, fewer than 100 records, fewer than 1000 records, and more than 1000 records). The last category, SIM cards with more than 1000 activity records, is dominant: these SIM cards, almost 27% of the total, produce more than 91% of the activity. Figure 2 shows the distribution of SIM cards by the number of active days. Only 34.59% of the SIM cards are active on at least 21 different days. There were 241,824 SIM cards (11.72%) that appear on at least two days, but whose first and last activities are no more than seven days apart. This may indicate the presence of tourists, as tourism is typically high during this part of the year.
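The activity categories and the short-stay (possible tourist) criterion above can be expressed compactly. The following is a minimal sketch, assuming the per-SIM record counts and sets of active dates have already been computed; the category labels and the treatment of exactly 1000 records as the top category are assumptions, not the authors' code.

```python
from collections import Counter
from datetime import date


def activity_category(n_records: int) -> str:
    """Map a SIM card's record count to an activity category.

    Assumption: exactly 1000 records falls into the top category.
    """
    if n_records == 1:
        return "1"
    if n_records < 10:
        return "<10"
    if n_records < 100:
        return "<100"
    if n_records < 1000:
        return "<1000"
    return ">=1000"


def category_shares(records_per_sim: dict) -> dict:
    """Per category: (share of SIM cards, share of all activity records)."""
    n_sims = len(records_per_sim)
    n_records = sum(records_per_sim.values())
    sims, activity = Counter(), Counter()
    for count in records_per_sim.values():
        category = activity_category(count)
        sims[category] += 1
        activity[category] += count
    return {c: (sims[c] / n_sims, activity[c] / n_records) for c in sims}


def is_possible_tourist(active_days: set) -> bool:
    """Active on at least two days, with first and last activity at most 7 days apart."""
    if len(active_days) < 2:
        return False
    return (max(active_days) - min(active_days)).days <= 7
```

Applied to the full data set, the ">=1000" entry of `category_shares` would correspond to the roughly 27% of SIM cards producing more than 91% of the records.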
The obtained data was in a 'wide' format and contained a SIM ID, a timestamp, a cell ID, the base station (site) coordinates in WGS 84, the subscriber details (age, sex), the subscription details (consumer/business and prepaid/postpaid) and the Type Allocation Code (TAC) of the device. The TAC is the first 8 digits of the International Mobile Equipment Identity (IMEI) number; it is allocated by the GSM Association and identifies the mobile phone model.
The Type Allocation Codes are provided for every record, because a subscriber can change their device at any time. Naturally, most of the subscribers (95.71%) use only one device during the whole observation period, but there are some subscribers, perhaps mobile phone repair shops, who use multiple devices (see Figure 4a). As part of the data cleaning, the wide format was normalized: the CDR table contains only the SIM ID, the timestamp and the cell ID; a separate table holds the subscriber and subscription details; and another table tracks the device changes of each subscriber.
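The normalization step can be sketched as splitting each wide record into three tables. This is an illustrative sketch only: the `WideRecord` field names are hypothetical, the site coordinates (which would go to a separate site table) are omitted, and a device change is recorded whenever a SIM card's TAC differs from its previous record.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class WideRecord:
    # Hypothetical field names for the 'wide' CDR format described in the text.
    sim_id: str
    timestamp: datetime
    cell_id: str
    age: Optional[int]
    sex: Optional[str]
    segment: str   # consumer / business
    payment: str   # prepaid / postpaid
    tac: str


def normalize(records):
    """Split wide records into CDR, subscriber and device-change tables."""
    cdr = []             # (sim_id, timestamp, cell_id)
    subscribers = {}     # sim_id -> (age, sex, segment, payment)
    device_changes = []  # (sim_id, timestamp, tac): first record with a new TAC
    last_tac = {}
    for r in sorted(records, key=lambda rec: (rec.sim_id, rec.timestamp)):
        cdr.append((r.sim_id, r.timestamp, r.cell_id))
        subscribers.setdefault(r.sim_id, (r.age, r.sex, r.segment, r.payment))
        if last_tac.get(r.sim_id) != r.tac:
            device_changes.append((r.sim_id, r.timestamp, r.tac))
            last_tac[r.sim_id] = r.tac
    return cdr, subscribers, device_changes
```

Keeping device changes as events rather than repeating the TAC on every row is what allows single-device and multi-device subscribers to be told apart cheaply.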
While the subscription details are available for every SIM card, the subscriber information is missing in slightly more than 40% of the cases, presumably because of the subscribers' preferences regarding the use of their personal data. Figure 4b shows the age distribution of the subscribers whose data is available (58.65%), by subscription type. Note that this may not represent the age distribution of the population, nor even that of the customers of Vodafone Hungary, as one may have multiple subscriptions and the actual user of a phone may differ from the owner of the subscription. Nevertheless, it is clear that prepaid subscriptions are more popular among elderly people. Figure 3 shows the number of daily activity records during the second half of the month. Weekends (brown bars) show significantly less activity; hence, the activity during each match is compared to the weekday or weekend activity average, depending on the day of the match.
Although the data contains cell IDs, only the locations of the base stations, where the cell antennas are mounted, are known. As a base station usually serves multiple cells, the cells were merged by their serving base station. After the merge, 665 locations (sites) with known geographic coordinates remained. To estimate the area covered by these sites, a Voronoi tessellation was performed on their locations, which is common practice [31,6,44,4,29,42] in CDR processing.
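Merging cells by base station and using Voronoi regions can be illustrated without computing polygon boundaries: a point lies in a site's Voronoi polygon exactly when that site is its nearest one. The sketch below assumes a cell-to-site mapping is available and uses an equirectangular distance approximation, which is reasonable at city scale; it is not the tessellation code used in the study.

```python
import math


def merge_cells(cell_to_site: dict, cell_counts: dict) -> dict:
    """Aggregate per-cell activity counts by the serving base station (site)."""
    site_counts = {}
    for cell, count in cell_counts.items():
        site = cell_to_site[cell]
        site_counts[site] = site_counts.get(site, 0) + count
    return site_counts


def voronoi_site(lat: float, lon: float, sites: dict) -> str:
    """Return the site whose Voronoi polygon contains (lat, lon), i.e. the nearest site.

    `sites` maps site IDs to (lat, lon) in WGS 84 degrees.
    """
    def dist2(site_lat: float, site_lon: float) -> float:
        # Equirectangular approximation: scale longitude differences by cos(latitude).
        x = math.radians(lon - site_lon) * math.cos(math.radians((lat + site_lat) / 2))
        y = math.radians(lat - site_lat)
        return x * x + y * y

    return min(sites, key=lambda s: dist2(*sites[s]))
```

For drawing the polygons themselves (as in the maps), a computational geometry routine such as SciPy's Voronoi implementation would normally be applied to projected coordinates.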

Resolving Type Allocation Codes
We intended to characterize the socioeconomic status (SES) of the members of the celebrating crowd by the mobile devices they use. The preliminary assumption was that the price of a person's mobile phone represents their SES.
To our knowledge, there is no publicly available TAC database that resolves TACs to manufacturer and model, although some vendors (e.g., Apple, Nokia) publish the TACs of their products. The exact model of the phone is required to know how recent and expensive it is. Even this is not enough to determine how much the phone cost the subscriber, as they may have bought it on sale or at a discount via the operator in exchange for signing an x-year contract. Still, the consumer price should indicate the order of magnitude of the phone price.
The TAC data set provided by "51Degrees" was used, which represents the model information in three columns: 'HardwareVendor', 'HardwareFamily' and 'HardwareModel'. The company mostly deals with smartphones that can browse the web, so feature phones and other GSM-capable devices are usually not covered by the data set. Release date and inflated price columns are also included, but their values are usually missing, making the data unsuitable for use on its own.
Although the records cannot be separated by type, the CDR data contains not only call and text message records but also data transfer records. Furthermore, some SIM cards do not operate in phones but in other, often immobile, devices such as 3G routers or modems. 51Degrees has annotated several TACs as modems or other non-phone devices; this list was extended by manually searching the most frequent TACs. There were 324,793 SIM cards that used only one device during the observation period and operated in a non-phone device.
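Identifying the non-phone SIM cards reduces to a set lookup once the devices used by each SIM card are known. A minimal sketch, assuming a set of non-phone TACs (51Degrees annotations plus manual additions) is given; the TAC strings in the usage are hypothetical.

```python
def non_phone_sims(devices_per_sim: dict, non_phone_tacs: set) -> set:
    """SIM cards that used exactly one device, and that device is not a phone."""
    return {
        sim
        for sim, tacs in devices_per_sim.items()
        if len(tacs) == 1 and next(iter(tacs)) in non_phone_tacs
    }
```

A SIM card that switched between a modem and a phone is deliberately kept, matching the criterion in the text of a single, non-phone device.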

Fusing Databases
For a more extensive mobile phone price database, a scraped GSMArena database [39] was used. GSMArena has a large and respected database that has also been used in other studies [37,49]. The concatenation of the brand and model fields of the GSMArena database can serve as an identifier for the database fusion. 51Degrees stores the hardware vendor, family and model, where the hardware family often contains a marketing name (e.g., [Apple, iPhone 7, A1778]). As these fields are not always properly distinguished, the concatenation of the three fields may contain duplications (e.g., [Microsoft, Nokia Lumia 820, Lumia 820]). Therefore, three identifiers are built for each 51Degrees record by concatenating (i) vendor + family, (ii) vendor + model and (iii) vendor + family + model, and all three versions are matched against the GSMArena records.
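The three identifier variants can be built as follows. This sketch assumes simple lowercasing and whitespace normalization before matching, which is an illustrative choice rather than the documented preprocessing; returning a set collapses variants that coincide when family and model are equal.

```python
def _norm(*parts: str) -> str:
    """Join parts with single spaces and lowercase them."""
    return " ".join(" ".join(parts).lower().split())


def identifiers_51degrees(vendor: str, family: str, model: str) -> set:
    """(i) vendor + family, (ii) vendor + model, (iii) vendor + family + model."""
    return {_norm(vendor, family), _norm(vendor, model), _norm(vendor, family, model)}


def identifier_gsmarena(brand: str, model: str) -> str:
    """GSMArena identifier: brand + model."""
    return _norm(brand, model)
```

For [Apple, iPhone 7, A1778], variant (i) "apple iphone 7" is the one expected to match the GSMArena identifier built from brand "Apple" and model "iPhone 7".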
Another data cleaning step corrects vendor name changes. For example, BlackBerry phones were manufactured by RIM (e.g., [RIM, BlackBerry Bold 9700, RCM71UW]), but the company was later renamed BlackBerry, and the database records are not always consistent in this respect. The same situation occurs due to Microsoft's acquisition of Nokia's mobile phone business. Because of spelling differences, simple string equality cannot be used to match these composite identifiers, so fuzzy string matching is applied using the FuzzyWuzzy Python package, which uses the Levenshtein distance to calculate the differences between strings. This method is applied to all three identifiers from the 51Degrees data set, and duplicated matches (e.g., when the family and the model are the same) were removed. Mapping the GSMArena database to the 51Degrees data adds phone price and release date information to the TACs, which can then be merged with the CDRs.
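The study used FuzzyWuzzy; the sketch below instead uses a self-contained Levenshtein distance with a simple length-normalized similarity, so its scores are not identical to FuzzyWuzzy's. The vendor alias table and the 0.9 acceptance threshold are hypothetical, shown only to illustrate how renamed vendors (RIM to BlackBerry) can be reconciled before fuzzy matching.

```python
VENDOR_ALIASES = {"rim": "blackberry"}  # hypothetical alias table for renamed vendors


def canonical(identifier: str) -> str:
    """Lowercase, apply vendor aliases and drop a repeated leading vendor token."""
    tokens = identifier.lower().split()
    if tokens:
        tokens[0] = VENDOR_ALIASES.get(tokens[0], tokens[0])
    if len(tokens) > 1 and tokens[0] == tokens[1]:
        tokens.pop(0)  # e.g. "blackberry blackberry bold 9700"
    return " ".join(tokens)


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, decreasing with the edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def best_match(query: str, candidates: list, threshold: float = 0.9):
    """Most similar (non-empty list of) candidates, or None below the assumed threshold."""
    q = canonical(query)
    score, cand = max((similarity(q, canonical(c)), c) for c in candidates)
    return cand if score >= threshold else None
```

With these definitions, "RIM BlackBerry Bold 9700" resolves to "BlackBerry Bold 9700", and a typo such as "Egde" for "edge" still passes the threshold.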
From the GSMArena data, two indicators were extracted: (i) the price of the phone (in EUR) and (ii) the relative age of the phone (in months). The phone price was left intact, without taking depreciation into account, and the relative age of the phone is calculated as the difference between the date of the CDR data set (June 2016) and the release date of the phone.
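The relative age in months can be computed from calendar months alone. This sketch assumes the release day is ignored and June 2016 is the reference month, as stated above.

```python
from datetime import date

CDR_MONTH = date(2016, 6, 1)  # the CDR data set covers June 2016


def relative_age_months(release: date, reference: date = CDR_MONTH) -> int:
    """Whole calendar months between a phone's release date and the CDR month."""
    return (reference.year - release.year) * 12 + (reference.month - release.month)
```

For example, a phone released in September 2014 gets a relative age of 21 months.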

Methodology
The framework introduced in our earlier work [34] has been applied to process the mobile phone network data. The CDRs are normalized and cleaned, and the mobility metrics (Section 3.1) are determined for every subscriber. The records can be filtered both spatially and temporally; both types of filtering are applied in this work. Additionally, a group of SIM cards can be selected from the activity records.
Only temporal filtering is applied to visualize the activity trends during the football matches. Figures 9, 10, 11, 12, 14 and 17 illustrate the activity of the subscribers in the whole observation area during the matches, including the two hours before and after each match. For the celebration after the Hungary vs. Portugal match, both spatial and temporal filtering are applied to select the area of interest (downtown Budapest) in the given time interval.
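Temporal filtering around a match, combined with the 5-minute aggregation used elsewhere in the analysis, might look like the sketch below. The default two-hour match duration is an assumption for illustration; the margins follow the two hours before and after mentioned above.

```python
from collections import Counter
from datetime import datetime, timedelta


def match_window_counts(timestamps, kickoff: datetime,
                        duration: timedelta = timedelta(hours=2),
                        margin: timedelta = timedelta(hours=2),
                        bin_size: timedelta = timedelta(minutes=5)) -> Counter:
    """Count records per 5-minute bin, from `margin` before kickoff to `margin` after the end."""
    start = kickoff - margin
    end = kickoff + duration + margin
    counts = Counter()
    for ts in timestamps:
        if start <= ts < end:
            offset = (ts - start) // bin_size  # timedelta // timedelta -> int
            counts[start + offset * bin_size] += 1
    return counts
```

Bins are keyed by their start time, so the resulting counter can be plotted directly as an activity time series.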
To determine the activity levels for the map in Figure 13, the match-day activity, the average weekday activity (excluding the match day) and the Z-scores are determined for the sites of the area of interest (downtown) in the selected time interval (20:15-20:20). We observed that without removing the target-day activity from the reference average, the standard deviation would be higher; consequently, the Z-scores would be lower and the relative differences less consistent. Histograms of the Z-scores were generated for the selected sites (Figure 5) to determine the activity categories. A Z-score of zero means that the activity level equals the average, but a wider interval (between −2 and 2) is considered average to allow for some variation. Sites with a Z-score between 2 and 8 are considered to have high activity during the given time interval. There are also sites with either low (below −2) or very high (over 8) activity. The same method is applied for the map in Figure 18b, but as the area of interest and the event differ, the thresholds are not the same (see Section 4.5).
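The Z-score and the threshold-based categorization can be sketched as below. The use of the sample standard deviation and the handling of values exactly on a threshold are assumptions; the default thresholds are those for the downtown map, and the Heroes' Square map would pass −1, 1 and 2.5 instead.

```python
import statistics


def z_score(target: float, reference: list) -> float:
    """Z-score of the match-day count against reference weekdays (match day excluded).

    Requires at least two reference values; uses the sample standard deviation.
    """
    mean = statistics.mean(reference)
    std = statistics.stdev(reference)
    return (target - mean) / std


def activity_level(z: float, low: float = -2.0, high: float = 2.0,
                   very_high: float = 8.0) -> str:
    """Map a Z-score to the activity categories used for the maps."""
    if z < low:
        return "low"
    if z <= high:
        return "average"
    if z <= very_high:
        return "high"
    return "very high"
```

With reference counts [10, 12, 14] and a match-day count of 20, the Z-score is 4.0 ("high"); adding the match-day value to the reference set would inflate the standard deviation and shrink the score, which is why the target day is excluded.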
The groups of football fans are formed from the subscribers based only on their activity during the Hungary vs. Portugal match. The owners of the SIM cards (excluding those operating in non-phone devices) that were active after at least two goals are considered active football fans. The properties of these subscribers, including age, mobility metrics, phone age and price, are compared to those of the rest of the subscribers (Figure 15).
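The active-fan selection can be sketched as counting, for each SIM card, how many goal peaks it was active in. The peak start times (18:18, 19:02 and 19:18) and the 5-minute windows come from the Hungary vs. Portugal analysis; the input is assumed to already exclude non-phone SIM cards, and the half-open window boundary is an assumption.

```python
from datetime import datetime, timedelta

# Start times of the activity peaks after the Hungarian goals (Hungary vs. Portugal).
GOAL_PEAKS = [datetime(2016, 6, 22, 18, 18),
              datetime(2016, 6, 22, 19, 2),
              datetime(2016, 6, 22, 19, 18)]
WINDOW = timedelta(minutes=5)


def active_fans(activity: dict, min_peaks: int = 2) -> set:
    """SIM cards with activity within 5 minutes after at least `min_peaks` goal peaks."""
    fans = set()
    for sim, timestamps in activity.items():
        hits = sum(
            any(peak <= ts < peak + WINDOW for ts in timestamps) for peak in GOAL_PEAKS
        )
        if hits >= min_peaks:
            fans.add(sim)
    return fans
```

Setting `min_peaks=1` or `min_peaks=3` reproduces the looser and stricter alternatives discussed in the results.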

Mobility Metrics
The Radius of Gyration and Entropy metrics have been used to characterize human mobility. These indicators are determined for every subscriber, omitting the SIM cards that operate in non-phone devices. In this study, locations are represented by the base stations.
The Radius of Gyration [15] is the radius of the circle within which an individual (represented by a SIM card) can usually be found. It was originally defined in Equation (1), where L is the set of locations visited by the individual, r_cm is the center of mass of these locations, and n_i is the number of visits to the i-th location.
Entropy characterizes the diversity of the locations visited by an individual. It is defined in Equation (2), where L is the set of locations visited by the individual, l represents a single location, p(l) is the probability of the individual being active at location l, and N is the total number of activities of the individual [31,5].
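Equations (1) and (2) are not reproduced in this text. Assuming the common forms consistent with the definitions above, namely r_g = sqrt((1/N) * sum_i n_i * |r_i - r_cm|^2) with N = sum_i n_i and r_i the coordinates of location i, and an entropy normalized by log N (which keeps values within [0, 1], matching the 0.05-1.00 binning used later), a minimal sketch is shown below. Planar site coordinates in kilometres are also an assumption.

```python
import math
from collections import Counter


def radius_of_gyration(visits: dict) -> float:
    """r_g = sqrt((1/N) * sum_i n_i * |r_i - r_cm|^2), with N = sum_i n_i.

    `visits` maps planar site coordinates (x, y) in km to visit counts n_i.
    """
    n_total = sum(visits.values())
    cx = sum(n * x for (x, _), n in visits.items()) / n_total
    cy = sum(n * y for (_, y), n in visits.items()) / n_total
    return math.sqrt(
        sum(n * ((x - cx) ** 2 + (y - cy) ** 2) for (x, y), n in visits.items()) / n_total
    )


def mobility_entropy(locations: list) -> float:
    """E = -sum_l p(l) * log p(l) / log N over one subscriber's N activity records."""
    n = len(locations)
    if n < 2:
        return 0.0  # undefined for a single record; treated as no diversity
    counts = Counter(locations)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)
```

Two equally frequent sites 2 km apart give r_g = 1 km, and activity spread over as many sites as records gives an entropy of 1.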

Socioeconomic Status
In our earlier work [34], the real estate prices of the subscribers' home locations were used to describe socioeconomic status. In this study, the CDRs are enriched with phone prices, and the phone price is proposed as a socioeconomic indicator. To demonstrate the applicability of the mobile phone price as a socioeconomic indicator, it was examined in relation to the mobility indicators using Principal Component Analysis (PCA).
The SIM cards are aggregated by subscriber age category (5-year steps between 20 and 80), phone price category (100 EUR steps up to 700 EUR), and Radius of Gyration and Entropy categories. For the Radius of Gyration, 0.5 km ranges are used between 0.5 and 20 km, and the Entropy values are divided into twenty bins between 0.05 and 1.00. The structure of the data used for the Principal Component Analysis is defined as follows.
A table was generated in which every row consists of 40 columns representing 40 Radius of Gyration bins between 0.5 and 20 km and 20 columns representing 20 Entropy bins between 0.05 and 1.00. The subscribers belonging to each bin are counted, and the counts are normalized per metric to make them comparable. The rows are not explicitly labeled by their categories: the subscriber age and phone price descriptor columns are not provided to the PCA algorithm. The same table is constructed using weekend/holiday metrics, and its rows are appended after the weekday rows.
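One row of this table, for a single (age, phone price) group, can be sketched as follows. Three details are assumptions: bins are labeled by their upper edge (0.5, 1.0, ... km and 0.05, 0.10, ...), out-of-range values are clamped into the first or last bin, and "normalized per metric" is taken to mean shares summing to 1 within each metric.

```python
import math

RG_BINS, RG_WIDTH = 40, 0.5     # 40 bins of 0.5 km up to 20 km
ENT_BINS, ENT_WIDTH = 20, 0.05  # 20 bins of 0.05 up to 1.00


def bin_index(value: float, width: float, n_bins: int) -> int:
    """Index of the bin labeled by its upper edge; out-of-range values are clamped."""
    return min(max(math.ceil(value / width) - 1, 0), n_bins - 1)


def feature_vector(rg_values: list, entropy_values: list) -> list:
    """60-dimensional row: 40 Radius of Gyration shares followed by 20 Entropy shares.

    Both input lists are assumed to be non-empty.
    """
    rg = [0.0] * RG_BINS
    ent = [0.0] * ENT_BINS
    for v in rg_values:
        rg[bin_index(v, RG_WIDTH, RG_BINS)] += 1
    for v in entropy_values:
        ent[bin_index(v, ENT_WIDTH, ENT_BINS)] += 1
    # Normalize each metric separately so the two blocks are comparable.
    rg = [c / len(rg_values) for c in rg]
    ent = [c / len(entropy_values) for c in ent]
    return rg + ent
```

Stacking one such row per group, first for weekdays and then for weekends/holidays, yields the input matrix for the PCA.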
When PCA is applied, the 60-dimensional vectors are reduced to two dimensions based on the mobility customs, where the bins are weighted by the number of subscribers. The cumulative explained variance of the first two components is about 61% (see Figure 7b). The rows, projected onto the two new dimensions (PC1 and PC2), are plotted (see Figure 6); the markers are colored by phone price, and marker sizes indicate the subscriber age category, with larger markers for younger subscribers.
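The dimensionality reduction itself can be sketched with a singular value decomposition of the centered table. This assumes NumPy is available and does not reproduce the subscriber-count weighting mentioned above; it only shows how the two-dimensional coordinates and the explained variance (about 61% cumulatively in the study) are obtained.

```python
import numpy as np


def pca_2d(table: np.ndarray):
    """Project rows onto the first two principal components.

    Returns (scores, explained_variance_ratio) with shapes (n_rows, 2) and (2,).
    """
    centered = table - table.mean(axis=0)
    # Rows of vt are the principal axes; squared singular values are proportional
    # to the variance captured by each axis.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:2].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:2]
```

`explained.sum()` gives the cumulative explained variance of the two components, and `scores[:, 0]` and `scores[:, 1]` are the PC1 and PC2 coordinates that would be plotted.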

Results and Discussion
As Figure 6 shows, the markers are clustered by color, in other words, by phone price, which increases with PC1 but decreases with PC2. Within each phone price group, the younger subscribers (larger markers) are closer to the origin, indicating that the mobility customs of younger subscribers differ from those of older ones, although this difference is smaller within the higher price categories. This finding is consistent with [9], where Fernando et al. found a correlation between subscribers' age and mobility metrics.
To give context to Figure 6, Figure 7a shows the phone price distribution: most of the phones are within the 50-200 EUR range. Note that there are only a few phones over 550 EUR, but their owners have significantly different mobility patterns. Figure 6 not only shows that the phone price forms clusters but also reveals the effect of the subscription type on mobility. Within the phone price categories, except the highest one with very few subscribers, the postpaid groups are usually closer to the origin. Prepaid subscriptions are usually chosen by those who do not use their mobile phones extensively, and people with a prepaid subscription seem to have mobility customs similar to those of people with less expensive phones but a postpaid subscription. This is most notable at (-6, 2) and (-5, -1).

Sultan et al. identified areas in Jhelum, Pakistan, where more expensive phones appear more often [40]. Using the same method, Budapest and its agglomeration were evaluated: the average phone price of the activity records was determined for every site. The ground truth is that real estate prices are higher on the Buda side (west of the river Danube) of Budapest and downtown [34], and this tendency can clearly be seen in Figure 8. The airport area has a significantly higher average than its surroundings, which is not surprising. The spatial tendencies of the mobile phone price, along with the result of the PCA (Figure 6
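The per-site averaging described above can be sketched as a record-weighted mean, so that SIM cards with more activity at a site count more. The input shapes are assumptions; SIM cards whose phone price is unknown are skipped.

```python
from collections import defaultdict


def average_price_per_site(records, phone_price: dict) -> dict:
    """Mean phone price over the activity records at each site.

    `records` yields (sim_id, site_id) pairs; `phone_price` maps sim_id to EUR.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sim, site in records:
        price = phone_price.get(sim)
        if price is None:
            continue  # unknown device or missing price
        totals[site] += price
        counts[site] += 1
    return {site: totals[site] / counts[site] for site in totals}
```

Averaging over SIM cards instead of records would be an equally plausible variant; the text refers to averages from the activity records, which this sketch follows.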

Austria vs. Hungary
The first match, against Austria (Figure 9), started at 18:00 on Tuesday, June 14, 2016. Before the match, the activity level was significantly higher than the weekday average, and it later decreased until half-time. During the second half, the activity level dropped to the average, which indicates that more people started to follow the match. Two significant activity peaks were observed right after the Hungarian goals, indicating increased attention and massive use of mobile devices during the match.
As the data source cannot distinguish mobile phone activities by type, it cannot be determined what kind of activities caused the peaks. We suppose that the activity was mostly data transfer or text messages, not phone calls: calling someone during the match just because of a goal does not seem realistic, but sending a line of text via one of the popular instant messaging services is very plausible.

Iceland vs. Hungary
The match against Iceland was played on Saturday, June 18, 2016. Figure 10 shows the mobile phone activity levels before, during and after the match. As weekend activity is generally lower (see Figure 3), the weekend average is used as the reference. The match began at 18:00, and from that point, the activity level was significantly below the average, except during the half-time break and, again, the peak after the Hungarian goal. Interestingly, the Icelandic goal did not result in such a significant peak; only a very moderate one can be seen in the time series.
Traag et al. [41] also found an activity drop during a game, but they analyzed the area of the stadium where the match was played, and there was no peak during the match.

Hungary vs. Portugal
On Wednesday, June 22, 2016, in the third match of the group stage of the 2016 UEFA European Football Championship, Hungary drew with Portugal. Both teams scored three goals, and with this result, Hungary won their group and qualified for the knockout phase. During the match, the mobile phone activity dropped below the average, but the Hungarian goals against Portugal resulted in significant peaks, especially the first one (see Figure 11). On the other hand, the Portuguese equalizers did not leave a significant mark on the activity. In the second half, the teams scored four goals in a relatively short time period, but only the Hungarian goals resulted in peaks. This observation suggests that the football fans had a notable influence on the mobile network traffic.
After the match, the activity level was above the average, which might reflect the spontaneous festival in downtown Budapest. According to MTI (the Hungarian news agency), thousands of people celebrated in the streets, starting from the fan zones, mainly from Erzsébet Square (Figure 13). This social event is comparable to mass protests from a mobile phone network perspective. In an earlier work [35], we analyzed the mobile phone activity along the route of a mass protest: the activity of the cells was significantly higher when the protesters passed through them. In this case, however, the affected area was smaller, and the sites along the Grand Boulevard were very busy at the same time after the game. The activities of the sites (multiple cells aggregated by base station) in downtown Budapest are illustrated in Figure 12. The highlighted site mostly covers Szabadság Square (for its location, see Figure 13a), where one of the main fan zones was set up with a big screen. The activity curve follows the trends of the whole data set (see Figure 11): there is high activity before the match, during half-time and, for a short period, after the match. During the match, the activity decreased, except for four less significant peaks around the goals.
In the highlighted site in Figure 12, almost 7 thousand SIM cards were detected between 17:00 and 20:00. The data shows that 53.57% of these subscribers were between 20 and 50 years old, while 33.49% had no age data.
After the match, there is a significant increase in the activity at some other sites. These sites are (mostly) around the Grand Boulevard, where the fans marched and celebrated the advancement of the national football team to the knockout phase. Figure 13 shows the spatial distribution of this social event using Voronoi polygons generated around the base station locations. The polygons are colored by the increase of mobile phone network activity at 20:15 compared to the average weekday activity. For the comparison, the standard score was determined for every base station with a 5-minute temporal aggregation. Darker colors indicate a higher activity surplus in an area. The figure also denotes the three main fan zones in the area and the routes of the fans (arrows), and the affected streets are emphasized.
There were three Hungarian goals during the match, hence three peaks, starting at 18:18, 19:02 and 19:18, each with a fall time of about 5 minutes. Who is responsible for the peaks? To answer this question, the SIM cards that were active during at least two of the peaks were selected. Selecting SIM cards that were active during any single peak would also include many subscribers who cannot be considered football fans, while requiring activity during all three peaks would be too restrictive. Figure 14a presents the activity of the 44,646 selected SIM cards, whose owners may be football fans. Removing these SIM cards from the data set should result in an activity curve without peaks that is, at the same time, similar in tendency to the average activity. However, as Figure 14b shows, the activity still drops during the match. Therefore, the 'football fan' category should be divided into 'active' and 'passive' fans from the mobile phone network perspective. Active fans are assumed to express their joy using the mobile phone network (presumably to access social media) and thus cause the peaks. It seems that passive fans ceased their other activities to watch the game, which caused some lack of activity compared to the average. Removing the active fans from the observed set of SIM cards decreased the activity level in general (Figure 14b). This is not surprising: as these people reacted to the goals, they presumably use the mobile phone network often. There are also some negative peaks, indicating that the selection is not perfect.
Is there any difference between the active fans and the other subscribers regarding phone age and price? Figure 15a shows the relative age of the phones with respect to the subscribers' behavior after the goals. No significant difference was found between the active fans and the other subscribers: the median relative phone age is about two years, and there are some much older (nearly ten years old) phones in use. Note that older devices tend to be used by elderly people. The price of the phones shows the opposite tendency: younger subscribers own more expensive phones (Figure 15b).
Naturally, not all of the 169,089 SIM cards that were active after at least one goal (excluding those operating in non-phone devices) generated activity after all the goals. Counting activity within 5 minutes of each goal, 83,352 devices were active after the first goal, 70,603 after the second and 68,882 after the third; 44,646 devices were active after at least two goals, and only 9,102 after all three.
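The per-goal counts above are mutually consistent, which can be checked with a short inclusion-exclusion calculation. Writing a, b and c for the SIM cards active after exactly one, two and three goals, the sum of the per-goal counts equals a + 2b + 3c, so the number active after at least one goal follows directly from the reported figures.

```python
def sims_active_after_any_goal(per_goal: list, at_least_two: int, all_three: int) -> int:
    """SIM cards active after at least one of three goals.

    sum(per_goal) = a + 2b + 3c, at_least_two = b + c, all_three = c,
    hence a + b + c = sum(per_goal) - at_least_two - all_three.
    """
    return sum(per_goal) - at_least_two - all_three


print(sims_active_after_any_goal([83_352, 70_603, 68_882], 44_646, 9_102))  # 169089
```

The result matches the 169,089 SIM cards mentioned in the text, and it also implies that 35,544 devices were active after exactly two goals.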

Why would they use the mobile phone network to access social media? At home, they would presumably have used their wired internet connection via Wi-Fi. According to the KSH (Hungarian Central Statistical Office) [21], 79.2% of Hungarian households had a wired internet connection, and the ratio could be even higher in Budapest. However, at the fan zones, for example in Szabadság Square, using the mobile network is more plausible.
As Figure 15 shows, there is no significant difference in phone age between the active football fans and the rest of the subscribers. The medians are almost the same within the young adult and middle-aged categories, but elderly subscribers tend to use older devices, especially those who did not react to the goals. The active football fans' median phone price is 180 EUR, compared to a median of 160 EUR for the rest of the subscribers. Older subscribers tend to use less expensive phones; this tendency is also present among the football fans, but it is stronger in the other group.
Figure 16 illustrates the mobility metrics in different age categories, also comparing the football fans with the rest of the subscribers. The Radius of Gyration median is almost the same in all age categories and groups. The Entropy medians differ notably between the two groups but hardly change across age categories. This means that the mobility customs of the football fans, who use the mobile phone network more actively, are similar regardless of the subscribers' age.

Hungary vs. Belgium
On Sunday, June 26, 2016, Hungary played its fourth and last Euro 2016 match, against Belgium. Figure 17 shows the mobile phone network activity before, during and after the match. During the match, the activity level was below the weekend average. The activity after the match was slightly higher than average, since the match ended late on Sunday, when the average activity is usually very low. This activity surplus may simply indicate that the fans were leaving the fan zones and going home.

Homecoming
The cells of this base station cover a larger area, so not all of these subscribers actually attended the event; on the other hand, not every attendee necessarily used a mobile phone during the event. Supposing that the operator preferences of the attendees corresponded to the nationwide trends in 2016, there could have been as many as about 17 thousand people, as the data provider had a 25.3% market share [27]. Figure 18b shows part of District 6 and the City Park with Heroes' Square; the Voronoi polygons of the area are colored according to their Z-scores at 18:35 to indicate the mobile phone activity in the area. The activity is considered low below −1, average between −1 and 1, high between 1 and 2.5 and very high above 2.5. Figure 18a shows the mobile phone network activity (top) and the Z-score (bottom) of the site covering Heroes' Square. During the event, the activity is clearly well above the weekday average, and the Z-score values follow the same pattern.
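The crowd-size estimate is a simple scaling by market share, under the stated assumption that the attendees' operator mix matches the national one (and ignoring people with several SIM cards or none). The detected SIM count itself is not preserved in this text; the inverse function only back-calculates that about 17 thousand people at a 25.3% share corresponds to roughly 4.3 thousand SIM cards, which is not a reported figure.

```python
MARKET_SHARE = 0.253  # data provider's 2016 market share [27]


def estimate_attendees(detected_sims: int, share: float = MARKET_SHARE) -> float:
    """Scale the number of detected SIM cards up to a crowd size."""
    return detected_sims / share


def implied_detected(attendees: float, share: float = MARKET_SHARE) -> float:
    """Inverse: number of SIM cards a given crowd size corresponds to."""
    return attendees * share
```

Because both over-counting (cells covering a larger area) and under-counting (attendees not using their phones) affect the detected number, the result is best read as an order of magnitude.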

Limitations
We associated subscribers' SES with the release price of their cell phones; however, subscribers did not necessarily buy their phones at that price. Many people buy their phones on sale or at a discount via the operator in exchange for signing an x-year contract.
Also, subscribers can change their devices at any time. We took into consideration only those subscribers who used only one device during the observation period or had a dominant device that generated most of their activity records.
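Selecting a subscriber's dominant device could be sketched as below. The strict-majority threshold is an assumed reading of "most of the activity records"; a single-device subscriber trivially returns that device.

```python
from collections import Counter
from typing import Optional


def dominant_tac(tacs_of_records: list, min_share: float = 0.5) -> Optional[str]:
    """TAC that generated more than `min_share` of a subscriber's records, else None."""
    if not tacs_of_records:
        return None
    tac, count = Counter(tacs_of_records).most_common(1)[0]
    return tac if count / len(tacs_of_records) > min_share else None
```

Subscribers for whom this returns None, for example an even split between two phones, would be excluded from the phone-based SES analysis.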
We fused three data sets to exclude the non-phone SIM cards, but the device identification is incomplete. Some device models remain unknown, and for some phones, the release date and price are unknown. The SES of these subscribers cannot be determined with the proposed solution.

Future Work
Although the current solution for selecting the football fans' SIM cards, in other words, the SIM cards that caused the peaks, gives reasonable results, it could be improved by analyzing the activity during the whole observation period, for example, by applying a machine learning technique.
Extending the list of non-phone TACs could also help refine the results, and combining the mobile phone prices with the real estate prices of the home locations would likely enhance the socioeconomic characterization.
The relative age of the cell phone might be used as a weight for the phone price when it is applied as an SES indicator, to distinguish within the phone price categories, since an expensive but older phone is not worth as much as a newer one with the same price.

Conclusions
In this study, we demonstrated that mobile phone network activity closely reflects the football fans' behavior, even if the matches are played in another country. This analysis focused on people who followed the matches on TV (at home) or on big screens in the fan zones, not in the stadiums where the matches were actually played. The mobile phone network data and the mobile phone specification database have been used to characterize the SES of the football fans. The data fusion allowed us to remove from the examination a considerable number of SIM cards that certainly operate in devices other than mobile phones. Although some TACs in the data set remain unidentified, the activity records involved in this study are thus significantly more likely to have been generated by actual people during the events.
The time series of mobile network traffic clearly show that the activity was below the average during the matches, indicating that many people followed their team. This observation coincides with other studies [41,24,46,16], in which the activity of the cells at the stadium was analyzed. We also demonstrated that a remote football match can have a notable effect on the mobile phone network. Moreover, the joy felt after the Hungarian goals is clearly manifested in the data as sudden activity peaks. CDR data is clearly capable of supporting social sensing.
The spontaneous festival after the Hungary vs. Portugal match and the welcoming event at Heroes' Square are direct applications of social sensing and are comparable to mass protests from a data perspective. During the events, the mobile phone network activity was significantly higher than the average in the affected areas.
The price of the mobile phone proved to be an informative socioeconomic indicator: it can not only cluster the areas of a city but also distinguish subscribers by their mobility customs. On the other hand, it does not seem to affect interest in football.