A Network-Based Analysis of a Worksite Canteen Dataset

The provision of wellness in workplaces has gained interest in recent decades. A factor that contributes significantly to workers’ health is their diet, especially when it is provided by canteen services. The assessment of such a service involves questions such as food cost, sustainability, quality, nutritional facts and variety, as well as employees’ health and disease prevention, productivity increase, and economic convenience vs. eating satisfaction when using canteen services. Although food habits have already been studied using traditional statistical approaches, here we adopt an approach based on Network Science that allows us to study in depth, for instance, the interconnections among people, companies and meals, and that can easily be reused for further analysis. In particular, this work concerns a multi-company dataset of workers and the dishes they chose at a worksite canteen. We study eating habits and health consequences, also considering the presence of different companies and the corresponding contact network among workers. The macro-nutrient content and caloric values are assessed both for dishes and for employees, in order to establish when food is balanced and healthy. Moreover, network analysis lets us discover hidden correlations among people and the environment, such as communities that usually cannot be inferred with traditional or statistical methods since they are not known a priori. Finally, we represent the dataset as a tripartite network to investigate relationships between companies, people, and dishes. In particular, the so-called network projections can be extracted, each one being a network among a specific kind of node; community analysis tools then provide hidden information about people and their food habits.
In summary, the contribution of the paper is twofold: it provides a study of a real dataset spanning several years that gives a new point of view on food habits and healthcare, and it proposes a new approach based on Network Science. The results show that this kind of analysis can provide significant information that complements other traditional methodologies.


Introduction
The impact of the diet adopted in canteens is highly relevant for people's health [1,2], helping in disease prevention [3], improving productivity [4], and contributing to a globally healthy population [5]. In this work, we focus on worksite canteens, but some of the results and the methodology can be applied to other settings, such as school canteens [6,7]; the study in [8] analyzes both environments.
The common characteristic is the presence of complex correlations among meals, people and habits, which usually give rise to communities. The analysis and improvement of food quality in workplaces is a challenging task for several reasons:
• a shift towards lower quality food has been detected in nutritional surveys, e.g., [9], sometimes to guarantee a profit margin for canteen operators [10];
• many people spend a relevant amount of their life at work (roughly 60 percent or more of waking hours), and about one-third of our daily energy intake is consumed in worksites [11]; therefore the quality of meals, both in terms of health and pleasure, is significant throughout the day [12,13], and the promotion of good eating habits further improves such a scenario [14];
• workers usually have limited access to nutrition information, and this prevents them from making an informed choice [15];
• a trade-off between high-quality food intake in worksites and the reduction of related costs and waste must be achieved [16], e.g., by exploiting solutions such as pre-ordering of meals, weight-based billing, and flexible portion size [17,18].
To cope with all these issues, an analysis of actual dish intake is helpful; this can be accomplished, for instance, by evaluating the compliance of canteen menus with well-known healthy diets such as the Mediterranean diet [19], as well as by considering the composition of dishes in terms of macro-nutrients such as lipids, carbohydrates and proteins [20].
In this paper, a multi-company dataset of workers and their dish intake within a worksite canteen is considered. While other studies (e.g., [21]) operate on a statistical basis of semi-automatically collected data (for instance via questionnaires), our dataset comes from users' pre-ordering menu selections made via a dedicated App. Moreover, we adopt a Network Science approach, which has been successfully used to represent and analyze data in disparate fields such as economics [22,23], biology [24] and migration [25], and only recently in food-related contexts [26].
Our analysis first addresses the topology of the tripartite network that includes workers, the companies they belong to, and the dishes consumed. Then, the macro-nutrient content and caloric values of both the dishes and the average meal of each employee are considered, to assess to what extent the served dishes are balanced and healthy and thus promote a healthful lifestyle. Finally, we investigate the community structure of the canteen network.
In summary, the main contributions of this case study paper are:
• the study of the eating habits at a worksite canteen and of their impact on health, also taking into account the contact network existing among people due to the presence of different companies;
• the use of network analysis techniques that allow us to search for hidden correlations among people and their environment, for instance the presence of communities (not only due to company membership) that cannot be studied or inferred with traditional or statistical methods since they are not known in advance;
• a different view of the dataset, represented as a tripartite network.
As for the last point, we believe that a tripartite network representation of our dataset clearly highlights relations among companies, people and dishes. Moreover, from this representation we can derive the so-called network "projections", in which a network among a specific kind of node (e.g., people or dishes) is inferred. This allows us to perform community analysis using standard network community detection algorithms and well-developed network analysis tools.
Existing literature is considered in Section 2, the dataset is introduced in Section 3, and the network analysis is described in detail and its results are discussed in Section 4. We finally consider future work and concluding remarks in Section 5.

Related Works
During the last decade, the political and scientific awareness about food-related issues quickly increased. In the literature, several studies exist that analyze the impact of good and healthy food mainly in medicine and related research areas.
According to Czarniecka-Skubina et al. [27], "there are very few publications referring to staff canteens, especially canteens located in office buildings". Some works focus on interventions aimed at improving workers' dietary habits and/or physical activity: among them, Bandoni et al. [28] evaluated the impact of an educational and environmental intervention on the availability and consumption of fruit and vegetables in workplace cafeterias, whereas Steyn et al. [29] reviewed several workplaces to evaluate the impact of such interventions. In [8], food handlers were analyzed on a sample of 169 large canteens (50 in schools and 119 in factories, with a total of 3399 employees) in Southern Vietnam; all food handlers in the selected canteens were interviewed using a standard questionnaire. Besides, Thorsen et al. [30] evaluated the availability of healthy meal options at Danish worksites by using a self-administered questionnaire mailed to more than 1900 worksite canteen managers. The quality of communication and its impact on healthy food was studied in [31], which explored the criteria that motivate people's food choices in a workplace food service setting; data were collected using questionnaires distributed to four focus groups between Germany and the UK. A study with a focus similar to that of this work was carried out by Lassen et al. [21], who examined the nutritional quality of lunch meals eaten at 15 worksite canteens compared with results from a study conducted 10 years before; they found, for instance, that mean energy intake was 2.1 MJ/meal and estimated energy density was 599 kJ/100 g.
Lytle et al. [32] present a review of more than 400 studies on measures of the food environment and conclude that the most common methodology, at the time of publication, was geographic analysis (about 65% of articles). Moreover, fewer than 30% of the works do not account for reliability of measures and/or validity. They also note that results can be affected by the quality of the responses, considering that most data are collected via questionnaires, and that very few studies report measures about school or worksite environments, mainly due to difficulties in collecting and processing this kind of data. A recent work [6] surveyed about 18,350 relevant abstracts, with a total of 38 articles included in the study, with the aim of presenting the methods used to assess the school food environment. The authors claim that the most common method to obtain information used to measure the school food environment (35 out of 38 cases) was a self-administered questionnaire/survey, and highlight that the main disadvantages of this method include self-reported bias in favor of desirable rather than actual practice and the fact that survey results may not capture realistic practices. Among the works that used quantitative methods, only 5 included a menu analysis.
Our work is a quantitative analysis of workers' food habits using raw data collected by a booking App and enriched with contextual information, which avoids the need for questionnaires.
The idea of leveraging an automatic system to collect employees' food choices in a canteen is presented in [33], where, however, the authors analyzed only a limited dataset. Our work falls into this context and shows how such a system is able to constantly monitor food intake.
Several works that focus on healthy food consumption also consider gender differences; for instance, in [7] the authors examined 18,070 trays of food from seven schools in New Orleans, finding that the students consumed an average of 307 Cal during lunch. In [34], the authors compared the nutritional quality of 2192 Korean workers' lunches served by institutional or commercial food services; they reported that the Nutrient Adequacy Ratio and Mean Adequacy Ratio were significantly higher in the institutional lunches than in the commercial lunches; however, they also noted that more than half of the workers in both groups obtained over 65% of their energy from carbohydrates. Also, in [35] the authors studied the changes in the nutrient content of food offered at worksite canteens over time, finding only marginal changes in the caloric and macro-nutrient composition of the menus.
The works cited above studied lunch consumption using a quantitative methodology that measures the nutrients, but none uses a network analysis approach to study the mutual influence among the companies workers belong to, the existence of hidden communities and the assessment of the canteen's offering. Moreover, we used raw data, thus avoiding biases due to questionnaires.
Another interesting analysis of consumer behavior concerning food intake in canteens was conducted in [33], where the authors attempted to establish how much a pre-ordering system is appreciated. In workplace canteens this is indeed largely regarded as acceptable, and this supports the view that data collected by this type of application, such as the one used by the canteen analyzed in this paper, are a correct snapshot of the real habits of the people involved.
In [36], the authors investigated the effectiveness and long-term impact on the composition of the habitual diet of the promotion of healthy food choices resembling the traditional Mediterranean diet at an Italian worksite canteen. They found that a significantly higher number of dishes based on wholegrain cereals, legumes, white meat and fish, and a lower choice of dishes based on refined cereals, red and processed meat, eggs and cheese over 42 months led to a significant improvement in agreement with nutritional recommendations. These works, whose aim is to evaluate the change of food habits induced by external stimuli, collect data using questionnaires, while we perform our analysis on raw data, reducing the time needed to achieve significant results when modifications of habits occur.
The importance of correct eating habits is universally accepted, but it is quite difficult to correctly measure the actual quality of the meals, mainly in worksite canteens where time and economic problems are often overwhelming. An interesting report on the state of health in Italy correlated with eating habits is presented in [37].
Recently, the environmental impact of food habits has been studied with a focus on strategies to reduce waste and to achieve a sustainable diet. To estimate the nutritional quality of foods, most researchers try to provide users with some food recommendations. For example, Crockett et al. [38] studied the effect of nutritional food labels that communicate the nutritional quality using a traffic light model and a star rating, while González-García et al. [39] analyzed 21 peer-reviewed studies to compare 66 dietary scenarios and proposed a method to classify meals from an environmental and nutritional point of view; they found limitations related to different system boundaries and to underlying uncertainties related to data sources. Although our work is not concerned with environmental matters, it nevertheless provides a general approach to analyze the correlation among different contexts, e.g., companies and diet, that could also be used to study the complex correlation network of supply chains.
As mentioned in the introduction, our main contribution is the adoption of a network-based approach that aims at finding the existing correlation between people and food habits in the specific context of worksite canteens. Besides, thanks to the automatic data collection performed by an App, we do not need questionnaires or other mechanisms that explicitly query customers.
Besides, we did not use any of the datasets discussed above since eating habits are strongly influenced by culture and nationality. Instead, we aim at using data usually collected by canteen managers for administrative purposes, which of course do not focus on the specific problem under investigation (such as healthy habits). The use of raw data avoids questionnaire-related bias and provides a chance to integrate the results of the analysis into the application itself to proactively improve eating habits.
Finally, to the best of our knowledge, no tripartite network model has been used elsewhere to analyze food datasets, even though all these analyses tend to answer quite similar questions in different cultural contexts.

Dataset Description and Representation
The dataset analyzed in this work includes 49,539 dishes ordered in about 2.5 years, from August 2017 to March 2020, by 646 employees of multiple companies. The companies differ in the number and average age of their employees and in their core business; however, all of them operate in Information and Communications Technology. The companies belong to an innovation hub, which provides several other services whose main aim is to improve the quality of life of the employees [40] and to increase their productivity through a better sense of belonging. The hub is located in Sicily, Italy, and the employees are from different Italian regions and even different countries. In particular, the canteen is accessible to the employees of 22 different companies through a cross-platform proprietary App, which allows them to book their meal by choosing from the daily menu and provides the canteen management with statistical information. Each company has from 1 to 264 employees and the average number of meals consumed by the workers of each company is extremely variable (from 1 to more than 262). For a deeper view on the number of employees of the companies and the number of meals consumed, we refer the reader to Table 1. For this study, the dataset was anonymized by removing personal information about employees and identifying each of them with a randomly chosen "employee code"; however, the id of the company they work for was retained.
Each dataset record is created by the App when an employee makes a reservation and includes the timestamp, the anonymous employee code and the dish chosen. We pruned all the records where some of the data is missing, discarding about 20% of them.
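The pruning step described above can be sketched as follows. This is only an illustration under stated assumptions: the field names, the `Order` type, the `prune_incomplete` helper and the sample records are hypothetical and are not taken from the App or from the actual dataset.

```python
from typing import NamedTuple, Optional


class Order(NamedTuple):
    """One reservation record (field names are illustrative assumptions)."""
    timestamp: Optional[str]
    employee_code: Optional[str]
    dish: Optional[str]


def prune_incomplete(records):
    """Keep only the records in which every field is present and non-empty."""
    return [r for r in records if all(value not in (None, "") for value in r)]


# Hypothetical sample: two complete records and two with missing data.
raw_records = [
    Order("2018-03-01T12:05", "E001", "pasta"),
    Order("2018-03-01T12:07", None, "salad"),
    Order("", "E002", "chicken"),
    Order("2018-03-02T12:10", "E002", "chicken"),
]

clean_records = prune_incomplete(raw_records)
discarded_share = 1 - len(clean_records) / len(raw_records)
```

In this toy sample half of the records are discarded; in the dataset discussed here the share reported above is about 20%.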
Regarding the dishes served during the collection of the data, there are 317 dishes in the dataset and they are identified by name and their type. Dishes have a different number of occurrences in the menu (i.e., they are not proposed periodically), and for each dish additional information is also available (for instance, if the dish comes with bread, if it is gluten-free, suitable for vegetarians, etc.).
Dishes were grouped by canteen management into 5 different categories according to the typical structure of a menu in Italy [41], as shown in Figure 1, where it is possible to note that the number of proposals in the different categories is not homogeneous and two of them are much larger than the others.
In Figure 2 we represent the dataset using a tripartite network that allows us to connect companies (C) (each with a different color), employees (P), and meals (D). A tripartite network is a graph whose vertices can be divided into three disjoint and independent sets C, P and D, such that every edge connects a vertex in C to one in P or a vertex in P to one in D. Vertex sets C, P and D are usually called the modes of the network. Each line (or edge) in the figure connects an employee (center column) with a chosen dish (right column) or with the company they belong to (left column). Given the characteristics of our dataset, the tripartite network representation appears the most appropriate. In fact, it clearly highlights relations among companies, people and dishes and allows us to perform dataset analysis using standard and well-developed network analysis tools.
The canteen's menu has several options, some of which are repeated several times while others are offered only once or very rarely. Some of the choices may require that the diner also choose to get bread. As customary in the Italian culinary tradition, we included a 100 g portion of bread for categories 2, 3 and 5. In Table 2 a typical menu proposed by the canteen is shown.
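The tripartite representation defined above (vertex sets C, P and D, with edges only between C and P or between P and D) can be sketched in a few lines. This is a minimal illustration, not the tooling used for the study: the function names, the employee codes, the company ids and the dish names are all hypothetical.

```python
def build_tripartite(employee_company, orders):
    """Build the three modes and the edge set of a tripartite network.

    employee_company maps an employee code to a company id;
    orders is a list of (employee code, dish) pairs.
    Nodes are tagged with their mode so that ids cannot clash.
    """
    companies = set(employee_company.values())   # mode C
    people = set(employee_company)               # mode P
    dishes = {dish for _, dish in orders}        # mode D
    edges = set()
    for person, company in employee_company.items():
        edges.add((("C", company), ("P", person)))
    for person, dish in orders:
        edges.add((("P", person), ("D", dish)))
    return companies, people, dishes, edges


def is_tripartite(edges):
    """Check that every edge links C to P or P to D, and nothing else."""
    allowed = {("C", "P"), ("P", "D")}
    return all((u[0], v[0]) in allowed for u, v in edges)


# Hypothetical toy data.
employee_company = {"E001": 0, "E002": 0, "E003": 19}
orders = [
    ("E001", "pasta"),
    ("E002", "pasta"),
    ("E002", "chicken"),
    ("E003", "chicken"),
    ("E001", "pasta"),  # repeated order: same edge, counted once here
]

C, P, D, E = build_tripartite(employee_company, orders)
```

Repeated orders collapse into a single edge in this sketch; keeping a count per edge instead would give the weights needed for node strength.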
Additional information regarding the ingredients of dishes is also available, along with the caloric value and macro-nutrients (as carbohydrates, proteins, and lipids); these values come from both classical Italian cooking recipes and the average portion size, and they play a relevant role in the present study to discuss the nutritional quality of the meals. No information is present in the dataset regarding extra foods or courses (such as fruit, desserts, or drinks) that can be consumed by the employee during lunch.
Table 2. Some examples of the dishes and the category they belong to. We reported the menu item in the original language and its translation. Columns: Category, Original Name (Italian), English Name.
During the data collection period, 317 different served dishes exhibit a grand total of about 146 different ingredients. However, most courses are very similar, and they slightly differ in the recipe, meaning that ingredients and macro-nutrients are quite the same. Moreover, each course usually includes up to seven ingredients as depicted in Figure 3 while few courses include more than eight ingredients.
The single most common ingredient is extra virgin olive oil, used in the preparation of 83% of the dishes. In Table 3 the average, minimum and maximum values of caloric values and macro-nutrients in the dishes are summarized. Figure 4 reports the distribution of proteins, lipids, carbohydrates, and calories over the proposed dishes for each category; the presence of many outliers for categories 2 and 3 indicates that the dishes in these categories vary widely in their composition.

Dataset Analysis
In this section, we analyze the dataset from several perspectives. In the first step we study the dataset topology in order to evaluate the people-vs-dishes relation, while in the second step we study the macro-nutrients and the caloric value of both dishes and the diet of the employees. Finally, we analyze which communities emerge from the data.

Topological Analysis
The first step in our analysis concerns the network topology. We study the degree and the strength of each node in the second and third modes of our tripartite network, which represent people who had a meal in the canteen and the dishes served. From now on, we will call these two modes People and Dishes, respectively.
The degree of a node in the People mode is the number of distinct dishes consumed by that person: the higher the degree, the larger the number of dishes tried. Vice versa, the degree of a node in the Dishes mode is the number of different people that ordered that dish at least once. More than one person in four sticks with the same small selection of dishes, while the number of people that try more dishes decreases with the number of dishes itself, as shown in Figure 5a, which illustrates the degree distribution of the People mode. On the other hand, most dishes have been chosen by very few people, as illustrated by the degree distribution of the Dishes mode in Figure 6a, and very few are very popular choices.
Unlike the degree, the strength of a node accounts not only for the number of distinct choices made but also for the number of times they have been made, i.e., the strength of a node in the People mode is the number of times that person had a meal in the canteen, while that of a node in the Dishes mode is the number of times that dish has been ordered. As depicted in Figure 5b, almost 10% of people rarely order a meal, while a small number have a long order history. This should not be surprising, as the dataset includes guests, people that travel for work and so on. More interesting is the strength distribution of the dishes, depicted in Figure 6b, which shows that most dishes are not only ordered by few people but also ordered very few times, as opposed to very few dishes that are chosen often. Another key fact is that all these distributions follow a power law, which is typical, for instance, of real-world social networks.
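The difference between degree and strength described above can be made concrete with a small sketch. It assumes the orders are available as (employee code, dish) pairs; the function name and the sample data are hypothetical.

```python
from collections import Counter


def degree_and_strength(orders):
    """Compute degree and strength for the People and Dishes modes.

    Strength counts every order; degree counts each distinct
    (person, dish) pair only once.
    """
    distinct_pairs = set(orders)
    return {
        "people_degree": Counter(person for person, _ in distinct_pairs),
        "people_strength": Counter(person for person, _ in orders),
        "dish_degree": Counter(dish for _, dish in distinct_pairs),
        "dish_strength": Counter(dish for _, dish in orders),
    }


# Hypothetical toy orders: E001 orders pasta twice and salad once.
orders = [
    ("E001", "pasta"),
    ("E001", "pasta"),
    ("E001", "salad"),
    ("E002", "pasta"),
    ("E003", "chicken"),
]

stats = degree_and_strength(orders)
# E001 tried 2 distinct dishes (degree) over 3 meals (strength);
# pasta was chosen by 2 distinct people (degree) in 3 orders (strength).
```

A histogram of the values in each counter gives the corresponding degree or strength distribution.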
In Table 4 we report the 10 most popular dishes along with the number of times they have been ordered. Note that, with very few exceptions, they are meat dishes.
At this point, the following question arises: do people choose a wide variety of food? We try to answer with Figure 7, which shows how many people choose different food categories. As is evident, most people ordered dishes from at least three of the categories described in the previous section (Figure 1), and just a few (which may include guests) stick with only one category.
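The count behind this kind of variety plot can be sketched as follows, assuming each dish is mapped to one of the five categories. The category ids, the dish names and the function names are hypothetical placeholders.

```python
from collections import Counter, defaultdict


def categories_per_person(orders, dish_category):
    """Number of distinct dish categories each person has ordered from."""
    seen = defaultdict(set)
    for person, dish in orders:
        seen[person].add(dish_category[dish])
    return {person: len(categories) for person, categories in seen.items()}


def variety_distribution(orders, dish_category):
    """How many people ordered from exactly k categories, for each k."""
    return Counter(categories_per_person(orders, dish_category).values())


# Hypothetical toy data: category ids are placeholders.
dish_category = {"pasta": 1, "chicken": 2, "salad": 3}
orders = [
    ("E001", "pasta"),
    ("E001", "salad"),
    ("E001", "chicken"),
    ("E002", "pasta"),
    ("E003", "chicken"),
    ("E003", "chicken"),
]

per_person = categories_per_person(orders, dish_category)
distribution = variety_distribution(orders, dish_category)
```

Here one person covers three categories and two people stay within a single one, so the distribution has one entry at 3 and two at 1.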

Macro-Nutrient Analysis
After analyzing the topology of the network, we analyze the macro-nutrient content and caloric values of both the dishes and the average meal of each employee. We show the macro-nutrient and overall caloric values of the dishes in Figure 8.
To get a better understanding of their healthiness, we divide the dishes into balanced and unbalanced according to the guidelines in [42,43], which set the share of daily calorie intake from carbohydrates between 45% and 65%, from proteins between 10% and 35%, and from fats between 20% and 35%. As shown in Figure 9a, just 14.5% of the dishes are balanced over all macro-nutrients, while most are unbalanced in carbohydrates (68.1%) and in fats (76.6%). We further analyze the macro-nutrient content of the unbalanced dishes and show the distribution of their lack or abundance in Figure 10. Specifically, we first compute the caloric value ratio of each macro-nutrient, and then compute the distance from the guideline range, i.e., we subtract the lower bound if the caloric value ratio is lower than the minimum suggested, or the upper bound if it is greater than the maximum suggested. As illustrated in the figure, the caloric value of most dishes comes from an excess of fats at the expense of carbohydrates.
Figure 9. Division of the dishes (a) and diet (b) of employees into balanced (True) and unbalanced (False) according to the guidelines in [42,43].
We also perform a similar analysis for the average diet of the employees. In particular, we first show the distribution of the macro-nutrient and overall caloric values of their diet in Figure 11. Then, we compute how many have an unbalanced diet and show the result in Figure 9b: while most of them (97.6%) get the correct caloric value from proteins, the large majority (95.5%) gets, on average, unbalanced dishes from the canteen. Again, as illustrated in Figure 12, which depicts the unbalance distribution of macro-nutrients, the imbalance comes from an abundance of fats at the expense of carbohydrates.
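The balance check used in this section can be sketched as follows. The guideline ranges are those quoted above. The conversion from grams to calories uses the standard Atwater factors (4 kcal/g for carbohydrates and proteins, 9 kcal/g for fats), which is an assumption of this sketch because the text does not state how the caloric share of each macro-nutrient was obtained. Function names and sample dishes are hypothetical.

```python
# Guideline ranges for the share of calories from each macro-nutrient.
RANGES = {
    "carbohydrates": (0.45, 0.65),
    "proteins": (0.10, 0.35),
    "fats": (0.20, 0.35),
}
# Assumption: standard Atwater factors, in kcal per gram.
KCAL_PER_GRAM = {"carbohydrates": 4.0, "proteins": 4.0, "fats": 9.0}


def calorie_shares(grams):
    """Share of total calories coming from each macro-nutrient.

    Assumes the dish has a non-zero total caloric value.
    """
    kcal = {m: grams[m] * KCAL_PER_GRAM[m] for m in RANGES}
    total = sum(kcal.values())
    return {m: kcal[m] / total for m in RANGES}


def distance_from_range(share, low, high):
    """Negative below the range (lack), positive above it (abundance)."""
    if share < low:
        return share - low
    if share > high:
        return share - high
    return 0.0


def assess(grams):
    """Return (shares, distances, balanced) for a dish or an average diet."""
    shares = calorie_shares(grams)
    distances = {m: distance_from_range(shares[m], *RANGES[m]) for m in RANGES}
    balanced = all(d == 0.0 for d in distances.values())
    return shares, distances, balanced


# Hypothetical dishes, macro-nutrients in grams.
fatty_dish = {"carbohydrates": 30.0, "proteins": 25.0, "fats": 30.0}
balanced_dish = {"carbohydrates": 60.0, "proteins": 25.0, "fats": 12.0}

fatty_shares, fatty_distances, fatty_ok = assess(fatty_dish)
_, _, balanced_ok = assess(balanced_dish)
```

For the first hypothetical dish the fat share lies above its range and the carbohydrate share below it, which is the same kind of imbalance reported for most dishes; the same function applies unchanged to the average meal of an employee.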

Understanding the Community Structure of the Canteen Network
We begin this analysis by extracting the bipartite network of the people and the dishes chosen (i.e., the People-Dishes modes) from our tripartite dataset. The next step is to build the People and Dishes projections of the bipartite network. In particular, the People network projection is a network where nodes are people and a weighted link between a pair of people represents the number of dishes they have in common. On the other hand, the Dishes network projection is a weighted network of dishes where a link between a pair of dishes takes into account the number of people that have chosen both dishes. We aim at studying the community (or group/cluster) structure of both projections to gain a deeper understanding of the dataset under analysis. A community can be informally defined as a set of densely connected nodes/vertices of the network [44]. For a more detailed introduction to the topic of communities in networks, we refer the reader to [45].
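The two weighted projections described above can be sketched as follows, assuming the bipartite network is given as (employee code, dish) pairs. The `project` function and the sample data are hypothetical and only illustrate the weighting rule (number of shared neighbours); they are not the tool used to produce the results.

```python
from collections import defaultdict
from itertools import combinations


def project(orders, onto="people"):
    """Weighted one-mode projection of the People-Dishes bipartite network.

    onto="people": the weight of (a, b) is the number of dishes both ordered.
    onto="dishes": the weight of (a, b) is the number of people who chose both.
    Repeated orders of the same dish by the same person count once.
    """
    neighbours = defaultdict(set)
    for person, dish in set(orders):
        if onto == "people":
            neighbours[dish].add(person)
        else:
            neighbours[person].add(dish)
    weights = defaultdict(int)
    for group in neighbours.values():
        # Every pair in the group shares this one neighbour.
        for a, b in combinations(sorted(group), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy orders.
orders = [
    ("E001", "pasta"),
    ("E001", "salad"),
    ("E001", "pasta"),
    ("E002", "pasta"),
    ("E002", "salad"),
    ("E003", "salad"),
    ("E003", "chicken"),
]

people_projection = project(orders, onto="people")
dishes_projection = project(orders, onto="dishes")
```

In this toy case E001 and E002 share two dishes, so their link has weight 2, while pasta and salad were both chosen by two people, giving that dish pair weight 2 as well.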
To uncover the community structure of our networks, we employ the variant of the Louvain algorithm [46,47] implemented in Pajek [48]. Basically, this algorithm performs a greedy optimization of the modularity function [49], a measure of the quality of a network partition into communities. Figure 13 reports the communities of people we found with this algorithm. In particular, there are three communities of people, which we indicate by using a numeric identifier on the right side of Figure 13. On the left side of the same figure, we also report the company (once again identified by a numerical label) people belong to. We can notice that people tend to form cross-company groups. For example, employees of the company with id = 0 are distributed across the three communities, showing that there is not a single common pattern among people of company 0, at least as far as their diet is concerned. Curiously, employees of company 19 seem to belong to only two of the three communities, exhibiting a slightly different behavior with respect to the employees of the other companies (this deserves further investigation in future work).
In Figure 14 the community structure of the Dishes network projection is shown. Also in this case, we found three communities of roughly the same size. To better understand how dishes are distributed in each community, all dishes are classified into the five categories (see Figure 1) and the resulting breakdown of each community by dish type is reported in Figure 15. A different pattern can be observed in each community. For example, in community 1 "First Course" is the prevalent kind of dish, while in community 2 it is "Main Course". Moreover, the majority of "Cold Cuts" dishes are concentrated in community 3, which suggests a similarity between them and the other dishes in the same community that is stronger than in the other two communities.
Note that this kind of similarity results from the pattern that the employees of the different companies follow in choosing the dishes to eat. It is a "similarity" induced by people's choices and not by the similarity among the dishes' ingredients.
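To make the quantity optimized by the Louvain algorithm more tangible, the sketch below evaluates the standard weighted modularity of a given partition. It does not reproduce the Pajek variant used in this work and it does not search for communities: it only scores a partition, assuming an undirected weighted network without self-loops and a resolution of 1. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(weighted_edges, community_of):
    """Weighted modularity Q = sum_c [ W_c / W - (S_c / (2 W))^2 ].

    W   : total edge weight of the network,
    W_c : total weight of the edges inside community c,
    S_c : sum of the strengths of the nodes in community c.
    """
    total = sum(weighted_edges.values())
    internal = defaultdict(float)
    strength = defaultdict(float)
    for (u, v), w in weighted_edges.items():
        strength[community_of[u]] += w
        strength[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    return sum(
        internal[c] / total - (strength[c] / (2 * total)) ** 2
        for c in strength
    )


# Toy graph: two triangles joined by a single bridge edge, unit weights.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one community, which is the kind of comparison a modularity-based method performs when it merges or moves nodes.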

Conclusions
In this work, we presented and discussed a dataset of a multi-company canteen service. We illustrated its main features and the relevant results that emerged from a first analysis using a network-based approach. We believe that several cues come from this analysis and will be considered as future work:
• leveraging personal information about people eating at the canteen (in addition to that used here), such as sex, age, preferences, medical-health and socio-economic data, in order to perform a more comprehensive analysis;
• similarly, exploiting detailed nutritional facts about the food provided, which would enrich the dataset and the knowledge we can extract from it;
• performing a temporal analysis, which would allow predicting users' behaviors, assisting in canteen planning and management, and establishing more sustainable food practices [50];
• using machine learning techniques to support food recommender systems, for instance for advancing a healthy behavior programme [51];
• using a system that collects information about the movement of people causing crowds [52,53] at the canteen, integrated with the data already available, since crowding could influence habits; for instance, reduced available time could push people towards easy-to-take meals.
Author Contributions: The authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding: This work has been partially supported by the project of University of Catania PIACERI, PIAno di inCEntivi per la Ricerca di Ateneo.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available on request (by e-mail) from the corresponding author. The data are not publicly available due to corporate internal guidelines.