1. Summary
The sense of taste is formed in early childhood, so the food we grow up with shapes us for the rest of our lives. National dishes, and the food we eat more generally, therefore define us not only as individuals but also as societies, leading to substantial differences in consumer tastes and preferences across countries. Taste in food is argued to be persistent: [1], e.g., analyzes how food preferences are based on associations with the context and consequences of eating various foods. The evolution of food preferences seems to be primarily determined by past consumption of particular foods that are locally available (see [2,3]).
To measure food tastes at the country level, a novel data set on national dishes and their ingredients for 171 countries was compiled. Our data set also contains information on migration and bilateral food trade flows for five years. Importer and exporter GDP, population size, the distance between the most populated importer and exporter cities, as well as information on Preferential Trade Agreements (PTA) between importer and exporter, contiguity, language and colonial ties complete the data set. In addition, the data set is supplemented by two dyadic food taste similarity measures based on (i) the Manhattan distance and (ii) latent semantic analysis (LSA).
Although [3] (for rice) and [4] (for cars) have shown the importance of taste by using product-specific attributes, only a few studies so far try to quantify taste in trade. Most papers in the literature are limited to specific products, e.g., [5] for French champagne or [6] for wine. Overall, an external measure of preferences or taste exists for relatively few food products.
The two parts of our data set complement each other: combined, they make it possible to analyze the relationship between preferences and trade, as differences in tastes can, e.g., shape international agro-food trade [7] or affect consumer quality valuation of imported goods [8].
Furthermore, a time-invariant database that measures differences in tastes can also be used to analyze the link between taste and markups in different sectors, e.g., the food processing sector in Italy, as done by [9] with the help of our data set. They use a data set of all Italian exporters of cheese and processed meat over the period from 2013 to 2019 to understand the pricing strategy of exporters across international markets. Here, our data set helped to show that export prices differ across markets due to taste, conditional on quality. Further, [10] used our data set to show that consumer taste explains as much of the variation in export revenue as marginal costs.
This paper describes our whole data set, including the two dyadic food taste similarity measures, and explains the data retrieval and processing.
2. Data Description
The data set contains information on migration and bilateral food trade flows. To measure food tastes at the country level, a novel data set on national dishes and their ingredients for 171 countries was compiled. The data set therefore consists of two files, both provided in CSV format.
The first file (national_dishes_ingredients.csv) lists the ingredients of all national dishes of the 171 countries, together with the name of each dish and a short description of its type (e.g., soup, stew, etc.). Data on national dishes was compiled by the authors. We gathered the ingredients for 350 different dishes, recorded under the exact name of each dish. For each country, the national dishes were used. If a country had more than one common national dish, we searched for the most popular dish and designated it as “the” national dish, which yields 171 national dishes in the end. Overall, there are 218 ingredients. Each line in the data set presents information for a single recipe in columnar form.
Table 1 shows all the variables included in this CSV file.
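As a rough illustration of the layout described above (one line per recipe, one 0/1 column per ingredient), the following sketch reads such a file into ingredient sets. The column names `country`, `dish` and `type`, the ingredient columns, and the two sample rows are made up for the example; the actual variable names are those listed in Table 1.

```python
import csv
import io

# Toy stand-in for national_dishes_ingredients.csv; column names and rows
# are hypothetical, not copied from the real file.
SAMPLE = """country,dish,type,salt,rice,beef,coriander
A,Dish one,stew,1,0,1,0
B,Dish two,soup,1,1,0,1
"""

META_COLUMNS = {"country", "dish", "type"}  # assumed non-ingredient columns


def read_recipes(handle):
    """Return a list of (country, dish, set_of_ingredients) tuples."""
    recipes = []
    for row in csv.DictReader(handle):
        ingredients = {
            name for name, flag in row.items()
            if name not in META_COLUMNS and flag.strip() == "1"
        }
        recipes.append((row["country"], row["dish"], ingredients))
    return recipes


recipes = read_recipes(io.StringIO(SAMPLE))
for country, dish, ingredients in recipes:
    print(country, dish, sorted(ingredients))
```

To use it on the real file, the `io.StringIO(SAMPLE)` handle would be replaced by an opened file object and `META_COLUMNS` adapted to the actual non-ingredient columns.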
The second file (gravity_food_tastes_similarity.csv) contains panel information for 5 years (1998, 2000, 2005, 2010 and 2015) for the 171 countries. For each country, the bilateral agro-food trade flows, defined as HS chapters 1–24 and based on the UN Comtrade Database, are recorded in current million USD. Further, we have data on importer and exporter GDP (in current USD) and population (in millions), both in logs, the log distance between the most populated cities, as well as dummy variables for a Preferential Trade Agreement (PTA) in force, contiguity, a common official primary language, a language spoken by at least 9% of the population in both countries, a country pair ever in a colonial relationship, a common colonizer post-1945, a pair currently in a colonial relationship, a pair in a colonial relationship post-1945, and countries that were or are the same country. All data on these variables come from [11]. Migration is measured as the stock of foreign-born people by destination and origin for all countries and years [12]. Furthermore, the data set contains two different food taste similarity measures based on the national dishes that are also included within this data set: (i) the Manhattan distance (named food_sim_manhattan in the data set) and (ii) latent semantic analysis (LSA) (named food_sim_lsa in the data set).
Table 2 describes all the variables included in the data set gravity_food_tastes_similarity.dta.
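A minimal sketch of how one panel year could be selected and the country pairs ranked by the LSA measure is given below. Only `food_sim_manhattan` and `food_sim_lsa` are variable names taken from the text; `importer`, `exporter`, `year` and `trade` are hypothetical placeholders for the names in Table 2, all values are invented, and the sketch assumes that larger values mean more similar tastes.

```python
import csv
import io

# Toy stand-in for the gravity file. Only food_sim_manhattan and food_sim_lsa
# are names taken from the text; the other column names and all values are
# hypothetical.
SAMPLE = """importer,exporter,year,trade,food_sim_manhattan,food_sim_lsa
A,B,2015,12.5,0.40,0.55
A,C,2015,3.0,0.10,0.05
B,C,2010,7.2,0.30,0.20
B,C,2015,8.1,0.30,0.20
"""


def pairs_for_year(handle, year):
    """Return the rows of one panel year, highest food_sim_lsa first."""
    rows = [r for r in csv.DictReader(handle) if int(r["year"]) == year]
    return sorted(rows, key=lambda r: float(r["food_sim_lsa"]), reverse=True)


ranked = pairs_for_year(io.StringIO(SAMPLE), 2015)
for r in ranked:
    print(r["importer"], r["exporter"], r["food_sim_lsa"])
```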
3. Methods
Our raw data contains bilateral agro-food trade flows and migration stocks for 171 countries for the years 1998, 2000, 2005, 2010 and 2015, as migration data is only available in 5-year intervals and for those 171 countries. Data on bilateral food trade flows for these years is from a data set prepared by the Centre d’études prospectives et d’informations internationales (CEPII). The underlying trade data are downloaded from the UN Comtrade Database [11]. Data on migration was downloaded from [12].
Data on national dishes was compiled by the authors. To compose the data set on national dishes, the authors collected data on a daily basis from 4 June 2018 to 23 August 2018. For each country, the national dishes listed on https://simple.wikipedia.org/wiki/List_of_national_dishes (accessed on 25 September 2019) were used. Where the list contained more than one national dish for a country, we compiled the ingredient list of each dish and included all of them in the data set, but only used the most popular dish in the similarity measures.
Based on the national dishes of each country, an ingredient list was compiled for each dish. The ingredients for the recipes were taken from foodpassport.com. If a recipe was not available on this website, the authors used nationalfoods.org to retrieve the ingredient list. In the end, the authors compiled a data set with 350 national dishes from 171 countries that includes 218 different ingredients overall. If an ingredient is used to prepare a national dish, it is marked with the number 1 in the respective line. All ingredients were included in the data; e.g., common ingredients like salt were also included, as salt is not used in all national dishes. Decisions had to be taken for ingredients that are themselves mixtures of ingredients. Here, we included every single ingredient used to prepare the mixture. Conversely, mixtures that could not be prepared “in the moment” were treated as one single ingredient (e.g., dry-cured sausages or soy sauce). Varieties of some ingredients were not differentiated: e.g., we did not distinguish between black, rose or green pepper, but used the term pepper.
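The coding rules above can be sketched as follows: variety names are mapped to a generic term, and each dish becomes a row of 0/1 flags over all ingredients. The alias spellings, the helper names `normalize` and `encode`, and the toy recipes are assumptions made for illustration; the authors compiled the data by hand, so this is not their procedure.

```python
# Hypothetical alias table: the text only states that pepper varieties
# were merged into "pepper"; the exact spellings here are assumptions.
ALIASES = {
    "black pepper": "pepper",
    "rose pepper": "pepper",
    "green pepper": "pepper",
}


def normalize(ingredient):
    """Lower-case, trim, and map variety names to their generic term."""
    name = ingredient.strip().lower()
    return ALIASES.get(name, name)


def encode(recipes):
    """Turn {dish: [ingredients]} into (columns, {dish: 0/1 row})."""
    cleaned = {dish: {normalize(i) for i in items}
               for dish, items in recipes.items()}
    columns = sorted(set().union(*cleaned.values()))
    rows = {dish: [1 if c in items else 0 for c in columns]
            for dish, items in cleaned.items()}
    return columns, rows


# Made-up recipes, for illustration only. Soy sauce stays a single
# ingredient, in line with the rule for mixtures not prepared on the spot.
toy = {
    "dish_a": ["Salt", "black pepper", "beef"],
    "dish_b": ["green pepper", "rice", "soy sauce"],
}
columns, rows = encode(toy)
print(columns)
print(rows)
```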
Both the Manhattan distance and the latent semantic analysis (LSA) measures were then computed by the authors and added to the data set. Both measures are similarly distributed: each has a bimodal distribution with a mass point at zero.
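For readers unfamiliar with the Manhattan distance, the sketch below computes it for two 0/1 ingredient vectors, where it simply counts the ingredients used in one dish but not the other. The rescaling to a similarity in `similarity_from_distance` is a hypothetical normalization chosen for illustration; the transformation actually used for food_sim_manhattan is documented in [7] and may differ.

```python
def manhattan_distance(u, v):
    """Sum of absolute coordinate differences of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def similarity_from_distance(u, v):
    """Hypothetical rescaling to [0, 1]; 1 means identical ingredient sets.

    This normalization is an assumption for illustration, not necessarily
    the transformation used to build food_sim_manhattan.
    """
    return 1 - manhattan_distance(u, v) / len(u)


# Made-up 0/1 ingredient vectors over five ingredients.
dish_a = [1, 1, 0, 1, 0]
dish_b = [0, 1, 1, 0, 1]
print(manhattan_distance(dish_a, dish_b))
print(similarity_from_distance(dish_a, dish_b))
```

The two toy dishes differ in four of the five ingredients, so their distance is 4.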
Figure 1 is a chord diagram that shows the country pairs with the most and the least food taste similarity based on LSA. Therefore, only 80 of the 171 countries are displayed. Thick links indicate a high degree of similarity, while thin links indicate a low degree. For example, Russia’s food tastes are quite similar to those of Poland and Kyrgyzstan, whereas South Korea’s and Zimbabwe’s food tastes are very dissimilar.
To analyze food similarities across countries, the authors prefer the LSA food taste similarity measure because it takes into account whether the ingredients used in two national recipes are relatively common (e.g., salt, pepper) or uncommon (e.g., coriander, which is used in only a few recipes). Therefore, Figure 1 uses the LSA index to illustrate (dis-)similarities between national dishes of countries across the world. Ref. [7] provides in-depth details about both the Manhattan distance and the LSA approach.
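The intuition that sharing a rare ingredient should count for more than sharing a common one can be illustrated with inverse-document-frequency (IDF) weights and a cosine similarity. This is only the weighting idea, not the full LSA procedure, which additionally reduces the dimension of the recipe-by-ingredient matrix with a truncated singular value decomposition; the weighting scheme, the function names and the toy recipes below are assumptions, and the actual construction is described in [7].

```python
import math


def idf_weights(recipes):
    """Inverse document frequency per ingredient: rarer means heavier."""
    n = len(recipes)
    counts = {}
    for items in recipes.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return {item: math.log(n / c) for item, c in counts.items()}


def weighted_cosine(a, b, weights):
    """Cosine similarity of two ingredient sets under the given weights."""
    dot = sum(weights[i] ** 2 for i in a & b)
    norm_a = math.sqrt(sum(weights[i] ** 2 for i in a))
    norm_b = math.sqrt(sum(weights[i] ** 2 for i in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up recipes as ingredient sets, for illustration only.
toy = {
    "dish_a": {"salt", "rice", "coriander"},
    "dish_b": {"salt", "beef", "coriander"},
    "dish_c": {"salt", "beef", "pepper"},
    "dish_d": {"salt", "rice", "pepper"},
}
w = idf_weights(toy)
sim_ab = weighted_cosine(toy["dish_a"], toy["dish_b"], w)  # share coriander
sim_ac = weighted_cosine(toy["dish_a"], toy["dish_c"], w)  # share only salt
print(sim_ab, sim_ac)
```

In this toy example salt appears in every recipe and therefore gets weight zero, so two dishes that share only salt end up with a similarity of zero, whereas an unweighted comparison would still treat them as partly alike.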
The final data on national dishes was extracted onto an MS Excel spreadsheet using simple software routines to assist with data validation, e.g., checking for typographical errors and duplicate entries. As the data were compiled manually by the authors rather than generated by an algorithm, this is a potential source of errors. Finally, it must be noted that the data set contains only a small number of dishes per country, in some cases only one national dish, which ignores the variety of foods consumed in a country. However, our aim is not to identify preferences perfectly, but to tease out a measure that is likely to best capture consumer taste. Data on migration and trade flows come from [11,12] and are therefore official data.
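The validation routines themselves are not described in detail. One hypothetical way to flag duplicate entries and likely misspellings of ingredient names, using only the Python standard library, is sketched below; the function names, the similarity cutoff and the sample entries are assumptions, not the authors' routines.

```python
import difflib


def find_duplicates(entries):
    """Return the (country, dish) keys that occur more than once."""
    seen, dupes = set(), []
    for key in entries:
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes


def likely_misspellings(names, cutoff=0.85):
    """Return pairs of distinct ingredient names that look alike."""
    names = sorted(set(names))
    pairs = []
    for i, name in enumerate(names):
        close = difflib.get_close_matches(name, names[i + 1:], n=3, cutoff=cutoff)
        pairs.extend((name, other) for other in close)
    return pairs


# Made-up entries for illustration.
entries = [("A", "stew"), ("B", "soup"), ("A", "stew")]
ingredients = ["coriander", "corriander", "salt", "rice"]
print(find_duplicates(entries))
print(likely_misspellings(ingredients))
```

Flagged pairs would still need manual review, since two similar-looking names can refer to genuinely different ingredients.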
4. Usage Notes
By merging a measure of preferences with international trade flows, the data set allows a variety of topics to be analyzed. It can be used to investigate, e.g., the effect of food tastes on international trade flows based on a dyadic measure of food taste similarity between countries. Researchers in empirical trade economics, agricultural economics or health economics can benefit from these data. The data set covers food preferences in both the developed and the developing world and can therefore contribute to the empirical literature on the effects of tastes or preferences on international trade. As the measure of tastes is time-invariant, it can be used to analyze the link between taste and markups in different sectors, to better understand how export prices differ across markets due to taste, conditional on quality. In addition, the data set can be used to show how much of the variation in export revenues can be explained by consumer taste. Owing to this range of applications, the data have already been used in several studies; see, e.g., [7,8,9,10].
The data can also be used for research on the relationship between food preferences and health status (e.g., obesity) across countries, as the data set on food preferences includes not only national dishes but, for most countries, the most common dishes consumed.
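Because the taste measure is dyadic and time-invariant, it can be attached to any other country-pair panel, whatever years that panel covers. The sketch below shows one way to do this, keyed on the unordered country pair; it assumes the measure is symmetric between the two countries, and the country codes, similarity values and panel rows are invented.

```python
# Made-up similarity values keyed by unordered country pair; the measure is
# assumed to be symmetric, so (A, B) and (B, A) share one entry.
similarity = {
    frozenset({"A", "B"}): 0.55,
    frozenset({"A", "C"}): 0.05,
}

# Made-up user panel: (exporter, importer, year, value).
panel = [
    ("A", "B", 2013, 10.0),
    ("B", "A", 2014, 11.0),
    ("C", "A", 2013, 2.0),
    ("B", "C", 2013, 4.0),  # pair without a similarity entry
]


def attach_similarity(rows, sim):
    """Append the pair's similarity (or None if missing) to every row."""
    merged = []
    for exporter, importer, year, value in rows:
        key = frozenset({exporter, importer})
        merged.append((exporter, importer, year, value, sim.get(key)))
    return merged


merged = attach_similarity(panel, similarity)
for row in merged:
    print(row)
```

Rows whose pair is missing from the similarity table receive `None`, so unmatched pairs remain visible instead of being dropped silently.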