A Data Descriptor for Black Tea Fermentation Dataset

Abstract: Tea is currently the most popular beverage after water. Tea contributes to the livelihood of more than 10 million people globally. There are several categories of tea, but black tea is the most popular, accounting for about 78% of total tea consumption. Processing of black tea involves the following steps: plucking, withering, crushing, tearing and curling, fermentation, drying, sorting, and packaging. Fermentation is the most important step in determining the final quality of the processed tea. It is a time-bound process that must take place under certain temperature and humidity conditions. During fermentation, tea color changes from green to coppery brown, which signifies that the optimum fermentation level has been attained. These parameters are currently monitored manually. At present, there is only one existing dataset of tea fermentation images. This study makes a tea fermentation dataset available, composed of tea fermentation conditions and tea fermentation images. Dataset: 10.5281/zenodo.4469326. Dataset License: Creative Commons Attribution 4.0 International.


Background and Rationale
Tea is currently among the most widely consumed beverages in the world [1] and contributes to the economic growth of many countries, including India, Sri Lanka, Kenya, and China [2]. These top tea-producing countries produce several varieties of tea, including yellow tea, illex tea, oolong tea, black tea, and white tea [3]. Among these categories, black tea is the most consumed, accounting for approximately 78% of total daily tea consumption [4]. Kenya is the leading exporter of black tea worldwide, with its major tea-producing counties being Kericho, Bomet, Nandi, and Nyeri [5]. The crop is a source of livelihood for more than 10 million of the country's estimated population of 47 million people [6]. Although the crop is still the leading foreign exchange earner for the country, the sector is ailing due to ever-reducing tea prices [7]. This is attributed to increased competition from other countries, poor management, and the low quality of tea produced, among other factors [8].
The steps of processing black tea are plucking, withering, crushing, tearing and curling, fermentation, drying, sorting, and packaging [9]. Among these, fermentation is the key determinant of the final quality of the processed tea [10]. The process is time-bound and must take place within a given temperature and humidity range [11]. During fermentation, tea changes color from green to coppery brown and finally to dark red [12]. Optimally fermented tea is coppery brown and has a fruity smell and a sweet taste, while unfermented tea is green and has a grassy smell and a strong taste [13]. Overfermented tea is dark red and is characterized by a bitter taste [14].
Data 2021, 6, 34
Currently, the optimum fermentation of tea is monitored manually by tea tasters, who adopt the following techniques: monitoring color change, tasting tea as fermentation progresses, and smelling the odor of tea during fermentation [15]. These manual methods are subjective and can compromise the quality of the made tea. Image processing and machine learning techniques have performed well in various fields, including medicine [16], education, e-commerce [17], tourism, and banking [18]. However, image processing and machine learning need data for training and evaluating models. There is only one reported open-source dataset of tea fermentation images [19], which limits researchers to a single dataset for training and evaluating their machine learning models. Therefore, this study aims to address this gap by adopting the Internet of Things (IoT) to capture and release a tea fermentation dataset composed of temperature, humidity, and black tea fermentation images.
The rest of the paper is arranged as follows. Section 2 presents the materials and methods, Section 3 describes the data in the dataset, Section 4 presents the data validation, and Section 5 presents the conclusion of the paper.

Materials and Methods
This section describes the resources and the approach followed in collecting the dataset. Section 2.1 discusses the resources, while Section 2.2 discusses the collection of the dataset.

Resources
The following resources were instrumental in acquiring the data: a Raspberry Pi, a Pi camera, a server, and programming languages. The Raspberry Pi Model B+ was adopted due to its increased processing power and its dual-band Wi-Fi feature [20]. The Raspbian operating system for the Raspberry Pi [21] was used; Raspbian was chosen as it is available at no cost and is easy to install and use. A Raspberry Pi camera of 8 megapixels was used. The camera board was chosen since it is very small, weighing around 3 g, making it well suited for deployment with the Raspberry Pi. Amazon Web Services (AWS) [22] was chosen as the cloud provider for storage of the data. AWS provides services that make it easy to store images and also offers an initial free service for 1 year. The Python programming language [23] was used to write programs that capture the images using the Pi camera. The block diagram of the system for capturing the dataset is shown in Figure 1.
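The capture step that such Python programs perform can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `capture_cycle`, the two callables `capture_image` and `read_conditions`, the output file names, and the CSV column names are all hypothetical. On the device, `capture_image` could wrap a Pi camera call and `read_conditions` could wrap a temperature and humidity sensor; upload to cloud storage is left out.

```python
import csv
import time
from datetime import datetime
from pathlib import Path
from typing import Callable, Tuple


def capture_cycle(
    capture_image: Callable[[Path], None],
    read_conditions: Callable[[], Tuple[float, float]],
    out_dir: Path,
    n_samples: int,
    interval_s: float = 60.0,
) -> Path:
    """Capture n_samples images and log one CSV row per image.

    capture_image and read_conditions are hypothetical stand-ins for the
    camera and for a temperature/humidity sensor; they are injected so the
    loop can be exercised without hardware.
    """
    image_dir = out_dir / "images"
    image_dir.mkdir(parents=True, exist_ok=True)
    log_path = out_dir / "conditions.csv"  # assumed file name

    with log_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        # Assumed column names, chosen for illustration only.
        writer.writerow(["time", "temperature_c", "humidity_pct", "image"])
        for i in range(n_samples):
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            image_path = image_dir / f"image_{i:04d}.jpg"
            capture_image(image_path)
            temperature_c, humidity_pct = read_conditions()
            writer.writerow([stamp, temperature_c, humidity_pct, image_path.name])
            # Wait between samples, but not after the last one.
            if i < n_samples - 1:
                time.sleep(interval_s)
    return log_path
```

Injecting the camera and sensor as callables keeps the loop testable on a workstation; each CSV row then references the image captured at the same moment.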

Collection of the Dataset
The Internet-of-Things-based system for capturing the data was deployed in the Sisibo factory, Kenya for 4 days: 10-13 August 2020. The system was set up just above the tea


Data Description
This study releases a black tea fermentation dataset which contains black tea fermentation images and physical parameters of tea during fermentation, which were collected in Sisibo tea factory, Kenya between 10 and 13 August 2020. The dataset can be found here: https://doi.org/10.5281/zenodo.4469326 (accessed on 2 March 2021). The file structure of the dataset is shown in Figure 3.

Black tea fermentation conditions are contained in a comma-delimited values (CSV) file, which records the fermentation time, temperature, humidity, reference to the image, and category of tea (Figure 4). The images folder contains images that were taken as fermentation took place. These images correspond to the image-reference column in the CSV files. In addition to these images, the "Categorized black tea fermentation images" folder contains 6000 black tea fermentation images categorized into three classes of 2000 images each. The classification of these images was based on the decision of two tea fermentation experts, who relied on the color, smell, and taste of tea during fermentation. The classes are underfermented (also referred to as unfermented), fermented, and overfermented.
The underfermented tea is in a folder labeled "unfermented tea" in the "Categorized black tea fermentation images" folder. The level of fermentation of this category of tea is below the optimum, and the tea in these images is usually green. The images are labeled from "underfermented _00" to "underfermented _1999". A sample of the images is presented in Figure 5a.
The fermented tea is optimally fermented and is usually coppery brown. The fermented tea is in a folder labeled "fermented tea" in the images folder. The images are labeled from "fermented _00" to "fermented _1999". A sample of the images is presented in Figure 5b.
The fermentation level of overfermented tea is beyond the optimum, and the tea is dark red. Overfermented tea is in a folder labeled "overfermeted tea". The images are labeled from "Overfermented _00" to "Overfermented _1999". A sample of the images is presented in Figure 5c.
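As an illustration of how the categorized folder layout described above might be consumed, the following sketch indexes each class folder and reports any class whose image count differs from the stated 2000. It is a hedged example only: the function names, the set of accepted file extensions, and the idea of treating every subfolder as a class are assumptions, and the folder names should be checked against the downloaded archive.

```python
from pathlib import Path
from typing import Dict, List

# Assumed image extensions; the text does not state the file format.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_categorized_images(root: Path) -> Dict[str, List[Path]]:
    """Map each class folder name under root to its sorted image files."""
    index: Dict[str, List[Path]] = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        index[class_dir.name] = sorted(
            f
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return index


def unbalanced_classes(
    index: Dict[str, List[Path]], expected_per_class: int = 2000
) -> Dict[str, int]:
    """Return the classes whose image count differs from the expected count."""
    return {
        name: len(files)
        for name, files in index.items()
        if len(files) != expected_per_class
    }
```

Calling `index_categorized_images` on the "Categorized black tea fermentation images" folder and passing the result to `unbalanced_classes` should return an empty dictionary if each class holds 2000 images as described.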

Data Validation
The fermentation of tea must take place within a given range of temperature and humidity. These ranges vary between countries but, in Kenya, the acceptable temperature range is between 20 and 30 °C. Figure 6 shows temperature and humidity values during a fermentation cycle in Sisibo tea factory on 10 August 2020. The fermentation process started at 12:31:13 h. The temperature recorded ranged from 19 °C to 28 °C, which was largely within the optimum range. The temperature increased steadily with time due to the natural climatic conditions of the area. Humidity, on the other hand, ranged between a low of 75% and a high of 92% and decreased steadily with time. The highest temperature value recorded was 28.8 °C, while the highest humidity value achieved was 82.6%. The fermentation curve was smooth and it took 68 min before the tea was fully fermented.
Figure 7 shows temperature and humidity values during a fermentation cycle in Sisibo tea factory on 11 August 2020. The fermentation process started at 08:48:03 h. The temperatures recorded ranged from 20.6 °C to 24.50 °C. These temperatures were within the optimum range. The temperature increased steadily with time due to the natural climatic conditions of the area. The highest humidity value recorded was 90.8%. The humidity fluctuated throughout the fermentation period. The fermentation curve was smooth and it took 51 min before the tea was fully fermented. This is the shortest fermentation duration reported in this data descriptor, as the fermentation temperatures were higher than the rest of the days.
Figure 8 shows temperature and humidity values during a fermentation cycle in Sisibo tea factory on 12 August 2020. The fermentation process started at 02:00:13 h. The temperature recorded ranged from 15 °C to 25.53 °C. The lower end of this range fell below the optimum, unlike on 10 August 2020. This is due to the fermentation occurring in the early morning, when the region is naturally colder than during midday hours. The temperature increased steadily with time due to the natural climatic conditions of the area. The highest humidity value recorded was 95%. The fermentation curve was smooth and it took 64 min before the tea was fully fermented.
Figure 9 shows temperature and humidity values during a fermentation cycle in Sisibo tea factory on 13 August 2020. The fermentation process started at 07:00:00 h. The highest temperature recorded was 26.25 °C. Once again, the temperature fell below the optimum range, unlike on 10 August 2020. This is because fermentation was carried out in the early morning, when the region is naturally colder than during midday hours. The highest humidity value recorded was 95%. The fermentation curve was smooth and it took 58 min before the tea was fully fermented.
The average temperature and humidity values for the days are presented in Figure 10. The highest average temperature, 26.14 °C, was recorded on 10 August 2020. On 11 August 2020, an average temperature of 22.31 °C was recorded. On the other days, the average temperatures were 19.39 °C and 18.83 °C for 12 August 2020 and 13 August 2020, respectively. Thus, on these two days the average fermentation temperature fell outside the range of 20-30 °C.
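The per-cycle averages and the comparison against the 20-30 °C range discussed above could be recomputed from a conditions CSV along the following lines. This is a sketch under stated assumptions: the column names `temperature` and `humidity` and the function name `summarize_cycle` are hypothetical, and the headers in the released files may differ.

```python
import csv
import io
from statistics import mean
from typing import Dict, Iterable

# Acceptable temperature range for Kenya as stated in the text (degrees Celsius).
OPTIMUM_RANGE_C = (20.0, 30.0)


def summarize_cycle(
    rows: Iterable[Dict[str, str]],
    temp_col: str = "temperature",  # assumed column name
    hum_col: str = "humidity",  # assumed column name
) -> Dict[str, float]:
    """Summarize one fermentation cycle from CSV rows read as dictionaries."""
    temps = []
    hums = []
    for row in rows:
        temps.append(float(row[temp_col]))
        hums.append(float(row[hum_col]))
    if not temps:
        raise ValueError("no rows to summarize")

    low, high = OPTIMUM_RANGE_C
    n_in_range = sum(1 for t in temps if low <= t <= high)
    return {
        "mean_temperature": mean(temps),
        "mean_humidity": mean(hums),
        "min_temperature": min(temps),
        "max_temperature": max(temps),
        # Share of readings that fall inside the stated optimum range.
        "fraction_in_range": n_in_range / len(temps),
    }
```

The function accepts any iterable of row dictionaries, so it can be fed directly from `csv.DictReader` opened on one day's file; a mean temperature below 20 would correspond to the situation reported for 12 and 13 August 2020.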

Conclusions
In this paper, a tea fermentation dataset has been released. The dataset was collected in the Sisibo tea factory, Kenya, using an IoT-based model. The model predicted the fermentation cycle of tea and correlated well with the decisions of the tea fermentation experts, who provided the ground truths for the dataset. The dataset can be used by researchers to train machine learning models for detecting the optimum fermentation of tea. This is a significant contribution to the application of machine learning to the detection of optimum fermentation of tea, as there is currently only one other existing dataset for training these models.