Highly Efficient Fruit Mass and Size Estimation Using Only Top View Images

Abstract: This paper presents a new methodology for estimating the mass and size of a common Vietnamese fruit, the Cavendish-type banana, using only top-view images. Most previous works focused on volume estimation using multiple cameras to infer three-dimensional information. In this work, we used a single camera mounted above the fruit. Our approach leads to a relatively small estimation error (approximately 6%) compared with measurements obtained using the water-displacement method and a static digital scale. The results indicate that the system has great potential for use in a real industrial setting. Future work will investigate other features such as ripeness and bruises to increase the effectiveness and practicality of the system.


Introduction
According to Statista [1], bananas ranked second in global fruit production in 2017. They contain carbohydrates, protein and a variety of minerals and vitamins. Their physical properties have been studied in order to maximize the value of each banana released to the market.
Tropical countries that grow high-value bananas, Vietnam in particular, are receiving a growing number of orders. The Laba banana grown in Vietnam belongs to the Musaceae family and is a subgroup of the Cavendish banana. Every year, Vietnam exports a billion tons of bananas to Japan, China, Malaysia and South Korea. Recently, the country has earned a strong reputation and receives many orders from EU countries, Russia and Qatar.
Several studies have examined the physical properties of the banana. Ahmad et al. (2001) showed how temperature affects the ripening of banana fruit [2], Mendoza et al. classified the maturity stages of bananas using image processing [3], and gas sensors have been applied to detect whether a banana is ripe [4]. Similar experiments have been conducted with other fruits such as the strawberry [5], kiwi fruit [6], coconut [7], melon [8] and mango [9,10].
Moreover, sorting and measuring the weight and volume of food remains an open problem. Among the most accurate methods are measurement with a digital caliper and the water displacement method. However, these methods are too slow and unproductive for industrial use. M. Omid et al. developed a system that placed two charge-coupled device (CCD) cameras on each side of a citrus fruit [11]. They then segmented the images, divided the citrus fruit into frustums and estimated the volume by summing the volumes of the approximating frustums. Li and Zhu sorted apples by measuring their diameters with a feature fusion of size, shape and color [12]. These studies focused on objects that are round or nearly elliptical, whereas the banana has a more complicated shape. Nevertheless, M. Soltani et al. showed a way to estimate the volume Vellip and the surface area S of the banana [13]: they sliced the banana into small parts and summed the calculated S and Vellip of each part to obtain the estimate.
Machine vision has played an important role in the food industry and agriculture in recent years. It is embedded into production lines to sort products, check quality standards, and so on. However, the existing methods are still complicated and costly. In this paper, we propose a method similar to [11] that requires only one camera, which captures the top projected image of the fruit. With only simple calculations, we can estimate the volume and weight of a banana with high accuracy.

Experiment Setup
Our setup consisted of one camera mounted on a frame above a banana. An A4-sized sheet of paper was placed underneath the banana as a guide for calibrating the system. In addition, we placed four light sources around the banana to eliminate shadows, as shown in Figure 1. We then cropped the images to the extent of the A4 paper to ease background segmentation. From the segmented images, we estimated several size-related parameters for each banana, such as length, width, perimeter and top-view area. We measured the widths, heights and weights of all the banana samples to use as ground truth for comparison and estimation. The measurement tools comprised a digital caliper and a digital scale. The water displacement method was also applied to measure the bananas' volumes.
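The calibration step can be illustrated with a minimal sketch. It assumes the image has already been cropped exactly to the edges of the A4 sheet (210 mm x 297 mm) and that lens distortion is negligible. The function names are hypothetical and are not taken from the original implementation.

```python
# ISO 216 A4 sheet dimensions in millimetres.
A4_SHORT_MM = 210.0
A4_LONG_MM = 297.0


def mm_per_pixel(image_width_px, image_height_px):
    """Return (scale_x, scale_y) in mm per pixel for an image cropped to an A4 sheet.

    Assumption: the longer image side corresponds to the 297 mm edge of the paper.
    """
    long_px = max(image_width_px, image_height_px)
    short_px = min(image_width_px, image_height_px)
    scale_long = A4_LONG_MM / long_px
    scale_short = A4_SHORT_MM / short_px
    if image_width_px >= image_height_px:
        # Landscape crop: x runs along the long edge.
        return scale_long, scale_short
    # Portrait crop: x runs along the short edge.
    return scale_short, scale_long


def area_px_to_mm2(area_px, scale_x, scale_y):
    """Convert a pixel count (e.g. the top-view area of the fruit) to square millimetres."""
    return area_px * scale_x * scale_y
```

With these scales, any length measured in pixels along an image axis can be converted to millimetres by multiplying by the scale of that axis.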
Since we only captured top-view images, some important parameters were missing, including the side-view information of the bananas. We therefore formulated experimental relationships for the depth dimension. We collected many thin slices cut perpendicular to the top-view plane. Figure 2 shows that the cross-sections were not circular. With over 200 test samples, we concluded that the width of the banana is always larger than its height, with an average h/w ratio of approximately 0.956. After converting the top-view images into grayscale, we applied a traditional Otsu thresholding scheme to the green channel of the images. We divided each banana into two parts. The first part consisted of the top and the end of the banana, and the second part was the middle of the banana, as shown in Figure 3. We set the length of part 1 to one fifth of the total length. To estimate the volume of part 1, we assumed it was an elliptical cone and applied the corresponding volume formula from [14] (see Figure 4).
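The segmentation step can be sketched as follows. This is a pure-Python illustration of Otsu's method, not the authors' code: it picks the threshold that maximises the between-class variance of the green-channel histogram. In practice, OpenCV provides the same technique through cv2.threshold with the THRESH_OTSU flag. The helper names and the assumption that the fruit is darker than the paper in the green channel are illustrative only.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximises the between-class variance.

    Pixels with value <= t form the lower class, pixels > t the upper class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def segment(green_rows, fruit_is_darker=True):
    """Return a binary mask (1 = fruit) from rows of 8-bit green-channel values.

    Assumption: the fruit is darker than the white paper in the green channel;
    set fruit_is_darker=False if the polarity is reversed.
    """
    t = otsu_threshold([p for row in green_rows for p in row])
    if fruit_is_darker:
        return [[1 if p <= t else 0 for p in row] for row in green_rows]
    return [[1 if p > t else 0 for p in row] for row in green_rows]
```

Counting the ones in each row or column of the mask gives the pixel widths and the top-view area used in the later steps.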

Volume Estimation Result
Since the width and the height of the cross-section were not the same, as shown in Section 2, we applied the h/w ratio of 0.956 as a correction factor to the volumes calculated for the elliptical cone and the frustums. Finally, the total volume of the banana is the sum of the volumes of part 1 and part 2.
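The volume computation can be sketched as below, under several stated assumptions: the two ends together take one fifth of the total length (one tenth each), each end is an elliptical cone, the middle part is cut into equal-length frustums, and widths are measured perpendicular to a straightened axis. The function names and the input layout are hypothetical.

```python
import math

HW_RATIO = 0.956  # average height/width ratio reported in the text


def elliptical_cone_volume(base_width, length, hw_ratio=HW_RATIO):
    """Volume of a cone whose elliptical base has semi-axes a and hw_ratio * a."""
    a = base_width / 2.0
    b = hw_ratio * a
    return math.pi * a * b * length / 3.0


def elliptical_frustum_volume(width1, width2, length, hw_ratio=HW_RATIO):
    """Circular-frustum volume scaled by hw_ratio to account for the flattened section."""
    r1, r2 = width1 / 2.0, width2 / 2.0
    return hw_ratio * math.pi * length * (r1 * r1 + r1 * r2 + r2 * r2) / 3.0


def banana_volume(total_length, boundary_widths, end_fraction=0.2, hw_ratio=HW_RATIO):
    """Sum two end cones (part 1) and the middle frustums (part 2).

    boundary_widths holds the top-view widths at the n + 1 evenly spaced
    boundaries of the n middle slices; the first and last entries are also
    the base widths of the two end cones.
    """
    n_slices = len(boundary_widths) - 1
    if n_slices < 1:
        raise ValueError("need at least two boundary widths")
    cone_length = total_length * end_fraction / 2.0
    slice_length = total_length * (1.0 - end_fraction) / n_slices

    volume = elliptical_cone_volume(boundary_widths[0], cone_length, hw_ratio)
    volume += elliptical_cone_volume(boundary_widths[-1], cone_length, hw_ratio)
    for w1, w2 in zip(boundary_widths, boundary_widths[1:]):
        volume += elliptical_frustum_volume(w1, w2, slice_length, hw_ratio)
    return volume
```

If lengths are given in millimetres, the result is in cubic millimetres; dividing by 1000 converts it to millilitres for comparison with water-displacement measurements.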
We tested with a data set of 56 bananas. For part 2, we wanted to investigate the number of thin slices necessary to give satisfactory results. We tested four cases: 4, 8, 12 and 16 slices. The comparison is shown in Table 1.

Table 1. Percentage errors of the total volume of the banana when part 2 was differently sliced.

Number of slices                         4      8      12     16
Average error over all 56 bananas (%)    6.05   5.71   5.74   5.73

As the results show, eight slices gave the lowest average error in our case. Figure 7 shows the average error for the volume estimation based on our approach compared to the actual volume obtained from the water displacement method. The average error was only about 7% for 56 test samples.

Mass Estimation Result
Using the measured mass and volume of the bananas, we obtained an average density of 0.9183 g/mL. From this, a banana's weight can be estimated as the product of its estimated volume and the average density. The weight estimation error is shown in Figure 8.
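The mass estimate is a single multiplication, sketched here together with the percentage error used for comparison against the digital scale. The function names are hypothetical, and the mean density is assumed to be the average of the per-fruit densities.

```python
MEAN_DENSITY_G_PER_ML = 0.9183  # average density reported in the text


def mean_density(masses_g, volumes_ml):
    """Average of the per-fruit densities (assumed reading of 'average density')."""
    densities = [m / v for m, v in zip(masses_g, volumes_ml)]
    return sum(densities) / len(densities)


def estimate_mass_g(volume_ml, density=MEAN_DENSITY_G_PER_ML):
    """Estimated mass in grams from an estimated volume in millilitres."""
    return volume_ml * density


def percentage_error(estimated, measured):
    """Absolute error of an estimate relative to the measured value, in percent."""
    return abs(estimated - measured) / measured * 100.0
```

For example, an estimated volume of 100 mL corresponds to an estimated mass of about 91.8 g under the reported density.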

Computational Expense
The hardware used for this experiment was a Dell 5558 with an Intel Core i7-5500U (2.4 GHz). Python and the OpenCV library were used extensively in this project. We also recorded the processing time for all of the samples (56 pictures), each with 100 repetitions. The outcome is shown in Table 2.
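A timing measurement of this kind could be set up as in the following sketch. It uses only the Python standard library; `process` is a hypothetical placeholder for the full per-image pipeline and is not part of the original code.

```python
import time


def time_per_sample(process, samples, repetitions=100):
    """Return the mean wall-clock time in seconds per run, one entry per sample.

    Each sample is processed `repetitions` times and the elapsed time is averaged.
    """
    timings = []
    for sample in samples:
        start = time.perf_counter()
        for _ in range(repetitions):
            process(sample)
        elapsed = time.perf_counter() - start
        timings.append(elapsed / repetitions)
    return timings
```

Averaging over repetitions reduces the influence of operating-system scheduling noise on each per-image figure.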

Conclusions
In machine vision, a 3D image of a fruit resolves the limitations of 1D (height) and 2D (surface area) measurements. However, 3D imaging requires a 3D camera or several cameras positioned around the object, as well as powerful computing hardware to calculate all the features of each object. For example, Savan Dhameliya et al. set up a system of five cameras to estimate the surface area and the projected area of the mango in digital images [15]. This paper showed that a single camera suffices to estimate the size, volume and mass of a Laba banana. Furthermore, the computation involves only a few geometric formulas. Together, these make the system low cost and fast, with an accuracy of 94% to 95%. In the future, we want to build a portable version of the high dynamic range imaging system as in [16,17] so that we can capture and evaluate bananas in the field.