# Adaptive Image Thresholding of Yellow Peppers for a Harvesting Robot


## Abstract


## 1. Introduction

## 2. Image Thresholding

### 2.1. Segmentation Quality

## 3. Reinforcement Learning-Based Thresholding

### 3.1. Proposed Algorithm

### 3.2. Image Thresholding as an MDP

#### 3.2.1. States

#### 3.2.2. Actions

#### 3.2.3. T Function

#### 3.2.4. R Function

### 3.3. Exploration-Exploitation Strategy

#### 3.3.1. Epsilon-Greedy Algorithm

#### 3.3.2. Decaying Epsilon-Greedy Algorithm

#### 3.3.3. Q-value Difference Measurement

### 3.4. Convergence

## 4. Training and Testing

## 5. Experimental Results

## 6. Conclusions and Future Work

## Acknowledgments

## Author Contributions

## Conflicts of Interest


**Figure 1.** The developed image segmentation technique is designed to be part of a robotic system that guides the robotic manipulator towards a detected fruit using visual servoing.

**Figure 3.** The proposed approach for adaptive image thresholding based on reinforcement learning (RL).

**Figure 4.** Illustration of the proposed algorithm for image segmentation using Q-learning. For each image, the state is determined and an action is selected to segment the image. The segmentation quality function then compares the segmented image with the ground truth to compute the reward and update the Q-table.
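The loop in the Figure 4 caption can be sketched as a single Q-learning iteration. This is a minimal sketch, not the paper's implementation: `segment` and `quality` are hypothetical stand-ins for the HSV thresholding and segmentation-quality functions, and the single-step update (reward only, no successor state) is an assumption based on the per-image nature of the task.

```python
import numpy as np

def q_learning_step(q_table, state, actions, image, ground_truth,
                    segment, quality, epsilon, alpha=0.1):
    """One iteration of the thresholding loop: pick an action (a set of
    thresholds) for the image's state, segment, score against ground
    truth, and update the Q-table."""
    # Epsilon-greedy selection over the candidate threshold sets.
    if np.random.rand() < epsilon:
        a = np.random.randint(len(actions))   # explore: random action
    else:
        a = int(np.argmax(q_table[state]))    # exploit: best known action
    segmented = segment(image, actions[a])    # apply the chosen thresholds
    reward = quality(segmented, ground_truth) # segmentation quality q
    # Single-step task: the update target is just the observed reward.
    q_table[state, a] += alpha * (reward - q_table[state, a])
    return a, reward
```

With `epsilon = 0` the agent always exploits, which corresponds to using the learned Q-table at test time.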

**Figure 6.** Left (**a**,**d**,**g**): three different images from the database that belong to the same state (centroid); middle (**b**,**e**,**h**): segmentation results of applying the threshold (${H}_{min}$ = 0.1, ${H}_{max}$ = 0.2, ${S}_{min}$ = 0.5, ${S}_{max}$ = 1) to the input images, resulting in segmentation quality $q=$ 0.09, 0.12 and 0.15 for (**b**,**e**,**h**), respectively; right (**c**,**f**,**i**): manually labeled images of yellow peppers used as ground truth.
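The thresholding and scoring behind Figure 6 can be sketched as follows, assuming H and S channels normalized to [0, 1] and using the Jaccard index as a stand-in for the segmentation quality $q$ (the paper's exact quality function from Section 2.1 is not reproduced in this excerpt).

```python
import numpy as np

def hsv_threshold(h, s, h_min, h_max, s_min, s_max):
    """Binary mask of pixels whose hue and saturation fall inside the
    threshold window; channels assumed normalized to [0, 1]."""
    return (h >= h_min) & (h <= h_max) & (s >= s_min) & (s <= s_max)

def segmentation_quality(mask, ground_truth):
    """Overlap between mask and ground truth: Jaccard index, used here
    as a stand-in for the paper's quality function q."""
    inter = np.logical_and(mask, ground_truth).sum()
    union = np.logical_or(mask, ground_truth).sum()
    return inter / union if union else 1.0
```

With the Figure 6 thresholds (${H}_{min}$ = 0.1, ${H}_{max}$ = 0.2, ${S}_{min}$ = 0.5, ${S}_{max}$ = 1), `hsv_threshold` keeps only yellow-hued, well-saturated pixels.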

**Figure 7.** Q-values during the learning process for a state-action pair using the decaying $\epsilon$-greedy strategy. Since the amount of exploration is high at the beginning of learning, the changes in Q-values are large. As the process continues, the exploitation rate increases, which results in a more stable trend in the Q-values.

**Figure 8.** Q-value differences $\Delta Q$ of a state-action pair as a function of iteration number during the learning process, until the convergence condition was reached. $\Delta Q$ generally decreases as the process approaches convergence.
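The behavior shown in Figures 7 and 8 can be sketched with a geometric decay schedule for $\epsilon$ and a windowed test on $\Delta Q$; the decay rate, window size, and tolerance below are illustrative assumptions, not the paper's parameters.

```python
def decaying_epsilon(iteration, eps_start=1.0, eps_min=0.05, decay=0.999):
    """Exploration rate shrinks geometrically with the iteration count,
    shifting the agent from exploration towards exploitation."""
    return max(eps_min, eps_start * decay ** iteration)

def has_converged(delta_q_history, window=100, tol=1e-3):
    """Convergence test in the spirit of Figure 8: learning stops once
    all Q-value changes in the most recent window fall below tol."""
    recent = delta_q_history[-window:]
    return len(recent) == window and max(recent) < tol
```

The floor `eps_min` keeps a small amount of exploration alive so the agent never becomes fully greedy during training.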

**Figure 9.** Normalized performance ${P}_{n}$ (Equation (9)) of different exploration-exploitation strategies for each fold of the cross-validated test data. For comparison, the normalized performance of the benchmark (100%) is also shown. Circles of the same color across methods belong to the same fold of test images.

**Figure 10.** The result of applying the optimal threshold selected by the decaying $\epsilon$-greedy algorithm (middle column (**b**,**e**,**h**)) and the benchmark algorithm (right column (**c**,**f**,**i**)) to three different input images ((**a**,**d**,**g**)) that belong to the same state (centroid). Normalized performance ${P}_{n}=$ 87%, 92% and 96% for (**b**,**e**,**h**), respectively.

| Exploration-Exploitation Strategy | q | ${\mathit{P}}_{\mathit{n}}$ | #Iterations |
|---|---|---|---|
| Decaying Epsilon-Greedy | 63.4% | 91.5% | 9567 |
| Q-value Difference Measurement | 55.7% | 80.3% | 8947 |
| Epsilon-Greedy ($\epsilon=0.2$) | 52.2% | 75.3% | 4122 |
| Epsilon-Greedy ($\epsilon=0.5$) | 56.7% | 81.8% | 9081 |
| Epsilon-Greedy ($\epsilon=0.7$) | 60.3% | 87.0% | 17,751 |
| Randomly Selected Actions | 39.6% | 57.1% | 9567 |
| Optimization Method | 53.6% | 77.4% | 9567 |
| Benchmark (Exhaustive Search) | 69.3% | 100.0% | 35,700 |
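Equation (9) is not reproduced in this excerpt, but consistent with the table values, the normalized performance ${P}_{n}$ appears to be the achieved quality $q$ divided by the quality of the exhaustive-search benchmark; a sketch under that assumption:

```python
def normalized_performance(q, q_benchmark):
    """P_n: segmentation quality relative to the exhaustive-search
    benchmark (assumed form, consistent with the table above)."""
    return q / q_benchmark

# Decaying epsilon-greedy row: 0.634 / 0.693 ~ 0.915,
# i.e. the 91.5% reported in the table.
```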

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ostovar, A.; Ringdahl, O.; Hellström, T. Adaptive Image Thresholding of Yellow Peppers for a Harvesting Robot. *Robotics* **2018**, *7*, 11.
https://doi.org/10.3390/robotics7010011
