Development of Smart and Lean Pick-and-Place System Using EfﬁcientDet-Lite for Custom Dataset

: Object detection for a pick-and-place system has been widely acknowledged as a signiﬁcant research area in the ﬁeld of computer vision. The integration of AI and machine vision with pick-and-place operations should be made affordable for Small and Medium Enterprises (SMEs) so they can leverage this technology. Therefore, the aim of this study is to develop a smart and lean pick-and-place solution for custom workpieces, which requires minimal computational resources. In this study, we evaluate the effectiveness of illumination and batch size to improve the Average Precision (AP) and detection score of an EfﬁcientDet-Lite model. The addition of 8% optimized bright Alpha3 images results in an increase of 7.5% in AP and a 6.3% increase in F1-score as compared to the control dataset. Using a training batch size of 4, the AP is signiﬁcantly improved to 66.8% as compared to a batch size of 16 at 57.4%. The detection scores are improved to 80% with a low variance of 1.65 using a uniform 135-angle lamp and 0 illumination level. The pick-and-place solution is validated using Single-Shot Detector (SSD) MobileNet V2 Feature Pyramid Network (FPN) Lite. Our experimental results clearly show that the proposed method has an increase of 5.19% in AP compared to SSD MobileNet V2 FPNLite.


Introduction
One of the current trends in advanced manufacturing is to employ Artificial Intelligence (AI) methods to improve the pick-and-place process.The integration of AI and machine vision with pick-and-place operations can significantly improve the manufacturing process.It should be made affordable for Small and Medium Enterprises (SMEs) so that they can leverage the benefits come with this technology without being concerned with allocating a significant financial budget.The fast and smooth integration of machine vision technology with the current pick-and-place operations of SMEs is another crucial aspect that should be taken into consideration.
In this context, any machine vision solution should be developed in a way that the commissioning and installing can be carried out simply and quickly by the field operators of SMEs without special skills.Therefore, one of the current trends in advanced manufacturing is to employ object detection using Deep Learning methods to improve the pick-and-place process.Furthermore, we study the usage of this smart and lean system in low-light environments, such as waferfab manufacturing during the night shift.Die-sorting machines, for example, can be used in low-light environments because they are totally automated and require no user intervention.As such, sufficient ambient lighting is not needed [1], and hence, our aim is to develop a smart and lean pick-and-place solution, which requires minimal computational resources, for custom workpieces.
AI model efficiency has become increasingly important in computer vision, and object detection using machine learning is becoming crucial for pick-and-place operation and warehousing logistics [2,3].Other than allowing the smart detection of objects without camera recalibration, AI enhances the application's robustness to light changes, shadow, background noise and low-illumination [4].
Continuing from our previous work [5], we aim to improve the pick-and-place operation using an Artificial Intelligence model deployed on an embedded controller.Our target is to develop a smart and lean pick-and-place system that meets the requirements of low power consumption, small memory usage, and fast run time.
Model efficiency has become increasingly important in computer vision to be used on mobile devices and embedded controllers such as Raspberry Pi due to the model's compact size and quick inference.Convolutional Neural Network (CNN) models are crucial for feature extraction and classification, but conventional high-performance algorithms are expensive due to their high-power consumption and memory usage.
TensorFlow has developed several deep learning object identification models that were designed for embedded and mobile devices, hence the name TensorFlow Lite (TFLite).The model's compact size and quick inference for TFLite versions make them appropriate for a variety of applications, including wearable technology and the Internet of Things.A pre-trained TensorFlow model is used to act as a learnt generic model because it has previously been trained on a sizable general dataset.This helps to reduce the quantity of data required to train a model from scratch.
Our project uses the state-of-the-art model EfficientDet-Lite [6], which was developed by Google in 2020.Compared to other state-of-the-art models, EfficientDet-Lite detectors are more precise and need fewer computational resources (working memory, power consumption, and floating-point operations per second or FLOPS) than their predecessors.
TFLite is a model optimization toolkit provided by Google, whose purpose is also to reduce the complexity of deep learning models and speed up the inference time.TfLite's key feature is size reduction, which results in a smaller model and less need for storage space and RAM.TFLite decreases latency by quantization, which helps to simplify calculations during inference with only a little amount of accuracy loss.
TFLite optimizes the model via quantization, clustering, and pruning.The model's accuracy is reduced, as it is quantized from FP64 to lower resolutions (FP32, FP16, INT8).In order to reduce a model's complexity by reducing the number of its unique weights, clustering involves dividing the weights of the trained model's layers into clusters and sharing the centroid of each cluster's weight.Pruning is the process of removing less important model parameters that barely affect the outcomes of predictions.This aids in simplifying the model at the expense of some accuracy loss and offers far higher performance.
EfficientDet uses the same backbone as EfficientNet and adds a bi-directional feature pyramid network (BiFPN) to help in multi-scale feature fusion.It utilizes several optimization and backbone tweaks and a compound scaling method that uniformly scales the resolution, depth and width for all backbones, feature networks and box/class prediction networks at the same time.
Considerable research has already been completed for object detection using AI, laying the foundation for this work.The contribution of every work published previously in all the relevant domains has played a significant role in developing this work.Our pick-and-place approach is similar to a robot collecting a rock from a hopper using a vision-based image processing algorithm [7] and a robot performing pick and place operations on deformable items [8].
Similar to us, some researchers developed novel methods to improve the mean Average Precision.Using Faster-RCNN, Leung [1] developed a vehicle detection approach for insufficient and night-time illumination conditions and improved the mAP values by 0.2; however, it was not meant for a resource-constrained embedded system.Luo [9] achieved an average detection accuracy of 57.51% on EfficientDet-D2 for the road damage detection, which was lower than our 74.1% AP for the Alpha1 dataset.Jain [10] developed "DeepSeaNet" to detect underwater objects with EfficientDet with a high accuracy of 98.63%, but the method was not suitable for lightweight devices due to the complexity of the model.Similar to our lightweight methodology, Cirjak [11] used EfficientDet-4 on Raspberry Pi to monitor the codling moth population and achieve a high accuracy of 99% with a small dataset of 430 images.Wu [12] adopted EfficientDet to detect textureless objects in an industrial environment, but this method was not suitable for Raspberry Pi due to the heavy computation.Saurabh [13] used a web camera and ABB robot to detect color and perform a color-sorting algorithm; this work has 100% detection but no machine learning implementation.Konaite [14] used another lightweight model, SSD MobileNet V2, on Raspberry Pi to detect barriers for blind people to navigate safely.
EfficientDet is used in non-industrial scenarios such as dental application.Bayaran [15] assessed the diagnostic quality of bitewing radiographs at contact areas between teeth, which can help the oral radiologists provide better radiographic qualities.
Other than EfficientDet, advanced CNN is used in other applications.For example, it is used in implementing traffic signs recognition in a mobile-based application [16], a speech recognition system using TensorFlow [17], a smart surveillance system for night low-illumination objects [18] and a visual feedback algorithm on AlexNet [19].
The effectiveness of illumination to improve the detection scores has not been extensively studied for pick-and-place solutions, especially robots used in warehouses and logistics areas with low illumination, and thus, our research aims to fill this gap.
We implement a smart and lightweight object detector requiring minimal computational resources which controls a pick-and-place system.The illumination effect on the EfficientDet-Lite model deployed in Raspberry Pi are investigated and compared.Our contributions include the following:

•
The addition of 8% optimized bright Alpha3 images resulted in a 7.5% increase in Average Precision and a 6.3% increase in F1-score.

•
Obtain high detection scores over 80% and low variance of 1.65 by using 135-degree angle and level 0 illumination in accordance with Japanese Industrial Standard (JIS).

•
In-depth analysis of EfficientDet-Lite models with training batch sizes 4, 8, and 16.Batch size 4 had the best performance with an overall mean of 66.8% and low standard deviation of 6.23% The remainder of this paper is organized as follows: In Section 2, we introduce the project setup for a smart and lean pick-and-place solution as well as the data collection and training process.In Section 3, we test our proposed method on the custom dataset and compare it with some other state-of-the-art methods to show the effectiveness and advantages of our method.Conclusions are drawn in Section 4.

Materials and Measurements Setup
Figure 1 shows the project setup for our smart and lean pick-and-place solution.The framework of the pick-and-place solution is mainly divided into two parts; object detection and location using an embedded system, and workpiece placement using a robot arm.A Universal Robot 3 (UR3) collaborative robot is used to perform the pick-and-place solution.A Logitech 2D camera of 5 Megapixels (MP)is mounted on the arm of UR3 and connected to the USB port of a Raspberry Pi.Once the object is detected, the General-Purpose Input-Output (GPIO) sends a signal to control the hardwired signal of the UR, as proposed by our previous work [5].Using 2D cameras is beneficial to the pick-and-place system, as they are more affordable and have good accuracy [20].Compared to other vision solutions, a 2D camera detection engine is easy to use, thus increasing their adaptability and flexibility to different custom-made objects.
Figure 2 shows the flowchart for our pick-and-place solution using the Raspberry Pi.Using OpenCV on Raspberry Pi, the data images are captured, and the images are annotated using the online Roboflow tool.The dataset is then preprocessed, augmented, and run on Google Colab for model building and training.Using TensorFlow's Model Maker [21], the model is built, trained, and evaluated before being converted to TensorFlow Lite.The lite version consumes less memory compared to the original version.Figure 2 shows the flowchart for our pick-and-place solution using the Raspberry Pi.Using OpenCV on Raspberry Pi, the data images are captured, and the images are annotated using the online Roboflow tool.The dataset is then preprocessed, augmented, and run on Google Colab for model building and training.Using TensorFlow's Model Maker [21], the model is built, trained, and evaluated before being converted to TensorFlow Lite.The lite version consumes less memory compared to the original version.This project used 2 datasets for training: Dataset 1 for optimized bright and illumination level and Dataset 2 for batch test.The mean and variance AP are then extracted and compared.For the batch test, we compare the AP for batch sizes 4, 8 and 16.For the illumination test, we use 3 lamp angles and compare the best detection scores.
The project uses custom datasets for the initial training process.Three different colors (yellow, blue, red) are used, and each color has 2 different shapes (cylinder, cube), as shown in Figure 3.The project uses Roboflow [22] as an online annotation and data splitting tool.Using the Label Assist tool, the highest mAP is used to annotate images, and the  Figure 2 shows the flowchart for our pick-and-place solution using the Raspb Using OpenCV on Raspberry Pi, the data images are captured, and the images a tated using the online Roboflow tool.The dataset is then preprocessed, augmen run on Google Colab for model building and training.Using TensorFlow's Mode [21], the model is built, trained, and evaluated before being converted to TensorFl The lite version consumes less memory compared to the original version.The project uses custom datasets for the initial training process.Three differen (yellow, blue, red) are used, and each color has 2 different shapes (cylinder, c shown in Figure 3.The project uses Roboflow [22] as an online annotation and da ting tool.Using the Label Assist tool, the highest mAP is used to annotate images, confidence is lowered to 20%.The confidence level is lowered to make the annota ible for all objects.The overlap is reduced to 50% to enable detection of workpie poor confidence.These values are chosen after trial and error in order for the A add most of the annotations automatically.For faster annotation, the zoom and lo functions are used to ensure more accurate annotation as the workpiece looks b The project uses custom datasets for the initial training process.Three different colors (yellow, blue, red) are used, and each color has 2 different shapes (cylinder, cube), as shown in Figure 3.The project uses Roboflow [22] as an online annotation and data splitting tool.Using the Label Assist tool, the highest mAP is used to annotate images, and the confidence is lowered to 20%.The confidence level is lowered to make the annotation visible for all objects.The overlap is reduced to 50% to enable detection of workpiece with poor confidence.These values are chosen after trial and error in order for the AI tool to add most of the annotations automatically.For faster annotation, the zoom and lock view functions are used to ensure more accurate annotation as the workpiece looks bigger.A Roboflow's Healthcheck is performed to ensure the distribution of workpieces in the dataset.Roboflow's Healthcheck is performed to ensure the distribution of workpieces in the da taset.According to our previously published research [5], EfficientDet-Lite4 has a large fil size and may not be suitable for Raspberry Pi due to its memory limitations.Hence, in thi project, we specifically use EfficientDet-Lite 0 to 3 as our models, and these four architec tures are trained using the TensorFlow Lite Model Maker [23] library.
All Tensorflow models are developed and evaluated using Google Colab [23] with Graphic Processing Unit (GPU) Hardware Accelerator.The proposed framework aims t be a complete object detector that only needs minimal data processing before and afte model detection and classification.After training, the EfficientDet Lite models are quan tized (weights of 8-bit integer) using Post-Training Quantization.
For our object detection, the evaluation criteria are the mean Average Precisio (mAP) and F1-score.According to the validation dataset COCO2017, its mAP is the sam as Average Precision in Tensorflow Model Maker [23].For our pick-and-place application ARmax10 is chosen as the Recall value as we expect to have a maximum of 10 detection per pick-and-place application.
The formula of the mean Average Precision is given as below: where APk is the Average Precision of class k, and n is the number of classes.The F1-score is used to evaluate the models' accuracy, since it allows for the simulta neous maximization of two metrics that are well known in this field: Precision, whic measures the detections of objects, and Recall, which measures the objects that are de tected.The F1-score is calculated based in the mAP and Recall value in the formula below To speed up the training, the number of epochs is fixed at 50 for the 4 models.Th threshold for detection is set at 30%.The standard deviation of the AP is calculated a follows: where  is the population standard deviation, N is the size of the population, xi is th value of an AP and µ is the population mean of all APs.
The variance of the AP is calculated as follows: According to our previously published research [5], EfficientDet-Lite4 has a large file size and may not be suitable for Raspberry Pi due to its memory limitations.Hence, in this project, we specifically use EfficientDet-Lite 0 to 3 as our models, and these four architectures are trained using the TensorFlow Lite Model Maker [23] library.
All Tensorflow models are developed and evaluated using Google Colab [23] with a Graphic Processing Unit (GPU) Hardware Accelerator.The proposed framework aims to be a complete object detector that only needs minimal data processing before and after model detection and classification.After training, the EfficientDet Lite models are quantized (weights of 8-bit integer) using Post-Training Quantization.
For our object detection, the evaluation criteria are the mean Average Precision (mAP) and F1-score.According to the validation dataset COCO2017, its mAP is the same as Average Precision in Tensorflow Model Maker [23].For our pick-and-place application, ARmax10 is chosen as the Recall value as we expect to have a maximum of 10 detections per pick-and-place application.
The formula of the mean Average Precision is given as below: where AP k is the Average Precision of class k, and n is the number of classes.The F1-score is used to evaluate the models' accuracy, since it allows for the simultaneous maximization of two metrics that are well known in this field: Precision, which measures the detections of objects, and Recall, which measures the objects that are detected.The F1-score is calculated based in the mAP and Recall value in the formula below: To speed up the training, the number of epochs is fixed at 50 for the 4 models.The threshold for detection is set at 30%.The standard deviation of the AP is calculated as follows: where σ is the population standard deviation, N is the size of the population, x i is the value of an AP and µ is the population mean of all APs.
The variance of the AP is calculated as follows: where S 2 is the population variance, N is the size of the population, x i is the value of an AP and µ is the population mean of all APs.Data preprocessing decreases the training time and increases the performance of the AI model by applying image transformations to all images in this dataset, as shown in Table 1.
To prevent overfitting of the model, the data augmentation process is shown in Table 2.Both data preprocessing and augmentation are completed using the online Roboflow tool.This project uses an augmentation technique to increase the number of images from a small number of images.Two datasets were built: Dataset 1 with 124 images to study the illumination effect and Dataset 2 with 82 images to study effect of batch size.Dataset 2 is a small dataset consisting of 82 images, and it is used for batch testing.Our objective is to determine whether there is a visible increase in mAP due to batch size despite the small datasets.Table 3 presents the two datasets utilized for the three test trials.For test experiments 1 and 2, the model chosen is EfficientDet-Lite2 with a batch size of 8, as it is lightweight and has a good trade-off between training speed and accuracy.For test experiment 3, we vary the batch size to 4, 8 and 16, and we also vary the 3 Tensorflow models from EfficientDet-Lite 0 to 3.

Image Optimization Process to Improve Mean Average Process
Table 4 shows the distribution of the workpiece in Dataset 1, where there are a total of 963 objects.The blue cylinder has highest distribution at 35.61%, which is followed by the blue cube at 20.87%.The red cylinder has the lowest distribution at 13.33%.The proposed pick-and-place method aims to be a complete object detector that only needs minimal data processing before and after model detection and classification.Therefore, we choose a small number of images to evaluate the effectiveness of this approach.This study emphasizes the necessity of rapid and accurate label annotations, as well as fast data processing with small datasets, using a smart and "lean" approach.
Figure 4 shows the augmentation process, where 124 original images are augmented to 327 images using the Flip-90 degree, Flip-180 degree and Rotate-15 degrees manipulation.Similarly, the 10 bright images (8% of the total images) are augmented to 24 images.Using the Roboflow online tool, the images are merged to produce 1006 augmented image after undergoing image shearing manipulation.The proposed pick-and-place method aims to be a complete object detector that only needs minimal data processing before and after model detection and classification.Therefore, we choose a small number of images to evaluate the effectiveness of this approach.This study emphasizes the necessity of rapid and accurate label annotations, as well as fast data processing with small datasets, using a smart and "lean" approach.
Figure 4 shows the augmentation process, where 124 original images are augmented to 327 images using the Flip-90 degree, Flip-180 degree and Rotate-15 degrees manipulation.Similarly, the 10 bright images (8% of the total images) are augmented to 24 images.Using the Roboflow online tool, the images are merged to produce 1006 augmented image after undergoing image shearing manipulation.For the bright images, 10 images of different alpha values are added to the dataset, as shown in Table 5.Using the addWeighted function from OpenCV, a brightness of alpha(α) parameter is overlayed onto the 10 images.The addWeighted function is a function that helps by adding two images (f0 and f1) by passing varying α values of 1, 2 and 3 into the formula below.The datasets are named as follows.Alpha1 has 124 base images with 10 bright images.Alpha2 has 124 base images with 10 bright images, and Alpha3 has 10 bright images.The optimization is shown in Figure 5, where the results of brightening are clearly visible.For the bright images, 10 images of different alpha values are added to the dataset, as shown in Table 5.Using the addWeighted function from OpenCV, a brightness of alpha(α) parameter is overlayed onto the 10 images.The addWeighted function is a function that helps by adding two images (f0 and f1) by passing varying α values of 1, 2 and 3 into the formula below.The datasets are named as follows.Alpha1 has 124 base images with 10 bright images.Alpha2 has 124 base images with 10 bright images, and Alpha3 has 10 bright images.The optimization is shown in Figure 5, where the results of brightening are clearly visible.

Illumination Level Setup to Improve Detection Scores
According to the Japanese Industrial Standard (JIS) Z 9110-1979 [24], the mended level of illumination shall be as shown in Table 6 below.Using an illum (lux) meter and a commercially available lamp, the appropriate level of illumin determined for a specific work location in a typical factory environment, such as th ing, assembly and inspection area.It is important to note that the detection scores are affected by the surface re since the workpieces are made of metallic materials.Rather than mounting the l rectly above the workpieces, the lamp is positioned on the side so that the light d reflect directly from the metallic surface.As shown in Figure 6, the base of the positioned 30 cm horizontally and 25 cm vertically from the workpieces, with 3 a 180 degrees, 90 degrees and 135 degrees from the lowest vertical point lamp.This focuses on pick-and-place applications within darkrooms, corridors, and poorly nated areas with less than 300 Lux levels, as it is less extensively studied.Table 7 shows the on-site lux measurement as well as classification of areas proposed robot implementation in an assembly line.

Illumination Level Setup to Improve Detection Scores
According to the Japanese Industrial Standard (JIS) Z 9110-1979 [24], the recommended level of illumination shall be as shown in Table 6 below.Using an illuminance (lux) meter and a commercially available lamp, the appropriate level of illumination is determined for a specific work location in a typical factory environment, such as the packing, assembly and inspection area.It is important to note that the detection scores are affected by the surface reflection since the workpieces are made of metallic materials.Rather than mounting the lamp directly above the workpieces, the lamp is positioned on the side so that the light does not reflect directly from the metallic surface.As shown in Figure 6, the base of the lamp is positioned 30 cm horizontally and 25 cm vertically from the workpieces, with 3 angles of 180 degrees, 90 degrees and 135 degrees from the lowest vertical point lamp.This project focuses on pick-and-place applications within darkrooms, corridors, and poorly illuminated areas with less than 300 Lux levels, as it is less extensively studied.
Table 7 shows the on-site lux measurement as well as classification of areas for the proposed robot implementation in an assembly line.

Illumination Level Setup to Improve Detection Scores
According to the Japanese Industrial Standard (JIS) Z 9110-1979 [24], the mended level of illumination shall be as shown in Table 6 below.Using an illum (lux) meter and a commercially available lamp, the appropriate level of illumin determined for a specific work location in a typical factory environment, such as th ing, assembly and inspection area.It is important to note that the detection scores are affected by the surface re since the workpieces are made of metallic materials.Rather than mounting the la rectly above the workpieces, the lamp is positioned on the side so that the light d reflect directly from the metallic surface.As shown in Figure 6, the base of the l positioned 30 cm horizontally and 25 cm vertically from the workpieces, with 3 an 180 degrees, 90 degrees and 135 degrees from the lowest vertical point lamp.This focuses on pick-and-place applications within darkrooms, corridors, and poorly nated areas with less than 300 Lux levels, as it is less extensively studied.Table 7 shows the on-site lux measurement as well as classification of areas proposed robot implementation in an assembly line.To ensure reliability and determine the average detection score, a minimum of three readings are taken.Figure 7 illustrates the setup of three different lamp angles (180, 135 and 90).A lamp angle has two adjustable levels of illumination-lower than 150 lux and between 150 and 300 lux.To ensure reliability and determine the average detection score, a minimum of three readings are taken.Figure 7 illustrates the setup of three different lamp angles (180, 135 and 90).A lamp angle has two adjustable levels of illumination-lower than 150 lux and between 150 and 300 lux.To prevent the direct reflection of the light by the metallic surface, the lamp is mounted at the side instead of directly above the workpieces.The reason for this setup is to reduce the direct glare from the lamp while providing enough illumination, as shown in Figure 8.The number of frames per second (FPSs) can be seen on the top-left corner of the image.The speed is typically in the range of 0.8 frames per second as the Python code used to display the results using OpenCV uses a portion of the processing time.In this project, a batch size of 4 is abbreviated to "Batch4", a batch size of 8 is abbreviated to "Batch8", and a batch size of 16 is abbreviated to "Batch16".To prevent the direct reflection of the light by the metallic surface, the lamp is mounted at the side instead of directly above the workpieces.The reason for this setup is to reduce the direct glare from the lamp while providing enough illumination, as shown in Figure 8.The number of frames per second (FPSs) can be seen on the top-left corner of the image.The speed is typically in the range of 0.8 frames per second as the Python code used to display the results using OpenCV uses a portion of the processing time.To ensure reliability and determine the average detection score, a minimum of three readings are taken.Figure 7 illustrates the setup of three different lamp angles (180, 135 and 90).A lamp angle has two adjustable levels of illumination-lower than 150 lux and between 150 and 300 lux.To prevent the direct reflection of the light by the metallic surface, the lamp is mounted at the side instead of directly above the workpieces.The reason for this setup is to reduce the direct glare from the lamp while providing enough illumination, as shown in Figure 8.The number of frames per second (FPSs) can be seen on the top-left corner of the image.The speed is typically in the range of 0.8 frames per second as the Python code used to display the results using OpenCV uses a portion of the processing time.In this project, a batch size of 4 is abbreviated to "Batch4", a batch size of 8 is abbreviated to "Batch8", and a batch size of 16 is abbreviated to "Batch16".In this project, a batch size of 4 is abbreviated to "Batch4", a batch size of 8 is abbreviated to "Batch8", and a batch size of 16 is abbreviated to "Batch16".The model training and validation are completed in Google Colab (Python 3.8) using Tensorflow Model Maker on free GPU.The Tensorflow model is trained with training data with batch sizes of 4, 8 and 16, respectively.To speed up the training, the number of epochs is fixed at 50 for the 4 models.

Training Batch Size Configuration to Improve Mean Average Precision
Dataset 2 is used for the batch size training; it is deliberately chosen to be a small dataset consisting of 82 images and 1022 objects.As shown in Table 9, it has a healthy distribution percentage of the workpieces.The small dataset allows fast training times and practical data preparation.Its purpose is to determine an optimal number of batch sizes with a good trade-off between the accuracy of the detected objects and training speed.

Results of Optimized Bright Images on Average Precision
Table 10 shows the results of the evaluation metrics using EfficientDet-Lite2, and training was completed for 50 epochs.The results demonstrate that Dataset 3 has the highest F1-score, which is to be expected given that it has the highest AP and AR Max10.The best performance comes from Alpha3, as it results in a 10.2% increase in Average Precision and a 6.3% increase in F1-score as compared to the Control Dataset.The color of the workpiece has a direct effect on the accuracy, as shown in Table 11.The blue cube performs the best under Alpha3 with an Average Precision increase of 24.6% while the red cube performs the worst with the AP dropping by 28.6% in the Alpha2 dataset and further to less than the threshold value of 30% in the Alpha3 dataset.One explanation could be that red colors reflect more light, whilst blue colors absorb more.The blue cube has the highest AP (90.4%) in the Alpha3 dataset, while the red cube has the lowest AP in the Alpha2 (45%) and in Alpha3 datasets (29%).As our goal here is to have consistent high detection scores with low variance, we choose Alpha1 instead of Alpha3 for our subsequent projects.The average AP for Alpha1 is 0.8% higher than the control dataset.Hence, Alpha1 is chosen to increase the AP while maintaining low variance.Figure 9 shows that in general, all the APs for Alpha1 and Alpha2 are higher than that of the COCO2017 dataset, which is 33.97%.For Alpha3, only the red cube has a lower AP than the validation mAP.The overall mean of Alpha1 is 6.2% higher than that of Alpha2.Therefore, for a high and consistent object detection approach, we use the Alpha1 dataset in our pick-and-place solution.16.4 0.47 1.9

Average Precision
Figure 9 shows that in general, all the APs for Alpha1 and that of the COCO2017 dataset, which is 33.97%.For Alpha3, only AP than the validation mAP.The overall mean of Alpha1 is 6.2% pha2.Therefore, for a high and consistent object detection appro dataset in our pick-and-place solution.

Results of Illumination Level on Detection Scores
Table 12 shows the results of various illumination angles an taken from the three best detection scores for all the three differe and 90).Each angle has two levels of illumination-a lux level less in between 150 and 300.The lux measurements are taken to ensu curs during the specified illumination level, as per the JIS specific detection is set at 30% and, hence, if there is any detection less th as 'Nil' below.

Results of Illumination Level on Detection Scores
Table 12 shows the results of various illumination angles and levels.The average is taken from the three best detection scores for all the three different lamp angles (180, 135 and 90).Each angle has two levels of illumination-a lux level less than 150 and a lux level in between 150 and 300.The lux measurements are taken to ensure that the detection occurs during the specified illumination level, as per the JIS specification.The threshold for detection is set at 30% and, hence, if there is any detection less than 30%, it will be shown as 'Nil' below.
Table 13 shows the average detection score of workpieces and the variance.The 135-degree level 1 has the highest average detection score (81.17%), which is followed by the 135-degree level 0 at 80.50%.The 180-degree level 1 has the lowest detection scores of 77.94%.The lowest variance is 1.65% at the 135-degree level, and the highest variance is 25.08% for the 90-degree level 1.Hence, to increase the average detection score to over 80%, as shown in Figure 10 below, we propose to use a 135-degree lamp.We can observe that the illumination distribution is more even and that there is less surface reflection with a 135-degree lamp of level 0 illumination because of its low variance.
distribution is more even and that there is less surface reflection with a 135-degree lamp of level 0 illumination because of its low variance.

Results of Variation of Batch Size on Average Precision
A comparison of Average Precision across Efficient-Det Lite models is built in order to investigate which model performs the best with a small dataset.As shown in Table 14 below, Batch4 has the highest overall AP at 66.8%, and it has the lowest standard deviation at 6.23%.In contrast, Batch16 has the lowest overall AP at 57.4% and highest standard deviation at 7.87%.According to Keskar [25], this is because the large-batch methods tend to converge to sharp minimizers of the training and testing function and as is well known, sharp minima lead to poorer generalization.In contrast, small-batch methods consistently converge to flat minimizers due to the inherent noise in the gradient estimation.As compared to Batch16, both Batch4 and Batch8 have higher average APs of 66.1 ± 0.7% and standard deviations of 6.245 ± 0.015%.

Results of Variation of Batch Size on Average Precision
A comparison of Average Precision across Efficient-Det Lite models is built in order to investigate which model performs the best with a small dataset.As shown in Table 14 below, Batch4 has the highest overall AP at 66.8%, and it has the lowest standard deviation at 6.23%.In contrast, Batch16 has the lowest overall AP at 57.4% and highest standard deviation at 7.87%.According to Keskar [25], this is because the large-batch methods tend to converge to sharp minimizers of the training and testing function and as is well known, sharp minima lead to poorer generalization.In contrast, small-batch methods consistently converge to flat minimizers due to the inherent noise in the gradient estimation.As compared to Batch16, both Batch4 and Batch8 have higher average APs of 66.1 ± 0.7% and standard deviations of 6.245 ± 0.015%.From Table 15 and Figure 11 below, the overall results show that the red cube has the highest AP at 69.9% and the second lowest standard deviation at 4.48%.The red cylinder has the second highest AP at 62.4% with a standard deviation at 5.09%.In comparison to other classes, the yellow cylinder has the lowest AP at 53.8% and the highest standard deviation at 8.69%.This is consistent with our prior findings [26], which show that neutralcolored things such as yellow cubes have lower APs than strong-colored objects such as red cubes.As the objects for this pick-and-place action have a metallic surface, yellow objects suffer from surface reflection from external lighting.From Table 15 and Figure 11 below, the overall results show that the red cube has the highest AP at 69.9% and the second lowest standard deviation at 4.48%.The red cylinder has the second highest AP at 62.4% with a standard deviation at 5.09%.In comparison to other classes, the yellow cylinder has the lowest AP at 53.8% and the highest standard deviation at 8.69%.This is consistent with our prior findings [26], which show that neutralcolored things such as yellow cubes have lower APs than strong-colored objects such as red cubes.As the objects for this pick-and-place action have a metallic surface, yellow objects suffer from surface reflection from external lighting.

Statistical Analysis on Variation of Batch Size
As we observed in Table 14 above, the Average APs of Batch4 and Batch8 are quite similar.Therefore, a statistical analysis is conducted to see whether there is a significant difference between the two groups of values.We utilize the Mann-Whitney U method as

Statistical Analysis on Variation of Batch Size
As we observed in Table 14 above, the Average APs of Batch4 and Batch8 are quite similar.Therefore, a statistical analysis is conducted to see whether there is a significant difference between the two groups of values.We utilize the Mann-Whitney U method as it is one of the most commonly used non-parametric statistical tests [27].Developed by Mann and Whitney in 1947, this non-parametric test is frequently used for small samples of data that are not normally distributed [28].
In Mann-Whitney U test, the null hypothesis states that the medians of the two respective groups are not different.As for the alternative hypothesis, it states that one median is larger than the other or that the two medians differ.If the null hypothesis is not rejected, it means that the median of each group of observations is similar.If the null hypothesis is rejected, it means the two medians differ.
We apply the Mann-Whitney U test to our Batch4 and Batch8 as the number of samples are small [29], less than 30, and the AP results are not normally distributed.Our null hypothesis (H0) and alternative hypothesis (H1) follow: H0.The median of APs is equal between Batch4 and Batch8 APs.
H1.The median of APs is not equal between Batch4 and Batch8 APs.
Using SciPy which is a Python library used for scientific computing and technical computing, we obtain a p-value of 0.448.Since the p-value (0.448) is above the 0.05 significance level, we fail to reject the null hypothesis.
We conclude there is not enough evidence to suggest a significant difference in medians between the two datasets.As the standard deviation only differs by 0.008, we recommend using Batch8 instead of Batch4 for faster training and evaluation.

Performance Validation
We use the pre-trained model SSD MobileNet V2 FPNLite and compared the performance regarding the Average Precision and detection score.Table 16 shows the performance results of our selected EfficientDet-Lite2 model and SSD MobileNet V2 FPNLite, using the same training data for each of them.We compare the performance of the SSD MobileNet to that of our EfficientDet-Lite2 model with the Alpha1 dataset and found that our results were significantly better.Compared to SSD MobileNet V2 FPNLite, our model shows a significant improvement of accuracy for the yellow cube at 40.4% and red cube at 40.1%.Table 17 shows the comparison of detection scores for the custom dataset running on our EfficientDet-Lite2 model and SSD MobileNet V2 FPNLite.The detection scores are taken with the project setup of the 135-degree lamp and level 0 illumination.When compared to SSD MobileNet V2, our model significantly improves the overall detection scores by 5.19%.Comparing EfficientDet-Lite2 to SSD MobileNet V2, significant improvements are observed for the red cube (+19.33%) and red cylinder (+20.66%).objects and controlling robot arm movement.This project has improved the detection of objects with deep learning and implemented the state-of-the-art (SOTA) EfficientDet-Lite model on Raspberry Pi.The validation on another SOTA model, SSD MobileNet V2 FPN-Lite, has shown that our object detection significantly improves the Average Precision and overall detection scores.Our experimental results clearly show that the proposed method has an increase of 5.19% in AP compared to the SSD model.In fact, all the APs obtained during the evaluation test exceed the AP from the COCO2017 validation test.
In the EfficientDet-Lite2 model, the addition of 8% optimized bright Alpha3 images resulted in an increase of 7.5% in Average Precision and an increase of 6.3% in F1-score.We observed that the blue cube has the highest AP in the Alpha3 dataset, while the red cube has the lowest AP in the Alpha2 and Alpha3 datasets.As our goal here is to have a consistent high detection score with low variance, we choose Alpha1 instead of Alpha3 for our subsequent projects.
As a result of using Batch4, the overall AP across all EfficientDet-Lite models increased significantly to 66.8% as compared to Batch8 at 65.4% and Batch16 at 57.4%.In order to speed up the training for our subsequent projects, we use Batch8 for training rather than Batch4.This decision is validated by the Mann-Whitney U statistical analysis, which shows that Batch4 and Batch8 do not have any significant differences in AP.
In order to increase the overall detection score to over 80%, we utilized a 135-degree lamp and level 0 illumination.As such, this study presents a very good starting point for the development of a better object detection for pick-and-place robots by using the effect of illumination.This is important because the control algorithm [4] that Raspberry Pi uses to move the arm of the universal robot depends on high and consistent detection scores to establish the location of the workpiece.In the future, we will extend the system to identify angular rotations and improve real-time detection as well as address the issue of surface reflection from external lights.

Figure 1 .
Figure 1.Project setup of our smart and lean pick-and-place solution.

Figure 2 .
Figure 2. Flowchart with dataset, batch size and lamp control.

Figure 1 .
Figure 1.Project setup of our smart and lean pick-and-place solution.

Figure 1 .
Figure 1.Project setup of our smart and lean pick-and-place solution.

Figure 2 .
Figure 2. Flowchart with dataset, batch size and lamp control.

This project used 2
datasets for training: Dataset 1 for optimized bright and nation level and Dataset 2 for batch test.The mean and variance AP are then extrac compared.For the batch test, we compare the AP for batch sizes 4, 8 and 16.For mination test, we use 3 lamp angles and compare the best detection scores.

Figure 2 .
Figure 2. Flowchart with dataset, batch size and lamp control.This project used 2 datasets for training: Dataset 1 for optimized bright and illumination level and Dataset 2 for batch test.The mean and variance AP are then extracted and compared.For the batch test, we compare the AP for batch sizes 4, 8 and 16.For the illumination test, we use 3 lamp angles and compare the best detection scores.The project uses custom datasets for the initial training process.Three different colors (yellow, blue, red) are used, and each color has 2 different shapes (cylinder, cube), as shown in Figure3.The project uses Roboflow[22] as an online annotation and data splitting tool.Using the Label Assist tool, the highest mAP is used to annotate images, and the confidence is lowered to 20%.The confidence level is lowered to make the annotation visible for all objects.The overlap is reduced to 50% to enable detection of workpiece with poor confidence.These values are chosen after trial and error in order for the AI tool to add most of the annotations automatically.For faster annotation, the zoom and lock view functions are used to ensure more accurate annotation as the workpiece looks bigger.A Roboflow's Healthcheck is performed to ensure the distribution of workpieces in the dataset.

Figure 8 .
Figure 8. Best detection scores at different illumination levels.

Figure 8 .
Figure 8. Best detection scores at different illumination levels.

Table 8
below defines their training settings.The batch size is set based on the GPU Random Access Memory (RAM) with the default learning rate and input resolution.The Tensorflow model is trained with training data with batch sizes of 4, 8 and 16, respectively

Figure 8 .
Figure 8. Best detection scores at different illumination levels.

Table 8
below defines their training settings.The batch size is set based on the GPU Random Access Memory (RAM) with the default learning rate and input resolution.The Tensorflow model is trained with training data with batch sizes of 4, 8 and 16, respectively.

Figure 10 .
Figure 10.Bar graph of detection scores with average and variance.

Figure 10 .
Figure 10.Bar graph of detection scores with average and variance.

Table 1 .
Preprocessing operations applied to the original images.

Table 2 .
Augmentation operations applied to the original images.

Table 3 .
Augmented dataset for illumination, optimized brightness and batch size test.

Table 4 .
Distribution of workpiece in Dataset 1.

Table 4 .
Distribution of workpiece in Dataset 1.

Table 5 .
Optimization process with Alpha datasets.

Table 5 .
Optimization process with Alpha datasets.

Table 6 .
JIS recommend level of illumination.

Table 6 .
JIS recommend level of illumination.

Table 6 .
JIS recommend level of illumination.

Table 7 .
Classification of work areas.

Table 7 .
Classification of work areas.

Table 8 below
defines their training settings.The batch size is set based on the GPU Random Access Memory (RAM) with the default learning rate and input resolution.The Tensorflow model is trained with training data with batch sizes of 4, 8 and 16, respectively.

Table 7 .
Classification of work areas.

Table 9 .
Equal distribution of workpiece in Dataset 2.

Table 10 .
The Average Precision and calculation of F1-score.

Table 11 .
Average Precision with Alpha datasets.

Table 12 .
Detection scores and lux level for the different angles and inte

Table 12 .
Detection scores and lux level for the different angles and intensities of the lamps.

Table 13 .
Detection scores and Lux level for 180-degree level 1 lamp.

Table 14 .
Variation of batch size.

Table 14 .
Variation of batch size.

Table 15 .
Average Precision according to workpieces.

Table 15 .
Average Precision according to workpieces.

Table 16 .
Comparison of APs for custom object detection model.

Table 17 .
Comparison of detection scores with SSD MobileNet V2.