Electronics
  • Article
  • Open Access

29 July 2023

MoBiSea: A Binary Search Algorithm for Product Clustering in Industry 4.0

1
Department of Computer Science and System Engineering, University of Zaragoza, 50018 Zaragoza, Spain
2
Department of Computer Engineering (DISCA), Universitat Politècnica de València, 46022 Valencia, Spain
*
Author to whom correspondence should be addressed.

Abstract

Proprietary systems used to modernize Industry 4.0 usually involve high financial costs. Consequently, replacing these proprietary systems with low-cost devices that offer the same functionality has become an emerging trend. However, these low-cost devices usually suffer from electromagnetic interference problems, as they are enclosed in electrical panels alongside electromechanical devices. In this article, we present Mode Binary Search, an algorithm specifically designed for use in a low-cost system for the automated collection of industrial productivity data. Specifically, productivity data are obtained from the availability and sealing signals of the thermoplastic sealing machines in production lines belonging to the agri-food industry. Mode Binary Search was designed to cluster sealing signals, thus enabling us to identify which products have been made. Furthermore, the algorithm determines when the manufacturing of each product starts and ends, in other words, the exact moment a product change occurs, all without the need for operator supervision or intervention. Finally, we compared our algorithm, based on binary search, with three clustering mechanisms: k-means, k-rms and x-means. Out of all the cases we analyzed, the maximum error committed by Mode Binary Search is limited to 2.69%, thereby outperforming all others.

1. Introduction

The concept of Industry 4.0 aims to integrate machinery, devices and sensors, in other words, the physical manufacturing process, with digital parts and advanced software []. All this is driven by modern connected industry technologies used to predict, control, maintain and integrate manufacturing processes. As a result, their impact is expected to be far-reaching in future manufacturing systems and, therefore, organizations will need to increase their investments in digital technologies [].
Although there is a wide range of devices and equipment capable of modernizing the industry, the proprietary systems used tend to involve a high financial investment. In most cases, they are encapsulated systems, implementing them is expensive and their communication protocols differ. Consequently, several proposals have analyzed the possibility of adopting low-cost devices in industrial environments. Although they have not yet been installed in real settings on a large scale, their usage is growing despite the difficulties encountered, such as strong Electromagnetic Interferences (EMIs), which are prone to cause both sensing and communications errors.
As a contribution towards increased industry automation through the adoption of low-cost systems and to demonstrate that systems requiring less financial investment may, nevertheless, be capable of performing the same operations as other proprietary systems, we have developed a solution that measures and monitors the variables involved in calculating the Overall Equipment Effectiveness (OEE) factor []. The aim is to obtain precise production parameters (availability, performance and quality) in real time in industrial settings.
OEE is a measurement tool based on the Total Productive Maintenance (TPM) concept. Its purpose is to detect and eliminate failures caused by equipment, thereby improving production, reducing costs and inventory, and increasing labor productivity. Measuring OEE enables us to identify the causes of poor performance and quantify their effect, thus providing us with improvement rates and an analysis of the source of the defects. Figure 1 shows the components taken into account for the calculation of OEE: (i) the availability metric evaluates the ratio of time during which the machine is genuinely ready to operate, (ii) the performance metric quantifies the share of time the machine can work, disregarding time losses stemming from reduced speed and, lastly, (iii) the quality metric calculates the fraction of time the machine operates at full productivity, accounting for time losses caused by defects.
Figure 1. Overall-equipment-effectiveness components.
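As an illustration of how the three components combine, the following minimal sketch uses the standard definition of OEE as the product of availability, performance and quality, each expressed as a ratio between 0 and 1. The function name `oee` and the example values are hypothetical and are not taken from the system described in this article.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Return OEE as the product of its three factors, each in [0, 1]."""
    factors = (
        ("availability", availability),
        ("performance", performance),
        ("quality", quality),
    )
    for name, value in factors:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Hypothetical shift: 90% availability, 80% performance, 95% quality.
example_oee = oee(0.90, 0.80, 0.95)
print(f"OEE = {example_oee:.3f}")
```

Because the factors are multiplied, a loss in any single component lowers the overall figure, which is why each of the three has to be measured reliably.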
The automatic collection of productivity data is a crucial component for maximizing efficiency, predicting and preventing problems and advancing towards greater automation in the era of Industry 4.0. It offers tangible advantages, such as real-time monitoring of machinery performance, enabling companies to promptly identify any system anomalies or failures. This, in turn, contributes to minimizing downtime and optimizing operational efficiency. Additionally, the collected data serve as a foundation for conducting predictive analysis, enabling the prediction of future trends or issues before they arise. As a result, proactive decision-making based on data is facilitated. Furthermore, these data play a vital role in the implementation of cyber–physical systems and the Internet of Things (IoT), as they provide the necessary information for automating and controlling production processes in real time.
It is important to emphasize that our work is immersed in the context of Industry 4.0 and the IoT. Within the framework of Industry 4.0, which seeks to integrate machinery, devices and sensors to achieve an effective digitalization of production processes, our algorithm makes a significant contribution. It provides a low-cost solution for measuring and monitoring the variables involved in the calculation of the OEE factor, a key element to achieve greater industrial automation. In addition, thanks to the automatic collection of productivity data in real time, our algorithm facilitates the implementation of cyber–physical systems and IoT, two fundamental pillars of Industry 4.0. On the other hand, our algorithm is used for the automation of electromagnetic-interference-filtering mechanisms, a recurring and significant problem in industrial environments, especially when using low-cost devices. We designed our algorithm to effectively eliminate erroneous signals caused by these interferences, thus improving the precision and efficiency of the system. This aspect is crucial as effective management of electromagnetic interference is vital to ensure the integrity of the collected data and, ultimately, the effectiveness of the production systems based on Industry 4.0 and IoT.
Our previous research focused on measuring availability and performance from the sealing signals of thermoplastic sealing machines in the agri-food industry []. Figure 2 presents a scheme of the low-cost system we previously designed to determine OEE, in particular the availability and performance factors. Five parts can be observed in this scheme: the industrial sealer, which seals the containers; the relays, which receive the availability and sealing electrical signals; the Raspberry Pi, a low-cost microcomputer programmed to receive the signals and to manage the database where EMI filtering mechanisms are applied; the database, where the signals from each line are stored; and finally, the OEE dashboard, responsible for monitoring the process.
Figure 2. Scheme of the low-cost OEE estimation system [].
During the design and development of our low-cost system, based on the Raspberry Pi platform, we encountered significant EMI problems in the signals we gathered, because low-cost devices are placed alongside other electromechanical elements. We solved this problem with two filtering mechanisms designed to detect and eliminate all wrong signals due to EMI [], thereby avoiding any additional equipment. Specifically, these mechanisms make it possible to eliminate noise in two types of signals from thermosealing machines: sealing signals and availability signals. The Database Filter (DBF) is responsible for filtering the sealing signals, while the Smart Coded Filter (SCF) is responsible for filtering the availability signals of the machine.
The proposal presented in this article focuses on the DBF filtering mechanism, which enables us to determine valid sealing signals for a correct OEE calculation. This mechanism first filters signals lasting less than one second and then discards wrong signals that do not fit into a logical order of sealing signal values (in other words, a 1 and a 0). After these two operations, the system needs to know the start and end time instants for each product. Operators previously had to enter these values manually, which is cumbersome and error-prone. That is why we aim to design an algorithm capable of automatically estimating the manufactured products, as well as the sealing times for each product, without any operator supervision. We have called this algorithm MoBiSea (Mode Binary Search).
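The two DBF steps described above can be sketched as follows. This is only an illustrative reading of the description, not the DBF implementation itself: the event representation (a timestamp in seconds plus a level of 1 or 0), the function names and the sample data are all assumptions.

```python
from typing import List, Tuple

Event = Tuple[float, int]  # (timestamp in seconds, signal level: 1 or 0)

MIN_DURATION_S = 1.0  # signals held for less than one second are discarded


def drop_short_signals(events: List[Event],
                       min_duration: float = MIN_DURATION_S) -> List[Event]:
    """Drop events whose level is held for less than min_duration seconds.

    The duration of an event is the gap to the next event. The last event
    has no successor, so its duration is unknown and it is kept.
    """
    kept = []
    for i, (timestamp, level) in enumerate(events):
        is_last = i == len(events) - 1
        if is_last or events[i + 1][0] - timestamp >= min_duration:
            kept.append((timestamp, level))
    return kept


def enforce_alternation(events: List[Event]) -> List[Event]:
    """Keep only events that continue a 1, 0, 1, 0, ... sequence."""
    kept = []
    expected = 1  # assumption: a valid sequence starts with a 1
    for timestamp, level in events:
        if level == expected:
            kept.append((timestamp, level))
            expected = 1 - expected
    return kept


# Hypothetical raw events: two sub-second glitches around t = 3 s.
raw = [(0.0, 1), (3.0, 0), (3.2, 1), (3.4, 0), (6.0, 1), (9.0, 0)]
filtered = enforce_alternation(drop_short_signals(raw))
print(filtered)
```

In this sketch the order of the two passes matters: removing short glitches first can leave two consecutive events with the same level, which the alternation pass then resolves.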
MoBiSea clusters sealing times to automatically identify products and also how many product types are involved in the industrial process. In addition, it determines the moment when the process starts and ends for each of the products. In our proposal, we validate MoBiSea by clustering the sealing-time values of various products in an agri-food industry and, more specifically, those involved in the sealing lines of a cheese factory. The products the algorithm must select and cluster are those involved in the manufacturing process; in other words, all the products in the manufacturing process in a shift, at different times and in each shift without distinction. MoBiSea provides the automation needed for our low-cost system based on Raspberry Pi. It mainly measures OEE, although it can be applied to other systems.
We compared MoBiSea against the k-means, k-rms and x-means algorithms to evaluate its performance and validate the proposal. Specifically, we analyzed the number of signals categorized in every cluster, the start and end of each cluster and the products detected in each shift. We also compared real clusters (i.e., the products that have been manufactured) with those our algorithm detected, as well as the clusters estimated via the x-means algorithm, which is a version of k-means that enables us to determine the optimal number of clusters. The results, which we present in Section 4, show that the number of clusters estimated via x-means is not correct in most of the cases we analyzed. Furthermore, we show that k-means, k-rms and x-means cannot correctly estimate the number of signals in each of the clusters, nor the start and end instants in the manufacturing of each product. On the other hand, MoBiSea can achieve such goals with a maximum error of 2.69%, which we can consider negligible for the purpose of OEE calculation.
The remainder of this article is structured as follows: in Section 2, we describe the k-means, x-means and k-rms algorithms and the binary search and we discuss some studies related to our proposal. The MoBiSea algorithm is presented in Section 3, in which we clearly detail all its relevant and unique aspects. We also comment on the results after validating our proposal. MoBiSea is compared with k-means, k-rms and x-means in Section 4. Finally, Section 5 presents the most important conclusions and refers to future work.

4. Comparison with Other Clustering Algorithms

We analyzed six different production shifts to check the performance of MoBiSea against the clustering mechanisms presented in Section 2, comparing the values MoBiSea obtains with those obtained via x-means, k-means and k-rms. We also compared them with real data to check that the values were correct and precise. K-means and x-means are widely recognized and used clustering methods, so we present them as a baseline familiar to the reader. In addition to these classic algorithms, we also compared our proposal against a more recent approach, k-rms. Our goal is to provide a recognizable frame of reference that highlights the distinctive attributes of our approach, while also comparing it with improved and current algorithms.
The first two metrics analyzed are the number of clusters detected by each algorithm and the position of the centroids.
It is important to note that neither k-means nor k-rms can determine the number of clusters by itself, since that value is a parameter the user must set. Therefore, the real number of clusters has been used for comparison purposes. With MoBiSea, the centroids are given by the mode value of the sealing times in each of the clusters.
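To illustrate what a mode-based centroid means in practice, the short sketch below compares the mode of a cluster with its arithmetic mean, which is the centroid k-means uses. The sealing times are hypothetical values chosen for illustration, including one noisy reading; they are not data from the analyzed shifts.

```python
from statistics import mean, mode

# Hypothetical sealing times (seconds) for one cluster; 9.5 is a noisy reading.
sealing_times = [4.0, 4.0, 4.1, 4.0, 3.9, 4.0, 9.5]

mode_centroid = mode(sealing_times)  # most frequent sealing time
mean_centroid = mean(sealing_times)  # arithmetic mean, as in k-means

print(f"mode centroid: {mode_centroid}")
print(f"mean centroid: {mean_centroid:.2f}")
```

In this example the single noisy reading pulls the mean away from the nominal sealing time, while the mode stays at the value that occurs most often.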
The third metric to consider is the number of signals included in each of the clusters by each of the algorithms. These data are essential, since they show how many units have been made of each product during the shift.
Finally, since the aim is to learn the exact moment when a product change occurs (necessary for the correct operation of the DBF filtering mechanism), the exact product start and end signals determined by each of the algorithms also need to be compared.
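The last two metrics (number of signals per cluster and the start and end signal of each cluster) can be derived from any clustering result once every sealing signal carries a cluster label. The following sketch assumes the labels are given in chronological order; the function name `cluster_summary` and the label sequence are hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_summary(labels: List[int]) -> Dict[int, Tuple[int, int, int]]:
    """Map each cluster label to (signal count, first index, last index).

    `labels` holds one cluster label per sealing signal, in time order.
    """
    summary: Dict[int, Tuple[int, int, int]] = {}
    for index, label in enumerate(labels):
        if label not in summary:
            summary[label] = (1, index, index)
        else:
            count, first, _ = summary[label]
            summary[label] = (count + 1, first, index)
    return summary


# Hypothetical shift with three products made one after another.
labels = [0, 0, 0, 1, 1, 1, 1, 2, 2]
for label, (count, start, end) in cluster_summary(labels).items():
    print(f"cluster {label}: {count} signals, start={start}, end={end}")
```

Comparing these triples with the real values for each product gives the per-cluster errors discussed in the rest of this section.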
Considering this, Table 4, Table 5 and Table 6 show the data obtained from the metrics previously mentioned (i.e., the number of clusters, the position of the centroids, the number of signals of each product, as well as the start and end values of each cluster). In addition, the real values are presented, so that the errors committed by each of the analyzed mechanisms (i.e., k-means, x-means, k-rms and MoBiSea) can be measured.
Table 4. Comparison between real-data metrics (Clusters/Center) and the results obtained via x-means, k-means, k-rms and MoBiSea.
Table 5. Comparison between real-data metrics (number of signals (error)) and the results obtained via x-means, k-means, k-rms and MoBiSea.
Table 6. Comparison between real-data metrics (Start/Finish) and the results obtained via x-means, k-means, k-rms and MoBiSea.
As can be observed, the number of clusters the x-means algorithm records differs considerably from the real number of clusters in four of the six analyzed shifts. Especially noteworthy are shifts 1, 2 and 5, in which x-means determines a number of clusters that differs significantly from the real number. Furthermore, it does not follow a pattern; in other words, x-means fails to return the correct number of clusters, both over- and under-estimating it. In contrast, the number of clusters MoBiSea estimates fully matches the real values, except in shift 6, in which MoBiSea only detects two clusters instead of three. This is because, in that particular shift, only 30 units of one of the products were produced (an unusually low number). Note that, although x-means detects three clusters in that shift, it fails when determining the number of signals of each product. In addition, regarding the centroids, MoBiSea satisfactorily detects the sealing times of the different types of product that have been produced in all shifts, except in shift 6, where MoBiSea does not record the sealing time in cluster 2 for the aforementioned reason.
Regarding the number of signals in each cluster, in other words, the number of sealing actions taking place for each of the manufactured product types, we find that, for x-means, errors mostly occur when a wrong number of clusters has been determined, as expected. However, it also correctly (or with a minimal error) estimates the number of signals in some clusters. With k-means, the error is very small in most of the shifts, although it is noticeable for shifts 2, 5 and especially 6. Specifically, the errors made in shift 5 are 32.52% and 56.98% for the first two clusters; the error is even higher in shift 6, rising to 1583.33% for the second cluster.
In the case of k-rms, the most significant errors also occur in shifts 2, 5 and 6, and are largest in the latter. In shift 2, cluster 3 presents an error of 46.94%, followed by cluster 1 with an error of 36.96%. In shift 5, the greatest error is in cluster 2, with an error of 30.85%. Finally, cluster 2 in shift 6 presents an error of 2093.33%. These errors may stem from the attempt to minimize the variance within the clusters, since an outlier or noise can significantly increase the variance of a cluster.
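The percentages above can exceed 100% because they are relative to the real number of signals, which is very small for one product in shift 6. The sketch below assumes the error is computed as the absolute difference divided by the real count. The estimated counts of 505 and 658 are hypothetical: they were chosen only because, for a product with 30 real units, they reproduce the reported percentages, and they are not values taken from Table 5.

```python
def relative_error_pct(estimated: int, real: int) -> float:
    """Absolute difference between estimate and real count, as a % of real."""
    if real <= 0:
        raise ValueError("real count must be positive")
    return abs(estimated - real) / real * 100.0


REAL_UNITS = 30  # units of the low-volume product in shift 6

# Hypothetical estimated counts consistent with the reported errors.
for name, estimated in (("k-means", 505), ("k-rms", 658)):
    error = relative_error_pct(estimated, REAL_UNITS)
    print(f"{name}: {error:.2f}%")
```

Under this assumption, assigning a few hundred extra signals to a 30-unit product is enough to produce errors in the thousands of percent, even though the same absolute difference would be a modest error for a product with hundreds of units.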
Some of the values in Table 4, Table 5 and Table 6 are presented visually in Figure 14, Figure 15 and Figure 16. These figures help us observe the different clusters, their signals and the start and end moments of each of the various products, determined via the analyzed clustering methods.
Figure 14. Products estimated via the x-means, k-means, k-rms and MoBiSea approaches in shift 1.
Figure 15. Products estimated via the x-means, k-means, k-rms and MoBiSea approaches in shift 3.
Figure 16. Products estimated via the x-means, k-means, k-rms and MoBiSea approaches in shift 5.
Figure 14 shows how x-means erroneously determines that the number of different products made in shift 1 is three, while MoBiSea correctly determines that only one product was produced. Note that k-means and k-rms need to be manually provided with the number of clusters.
Similarly, Figure 15 shows the differences between the four mechanisms, x-means, k-means, k-rms and MoBiSea, in terms of both cluster detection and the number of signals. In this case, the x-means mechanism determines four clusters in shift 3, while MoBiSea accurately shows that three different types of products were manufactured. Regarding the number of signals for each product, all mechanisms correctly grouped the 533 units of the first product made, although k-means and k-rms make a mistake in determining the number of signals for the second and third clusters.
Finally, Figure 16 shows how x-means determines a single cluster for shift 5, while MoBiSea indicates that three different types of products were manufactured. Additionally, both k-means and k-rms incorrectly group the signals forming part of each of the clusters, especially for the first two products, although it is worth noting that k-rms comes closer to the actual signal grouping of the clusters.

5. Conclusions

When applied in industrial settings, the Internet of Things (IoT) changes how companies operate by facilitating and improving all production processes. However, proprietary systems in this type of setting usually involve high financial costs and this slows down adoption and, therefore, the transition to Industry 4.0. Nevertheless, there are more economical alternatives that, instead, rely on low-cost devices. Although these devices have similar features, their cost is much lower, which can undoubtedly accelerate technological transition in industry.
In this article, we have presented MoBiSea, an algorithm capable of determining the number of different product types made in each shift and also of determining the moments when their manufacturing process starts and ends. MoBiSea was created as a solution for our low-cost system for measuring OEE, which required operators to manually enter the details of the products made. Thanks to our mechanism, the OEE calculation system can work automatically without needing any operator intervention.
We compared MoBiSea with the results obtained via the k-means, k-rms and x-means clustering algorithms, as well as with real values, to determine whether it performs correctly and precisely. We found that the clusters MoBiSea detects match the real products in all the shifts we studied, except for shift 6, in which one product with an unusually low number of units was not detected. We consider this exception negligible given its singularity. In contrast, the data obtained via x-means, k-means and k-rms show wrong results, even though k-means and k-rms were previously provided with the correct number of clusters.
Specifically, k-means and k-rms have precision errors when determining the sealing times of the different products (shown by the cluster centroids); they also fail to correctly determine the number of seals for each product and the moments when the manufacturing of each product begins and ends (start and end times). The error is more noticeable with x-means, since we have found that it does not correctly determine the number of products made in each shift; therefore, it does not determine the other parameters our system needs (sealing times, number of seals and start and end times of the product).
As future lines of research, we first intend to analyze how MoBiSea performs in other industrial production areas, since we consider that this algorithm can be used to identify and cluster products in other production lines.
In addition, as we want to estimate OEE in an unsupervised manner and, at this point, we can automatically quantify two of the three variables needed to determine OEE (i.e., availability and performance), we still have to solve the problem of obtaining data on the quality variable, since these data are currently entered manually. The time lost due to defective products will have to be determined for that purpose.

Author Contributions

Writing—original draft, A.C.H., J.A.S., P.G., F.J.M. and C.T.C.; Writing—review and editing, A.C.H., J.A.S., P.G., F.J.M. and C.T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been partially supported by the Government of Aragón and the European Social Fund “Construyendo Europa desde Aragón” (T40_23D Research Group) and also by R&D project PID2021-122580NB-I00, funded by MCIN/AEI/10.13039/501100011033 and “ERDF A way of making Europe”.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy concerns.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bajic, B.; Rikalovic, A.; Suzic, N.; Piuri, V. Industry 4.0 Implementation Challenges and Opportunities: A Managerial Perspective. IEEE Syst. J. 2021, 15, 546–559.
  2. Horváth, D.; Szabó, R.Z. Driving forces and barriers of Industry 4.0: Do multinational and small and medium-sized companies have equal opportunities? Technol. Forecast. Soc. Chang. 2019, 146, 119–132.
  3. Nakajima, S. Introduction to TPM: Total Productive Maintenance; Productivity Press: New York, NY, USA, 1988.
  4. Herrero, A.C.; Martinez, F.J.; Garrido, P.; Sanguesa, J.A.; Calafate, C.T. An interference-resilient IIoT solution for measuring the effectiveness of industrial processes. In Proceedings of the 46th Annual Conference of the IEEE Industrial Electronics Society (IECON), Singapore, 18–21 October 2020; pp. 2155–2160.
  5. Herrero, A.C.; Sanguesa, J.A.; Martinez, F.J.; Garrido, P.; Calafate, C.T. Mitigating Electromagnetic Noise When Using Low-Cost Devices in Industry 4.0. IEEE Access 2021, 9, 63267–63282.
  6. Kaushik, S. An Introduction to Clustering and Different Methods of Clustering; Analytics Vidhya: Amsterdam, The Netherlands, 2016; p. 3.
  7. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification and Scene Analysis; Wiley: New York, NY, USA, 1973; Volume 3.
  8. Bishop, C.M. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, UK, 1995.
  9. Ishioka, T. Extended K-means with an Efficient Estimation of the Number of Clusters. In Proceedings of the Intelligent Data Engineering and Automated Learning-IDEAL 2000. Data Mining, Financial Engineering and Intelligent Agents: Second International Conference Shatin, NT, Hong Kong, China, 13–15 December 2000; Springer: Berlin/Heidelberg, Germany, 2003; p. 17.
  10. Rahamathunnisa, U.; Nallakaruppan, M.; Anith, A.; Kumar, K.S.S. Vegetable Disease Detection Using K-Means Clustering And SVM. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 1308–1311.
  11. Siswantoro, J.; Prabuwono, A.S.; Abdullah, A.; Idrus, B. Automatic image segmentation using Sobel operator and k-means clustering: A case study in volume measurement system for food products. In Proceedings of the International Conference on Science in Information Technology (ICSITech), Yogyakarta, Indonesia, 27–28 October 2015; pp. 13–18.
  12. Hüseynli, A.; Yildiz, O.; Akcayol, M.A. Specification based automatic product categorization from unstructured data. In Proceedings of the 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, 2–5 May 2018; pp. 1–4.
  13. Hochdörffer, J.; Laule, C.; Lanza, G. Product variety management using data-mining methods—Reducing planning complexity by applying clustering analysis on product portfolios. In Proceedings of the IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 10–13 December 2017; pp. 593–597.
  14. Noorbehbahani, F.; Mansoori, S. A New Semi-Supervised Method for Network Traffic Classification Based on X-Means Clustering and Label Propagation. In Proceedings of the 2018 8th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 25–26 October 2018; pp. 120–125.
  15. Garain, A.; Das, D. K-RMS Algorithm. Procedia Comput. Sci. 2020, 167, 113–120.
  16. Imamura, K.; Kubo, N.; Hashimoto, H. Automatic moving object extraction using x-means clustering. In Proceedings of the 28th Picture Coding Symposium, Nagoya, Japan, 8–10 December 2010; pp. 246–249.
  17. Knuth, D.E. The Art of Computer Programming, 2nd ed.; Addison-Wesley Longman Publishing Co.: Boston, MA, USA, 1998; Volume 3.
  18. Vuyyuru, G.M. KLP’s Search Algorithm—A New Approach to Reduce the Average Search Time in Binary Search. In Proceedings of the 4th International Conference on Electrical, Electronics, Communication, Computer Technologies and Optimization Techniques (ICEECCOT), Mysuru, India, 13–14 December 2019; pp. 185–190.
  19. Jacob, A.E.; Ashodariya, N.; Dhongade, A. Hybrid search algorithm: Combined linear and binary-search algorithm. In Proceedings of the International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), Chennai, India, 1–2 August 2017; pp. 1543–1547.
  20. Bai, Y.; Yang, L.; Zhang, G.; Xu, Y. An improved binary search RFID anti-collision algorithm. In Proceedings of the 12th International Conference on Computer Science and Education (ICCSE), Houston, TX, USA, 22–25 August 2017; pp. 435–439.
