A Series Production Data Set for Five-Axis CNC Milling

Abstract: The described data set contains features from the machine control of a five-axis milling machine. The features were recorded during thirteen series productions. Each series production includes a changeover process in which the machine was set up for the production of a different product. In addition to the timestamps and the twenty recorded features derived from Numerical Control (NC) variables, the data set also contains labels for the different production phases. For this purpose, up to 23 phases were assigned, which are based on a generalized milling process. The data set consists of thirteen .csv files, each representing a series production. The data set was recorded in a production company in the contract manufacturing sector for components with real series orders in ongoing industrial production.


Summary
The data set presented offers researchers insight into the changes in machine parameters on a milling machine during different phases of the production process. The data set contains the recorded features of the Numerical Control (NC), and it also includes an assignment of the underlying production phases, which was added by manual labeling.
Unlike other domains, for which dedicated data repositories are available, engineering data sets are generally found in generalist data repositories [1]. Table 1 shows the result of a search for milling-related data sets in the largest generalist repositories. Relative to the huge number of data sets in these repositories, milling-related data sets are rare. On closer analysis of the search results, the number of unique data records with a manufacturing reference is also small (see the last column in Table 1). Publishing the presented data set therefore aims to make more manufacturing data available for research.
The recorded data set contains production data from the NC of a five-axis milling machine, recorded in thirteen sessions between the end of November 2021 and the beginning of April 2022. For each recorded session, the anonymized data from the NC of the machine covers the preparation for the next manufacturing order (also referred to as changeover) and the parts produced afterwards.
The milling machine used, a HERMLE C600 U, was equipped with a HEIDENHAIN iTNC 530 NC and was operated in regular production shifts at the company Pabst Komponentenfertigung GmbH in Schweinfurt, Germany. Pabst specializes in the design and manufacture of tools and special machines, as well as the individual and series production of machine components. The machined parts were bearing components from the field of aerospace. The Pabst company was a member of the publicly funded research project Optimization of Processes and Machine Tools through Provision, Analysis and Target/Actual Comparison of Production Data (OBerA), which was established to support metalworking companies, with a focus on small and medium-sized enterprises (SMEs) from northern Bavaria, in using digital techniques to optimize their production. The project lasted from 1 April 2018 to 31 December 2021. Several sessions of the data set were also recorded after the official end of the project. Overall, 13 sessions were recorded, forming the complete data set (see Table 2).
The changeover process of a machine is characterized by manual activities that leave their traces in the data of the machine control system. The production phase that follows changeover, in contrast, is characterized by machine movements that take place largely without human interaction. As the data set contains data for both changeover and production, it is suitable for creating models of human-machine interaction during changeover, but also for the specific analysis of the production phase. As the data set was recorded in series production, effects specific to series production can also be analyzed with it.
Selected sessions from the data set have so far been used in two publications by the authors.
In [2], the data set presented here was used to train Machine Learning (ML) models for detecting changeover periods in production data. These models were then compared to ML models that were trained with a data set from a DMG 100 U duoBLOCK milling machine. The DMG machine was equipped only with external sensors, which were not connected to its NC. The ML approaches for both machines were compared and discussed. Data from the DMG machine is not part of the data set presented here; only data from the HERMLE machine is included. Sessions 1 and 2 were used for this research.
In [3], the influence of different noise types on the training data of ML models was evaluated. For this purpose, sessions from the data set were overlaid with simulated noise. This data set was then used to train ML models for detecting the changeover periods in manufacturing data. During the simulation, different noise types were selected, and their specific influence on the metrics of the ML model was evaluated. For this research, sessions 1 and 2 were used.
The available data set has been processed, and missing values have been corrected to make the data set easy to use.

Data Description
Table 2 shows the names of the .csv files for the 13 recorded sessions, the number of data rows, and the data rows per hour. The number of rows per session ranges from 7526 to 19,810, and the data rows per hour are of comparable magnitude across sessions, with an average of 1914.15 data rows per hour. Usually, a session was recorded during one working day. Only session No. 13 was recorded on the 5th, 7th, and 8th of April. On the 6th of April, the machine was in maintenance. Each file uses commas as the separator, has a column descriptor in the first line of the file, and contains data from one changeover sequence followed by the subsequent production sequence.
Table 3 shows the duration of each session in the column "Total time" and the duration of the changeover period in the column "Changeover time". During a changeover, a machine is prepared for a new product type. The columns "Old product" and "New product" show anonymized product numbers, which stand for the products that were produced before and after the changeover period.
In the data set, the first column contains the timestamp in the format YYYY-MM-DD HH:MM:SS. The following columns contain the recorded data of the individual features from the NC. Table 4 shows all 20 recorded features with a short description. The numerical control registers the signals from internal sensors, like spindle speed or the status of a door lock, and exports them as features. The features contain information about the milling process, e.g., FeedRate, as well as status information from the milling process, e.g., ProgramStatus, and status information from the machine, e.g., DriveStatus. No additional external sensors are part of the data set.
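The file layout described above (comma separator, column descriptor in the first line, timestamp in the first column) can be read with the Python standard library alone. The following is a minimal sketch, not the authors' tooling: the header names and all values in the sample are synthetic stand-ins, and the rows-per-hour figure is computed here from the span between the first and last timestamp, which may differ from how Table 2 was derived (for example, for session No. 13, which spans several days).

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, as described in the text


def rows_per_hour(csv_file):
    """Return (row count, recording span in hours, rows per hour) for one session file."""
    reader = csv.reader(csv_file)  # the comma is the default separator
    next(reader)                   # skip the column descriptor in the first line
    timestamps = [datetime.strptime(row[0], TIMESTAMP_FORMAT) for row in reader if row]
    span_hours = (timestamps[-1] - timestamps[0]).total_seconds() / 3600
    return len(timestamps), span_hours, len(timestamps) / span_hours


# Synthetic stand-in for a session file. The header name "Timestamp" and all
# values are assumptions for illustration only.
sample = io.StringIO(
    "Timestamp,FeedRate,ProgramStatus\n"
    "2021-11-30 08:00:00,0,1\n"
    "2021-11-30 08:20:00,1200,2\n"
    "2021-11-30 08:40:00,1200,2\n"
    "2021-11-30 09:00:00,0,1\n"
)
n_rows, hours, rate = rows_per_hour(sample)
print(n_rows, hours, rate)
```

For a real session, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`, where `path` points to one of the thirteen .csv files.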
Table 5 shows that some .csv files contain 19 features, and some contain 20 features. In files with 19 features, variable No. 5, "PocketTable", was not recorded due to malfunctions in the recording interface.
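Because some files lack the "PocketTable" column, code that processes all sessions together has to tolerate both layouts. The sketch below shows one possible way to do this with the standard library: it adds an empty "PocketTable" entry where the column is missing. The helper name `load_session`, the header name "Timestamp", and all sample values are hypothetical. Whether to fill the gap with an empty value or to drop the feature for all sessions depends on the intended analysis.

```python
import csv
import io


def load_session(csv_file):
    """Read one session file into a list of dicts.

    Returns (rows, has_pocket_table). If the file has no "PocketTable"
    column, every row gets PocketTable=None so all sessions share one schema.
    """
    reader = csv.DictReader(csv_file)
    has_pocket_table = "PocketTable" in (reader.fieldnames or [])
    rows = []
    for row in reader:
        if not has_pocket_table:
            row["PocketTable"] = None  # feature was not recorded in this session
        rows.append(row)
    return rows, has_pocket_table


# Synthetic stand-ins for a file with and a file without the feature.
with_feature = io.StringIO(
    "Timestamp,ProgramStatus,PocketTable,FeedRate\n"
    "2021-11-30 08:00:00,1,3,0\n"
)
without_feature = io.StringIO(
    "Timestamp,ProgramStatus,FeedRate\n"
    "2022-01-10 08:00:00,1,0\n"
)

rows_a, flag_a = load_session(with_feature)
rows_b, flag_b = load_session(without_feature)
print(flag_a, rows_a[0]["PocketTable"])
print(flag_b, rows_b[0]["PocketTable"])
```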
In the last three columns of the data set, the authors assigned a production phase label for each timestamp. Each column represents a specific approach to labeling the production phase:

• In the two-phase approach, only two phases are labeled: the machine is either in a changeover state or intermittent idle time (No. 1), or in the production state (No. 2). The column heading in the .csv files is "Production". The labeling was performed for sessions 1, 2, and 13 by a researcher supervising the changeover and production process in real time in situ. For the labeling of sessions 3 to 12, data from the worker terminal was used (see also the explanation below). The phases were assigned to the timestamps of sessions 3 to 12 using the changeover start and stop times reported at the worker terminal. The timestamps from the worker terminal have a resolution of 5 min. This implies a rounding error of at most 2.5 min when assigning a specific production phase. For sessions 3 to 12, only labeling according to the two-phase approach was conducted.
For sessions 1, 2, and 13, the labeling was performed by a researcher who supervised the complete recording period in person. The timestamps have a resolution of 1 s. Deviations due to the human reaction time can be expected and are estimated to be 0.3 s. For these sessions, all three labeling approaches were conducted.
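To get a feeling for what the two labeling uncertainties mean in terms of data rows, they can be converted using the average recording rate of 1914.15 rows per hour. This is a rough back-of-the-envelope sketch that assumes evenly spaced rows and uses the average over all sessions, so the numbers for an individual session will differ.

```python
# Average recording rate reported for the data set (rows per hour).
ROWS_PER_HOUR = 1914.15
rows_per_second = ROWS_PER_HOUR / 3600


def rows_within(seconds):
    """Approximate number of data rows recorded within the given time span."""
    return rows_per_second * seconds


# Worker terminal (sessions 3 to 12): rounding error of at most 2.5 min.
terminal_error_rows = rows_within(2.5 * 60)
# In situ labeling (sessions 1, 2, and 13): estimated reaction time of 0.3 s.
observer_error_rows = rows_within(0.3)

print(f"worker terminal: up to about {terminal_error_rows:.0f} rows per phase boundary")
print(f"in situ labeling: about {observer_error_rows:.2f} rows per phase boundary")
```

Under these assumptions, a phase boundary derived from the worker terminal may be off by roughly 80 rows, whereas the in situ labels are accurate to well below one row.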
Figure 1 shows the different counts for the two-phase labeling approach in all thirteen sessions. Due to the varying order lot sizes, the thirteen sessions contain different numbers of data rows for the changeover and production phases.
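Counts like those shown in Figure 1 can be reproduced by tallying the label column of a session file. The sketch below assumes that the "Production" column stores the phase numbers 1 (changeover or idle) and 2 (production) as described above. The helper name `count_labels`, the other header names, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="Production"):
    """Count how many data rows carry each label in the given label column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Synthetic stand-in for a session file with the two-phase label column.
sample = io.StringIO(
    "Timestamp,FeedRate,Production\n"
    "2021-11-30 08:00:00,0,1\n"
    "2021-11-30 08:00:02,0,1\n"
    "2021-11-30 08:00:04,1200,2\n"
    "2021-11-30 08:00:06,1200,2\n"
    "2021-11-30 08:00:08,1200,2\n"
)
counts = count_labels(sample)
changeover_share = counts["1"] / sum(counts.values())
print(dict(counts), changeover_share)
```

The same function can be pointed at the other label columns by passing a different `label_column`, provided the column heading in the file is known.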
Figure 2 shows the different counts for the six-phase labeling approach for sessions 1, 2, and 13. The number of data rows for the six phases is comparable for sessions 1 and 13. Session 2 contains many data rows for the production phase "5". Figure 3 shows the different counts for the 23-phase labeling approach for sessions 1, 2, and 13. Session 2 contains many rows with the label of phase 21 (production). Session 13 shows more idle time (phase 20) than production (phase 21).
Figure 1 shows strong variability for the phase counts of the changeover and production class.
The variability in the counts of the changeover classes results from the different efforts required for a changeover to a new product. It should be noted that, in theory, setting up product A after product B can require a different effort than setting up product B after product A. This difference can arise, for example, from increased handling effort when preparing fixtures and machining tools for smaller or larger dimensions. Table 3 shows, for each session, from which product to which product the machine was set up. The variability in the counts of the production classes results from different manufacturing order lot sizes, which are related to specific customer orders.
In Figure 2, sessions No. 1 and No. 13 show similar counts over all six phases. Figure 1 shows that these sessions have the same proportion between the changeover and production class. In contrast to sessions No. 1 and No. 13, session No. 2 contains many more production counts and fewer changeover counts, which is reflected in the much higher count in its phase 5.
In Figure 3, the variability of the counts seen in Figure 2 is broken down into more subphases but shows comparable characteristics.

Methods
The HEIDENHAIN iTNC 530 numerical control of the HERMLE C600 U machine is a legacy control that does not support communication standards like OPC Unified Architecture (OPC UA). Therefore, for the data acquisition, a middleware by the company Cybus collected the NC data via the HEIDENHAIN DNC interface, and the Cybus Agent transported it via the MQTT protocol to the Azure cloud and into an SQL database (Figure 4). The use of the middleware resulted in a preselection of around 400 available variables. Of these variables, domain experts selected 19 variables which, based on their description, indicated a relation to the milling process. Variable No. 20, "Warmup", was derived after the data acquisition from variable No. 3, "ProgramDetail", and added to the data set [2].

Table 1. Milling-related data sets in generalist data repositories.

Table 2. Recorded sessions in the data set.

Table 3. Duration of complete sessions, duration of changeover period and related products.
[2]v files is "Phase". Table 6 contains a short description of all 23 phases. For more details, please see [2]. Basic statistics of the features and labels are listed in Table A1 in Appendix A.

Table 5. Number of features in the recorded sessions.
* Only first number used for labeling. ** Not used in this research.

Table A1. Statistics for data set.