Development of a Modality-Invariant Multi-Layer Perceptron to Predict Operational Events in Motor-Manual Willow Felling Operations

Abstract: Motor-manual operations are commonly implemented in traditional and short rotation forestry. Deep knowledge of their performance is needed for various strategic, tactical and operational decisions that rely on large amounts of data. To overcome the limitations of traditional analytical methods, Artificial Intelligence (AI) has lately been used to deal with various types of signals and problems. However, the reliability of AI models depends largely on the quality of the signals and on the sensing modalities used. Multimodal sensing was found to be suitable for developing AI models able to learn time- and location-related data dependencies. For many reasons, such as the uncertainty of preserving the sensing location and the inter- and intra-variability of operational conditions and work behavior, the approach is particularly useful for monitoring motor-manual operations. The main aim of this study was to check whether acceleration data sensed at two locations on a brush cutter could provide a robust AI model characterized by invariance to the data sensing location. As such, a Multi-Layer Perceptron (MLP) with backpropagation was developed and used to learn and classify operational events from bimodally collected acceleration data. The data needed for training and testing was collected in the central part of Romania. Data collection modalities were treated by fusion in the training dataset, then four single-modality testing datasets were used to check the performance of the model on a binary classification problem. Fine tuning of the regularization parameter (α term) led to acceptable testing and generalization errors of the model, measured as the binary cross-entropy (log loss). Irrespective of the hyperparameter tuning strategy, the classification accuracy (CA) was found to be very high, in many cases approaching 100%. However, the best models were those characterized by α set at 0.0001 and 0.1, for which the CA in the test datasets ranged from 99.1% to 99.9% and from 99.5% to 99.9%, respectively. Hence, data fusion in the training set was found to be a good strategy to build a robust model, able to deal with data collected by single modalities. As such, the developed MLP model not only removes the problem of sensor placement in such applications, but also automatically classifies the events in the time domain, enabling the integration of data collection, handling and analysis in a simple, less resource-demanding workflow, and making it a feasible alternative to the traditional approach to the problem.


Introduction
Short-rotation willow crops (SRWC) are seen nowadays as a valuable alternative to produce renewable energy, contributing also to the rural development, job market diversification, carbon sink, biodiversity and diversification of agricultural crops and bioproducts. They are commonly established on agricultural lands and share many features with the traditional forestry, in particular the silvicultural practices [1]. Moreover, willow was found to be suitable for other engineering purposes, commonly exhibiting features such as a rapid growth rate, high biomass production, increased coppicing ability and tolerance to high planting densities [2].
When grown to produce biomass as a feedstock for the energy industry, one of the final SRWC production steps consists of harvesting operations. Several fully-mechanized operational systems were developed, tested and are currently in use for large-scale commercial willow harvesting purposes [3,4]. Still, the increased costs associated with harvesting operations [5] are seen as a limiting factor of profitability, which requires optimization [6]. While other operations which are common to SRWC cultivation can be done directly by their owners using equipment and machines of general agricultural purpose, it has been shown that, irrespective of the SRWC scale, the farmers cannot afford to own and operate expensive harvesting equipment [3,5]. In many cases, the lack or the limited availability of such equipment [7] has been tackled by the use of motor-manual means [8,9], which seem to be more adapted to harvesting operations carried out on small and dispersed plots [10], especially in those geographic regions in which the cost of the manual labor is still affordable, and may compensate for lower productivities [11].
Irrespective of the harvesting system used, its optimization requires at least data on production outputs and resources used (i.e., time, fuel, money) [12], and it is quite typical to implement time studies to evaluate its operational performance [13] as a common input for optimization. There are many examples, including those referenced in this paper, of using time-and-motion studies to evaluate the performance of operations. Unfortunately, most of the currently used methods to get time consumption data are resource intensive [14], requiring qualified personnel and specific logistics to collect, process, analyze and interpret it.
In relation to the use of motor-manual equipment to harvest SRWCs, some progress has been made by the use of dataloggers equipped with acceleration sensors to automate field data collection, which was then coupled with Global Positioning System (GPS) data to infer the operational behavior in such operations [7,10]. In those studies, the analytical procedures of accounting and categorizing the time consumption on operational tasks were done by human intervention, requiring prior knowledge of the process mechanics and sensors' response and, more importantly, a hands-on approach to data processing, summarization and categorization. To eliminate much of that effort, Artificial Intelligence (AI) techniques have lately been used for a number of applications with excellent results in the recognition and classification of operational time, based on signals produced by various sensors, including accelerometers [15-18]. Nevertheless, the applicability of the developed models stays well inside their intended application, mainly due to the labelling outcomes, which are application-specific, and to the sensing modality, which could produce contrasting responses in magnitude, calling for the preservation of the sensing location when using unimodal sensing designs [17,18]. In this context, a sensing modality can be characterized by the type of physical parameter measured by a sensor and by the location at which it is sensed on a given study object. As such, multimodal sensing may involve the use of at least two sensors measuring the same physical parameter at different locations or the use of at least two sensors measuring different physical parameters at the same location.
Going back to the use of accelerometers to collect signals which may be useful for operational activity recognition in motor-manual felling of willow, unimodal sensing becomes important to save resources, and the use of a single sensor is desirable and affordable. However, there are many possible locations in which an accelerometer could be placed on a given tool, such as a brush cutter, making it likely to get signals characterized by a high variability in magnitude due to the sensing location. In addition, there is a high variability given by the operational conditions themselves, which may change even in the same harvested plot, and by the operational behavior of the workers. In such conditions, the models developed by training an AI algorithm need to have an acceptable generalization ability, in such a way that at least the location of signal collection would become irrelevant for a given classification algorithm. To our knowledge, a robust model able to reliably deal with unimodally sensed acceleration signals, i.e., acceleration sensed at a single location on the tool irrespective of that location, has not been studied so far, while developing it from multimodally sensed data could help overcome the limitations stated above. If such a model proves to be reliable irrespective of the sensing location, then it will contribute to resource saving while easing the data collection, handling and analysis workflow.
The goal of this study was to develop a modality-invariant operational prediction model with application in motor-manual felling of willow by brush cutters. The problem was approached by training and testing a Multi-Layer Perceptron (MLP) with backpropagation on acceleration signals. Acknowledging that there could be many other approaches to the problem, the choice of the MLP as a technique, as well as of the acceleration data as input signals, was mainly based on the author's experience with MLP algorithms and the availability of acceleration signal data; in addition, the choice was guided by the findings of recent work, repeatedly showing excellent results in task recognition applications when using acceleration signals as inputs to AI algorithms. By a modality-invariant model we refer to a model able to acceptably generalize from any acceleration signal collected by the same sensor type, anywhere on an observed tool, while preserving the highest possible classification performance and the lowest possible generalization error.

Study Location and Crop Layout
The data used in this study were collected in the center of Romania, from three locations (Table 1), where motor-manual willow felling operations were observed in the spring of 2017. The plots taken into study were located in an intra-mountainous depression at an altitude of ca. 600 m a.s.l. The climate in the area is strongly continental, with warm summers and cold winters. All the plots (Poian 1, Poian 2 and Belani, Covasna County, Romania) were planted according to the European willow planting layout, in which the planting is done in twin rows spaced 75 cm apart, and each twin row is 1.5 m from the next one; commonly, the distance between the cuttings used for planting is 60 cm [19].
Note to Table 1: (a) TRAIN means that the dataset was used in the training phase of the MLP, TEST means that the dataset was used in the testing phase of the MLP, E means that the data was collected by datalogger placement on the tool's engine, S means that the data was collected by datalogger placement on the tool's transmission shaft, 1 means that the dataset was the first of the same modality class, 2 means that the dataset was the second of the same modality class; (b) size refers to the number of one-second sampled observations retained in the training and testing datasets.
Willow crops are becoming a common land use feature in the landscape of the study area, though they are typically established on small and dispersed plots whose previous use was agricultural [20]. In addition to the size and dispersion of the plots, the cultivation practice in the area is strongly influenced by the available technology for planting, cutback and harvesting operations, which are partly mechanized [7,9,10,19-21]; many of them, such as harvesting, rely to a great extent on motor-manual operations which are usually done by the use of brush cutters [7,9,10]. This situation often leads to rotations of 2-3 years, and it is typical for the area that motor-manual willow felling operations are done in the early spring. Most of the operations related to willow cultivation in the area are done by employing local people on a daily basis.

Tool Description, Work Organization and Relevant Process Mechanics
The tool used for felling was a brush cutter made by Husqvarna (Model 545 RX, Husqvarna AB, Stockholm, Sweden), featuring an engine output of 2.1 kW at 9000 rpm and a transmission shaft that enables the connection and power transfer between the engine and the cutting device (Figure 1); tools from this class are assumed to produce a noise level of 100 dB(A) [22]. Harvesting work is typically done by motor-manual felling followed by manual bunching, transportation and chipping at a biomass terminal, or by bunching and chipping on site [23]. Brush cutters are commonly used in the study area to motor-manually fell the willow, being tools that can be adapted easily to a variety of jobs, simply by changing their active cutting devices [22,24]; when used for willow felling, they are usually equipped with steel saw blades (discs).
Figure 1. Work organization, tool description and instrumentation of data collection. Legend: (a)-the typical work organization (1-bunch of stems to be felled, 2-bunches of felled stems, 3-wooden stick used to direct the felling, 4-throttling control, 5-engine, 6-transmission shaft, 7-steel blade, 8-triaxial accelerometer, 9-helmet equipped with a sound pressure level datalogger), (b)-the general layout of the felling operations (example from Poian 1 location, close to finishing the felling work in a plot), (c)-detail of the acceleration datalogger (8) placed on the shaft, (d)-details on the instrumentation used to collect the data (8-triaxial accelerometers, 9-sound pressure level datalogger).
The felling work is commonly done by two workers (Figure 1) of which one is the brush cutter operator who is in charge of the mechanical felling tasks and tool maintenance, and the other one assists the felling by a wooden stick [7,9,10,19].
While being rather simple, the organization of felling work is influenced by the capability limits of the used tools, layout of the crops and weather conditions [7,19], and it needs to be done with much attention and caution to ensure the safety of both workers. As such, felling direction is commonly adopted toward the exterior of the crop, felling work is progressing on a single row (one of the twins), and in such cases in which the length of the crop is very long, transversal corridors are often practiced to shorten the distances covered per turn, to be able to refuel and maintain the tool [19]. The assistant needs to place himself at a considerable distance behind the feller and he interacts with the stems to be felled only by the wooden stick, while the feller needs to be able to coordinate and control his motions on very short trajectories. The relevant work elements which may occur in such operations are the effective on-row felling, moving at the headlands or on a transversal corridor to approach a new willow row at the opposite side of the crop, maintenance and refueling, rest and meal breaks, as well as other kind of delays [7,9]. The distinctive feature of these operations is that, excepting the felling, the rest of work elements are typically characterized by engine non-use. Therefore, monitoring the engine working time makes it possible to accurately monitor the main work time consumption, which stands for a category of time in which the direct transformation of the work object takes place [13], and which is also useful and important to account for the fuel intake as specific to motor-manual operations [24,25].
Looking at a finer scale, however, the felling consists of worker's advancement on the row with various engine running regimes, combined with movements of the active cutting device of the tool toward outside and inside the crop to make the cuts, which are likely to produce variability in the responses given by the use of various sensors. Another relevant issue is the placement location of the dataloggers. For instance, accelerometer dataloggers could be placed at different locations on the engine block, as well as at different locations of the transmission shaft, therefore making it possible to receive different magnitudes of the acceleration during engine use; however, when the engine is switched off, the responses collected by accelerometers placed at different locations of the tool could be similar.
In the study area, felling work is always done by workers with extensive experience in SRWC felling operations, gained over more than a decade of SRWC management in Romania. Field data collection, which was done in 2017, was based on the informed consent of the observed workers and of the SRWCs' owner. They were informed about the intended use of the data and agreed to be observed when performing their jobs.

Instrumentation
Two datalogger types were used to collect the field data used in this study. Acceleration response, as the main data stream used, was measured and recorded using two Extech ® VB300 triaxial dataloggers (Extech Instruments, FLIR Commercial Systems Inc., Nashua, NH, USA). Irrespective of the study location, the acceleration dataloggers were set by the use of the dedicated software to collect data in the motion detection mode (threshold of 1 g), at a sampling interval of one second. One of them was placed on the engine's block and the other one was placed on the transmission shaft, at locations that were chosen carefully so that they would not interfere with work safety. Both acceleration dataloggers were secured to the tool using highly resistant plastic straps and were checked for holding before running the experiments and during the work breaks. An Extech ® 407760 sound pressure level datalogger (Extech Instruments, FLIR Commercial Systems Inc., Nashua, NH, USA) was used to collect additional data needed for labelling purposes. It was set to continuously collect the sound pressure level on the dB(A) scale, at a sampling interval of one second, and it was placed on the helmet worn by the brush cutter operator. The main technical features of the dataloggers are available on the producing company's website [26,27]. Figure 1 shows the approach used to equip the tool and the worker with the dataloggers.

Datasets
Six acceleration datasets were collected in the three locations taken into study (Table 1) and the intention was to get, for each of them, a time-overlapping sound pressure level dataset. However, due to a battery malfunction, sound pressure level data was lost in the case of the Belani location. By the construction and setup of the acceleration dataloggers, the data is collected and stored as discrete triaxial (X, Y and Z) time-labelled responses; they are further summarized in the form of vector magnitudes, also known as the Euclidean Norm (EN), which is a form of data fusion [28] and which is computed by the dedicated software. The EN, which the dedicated software names under the generic term of "vector sum", may be written as in Equation (1), and it allows for a first normalization of the data, making it invariant to the axis of movement. In fact, raw acceleration signals contain movement, gravity and noise components [29], while the instruments used to collect them respond well to vibration, a property which was used in this study.
$$A_i = \sqrt{x_i^2 + y_i^2 + z_i^2} \quad (1)$$
where A_i is a discrete value, in the form of the Euclidean Norm (vector magnitude, vector sum), computed for a given sampling interval (set to one second in this study), and x_i, y_i and z_i are the accelerometer's raw responses on the axes X, Y and Z, respectively, for the observation i. Sound pressure level data was collected and outputted in a similar way, being time and date labelled, and showing the sound pressure level measured in dB(A) at a sampling interval of one second. In addition, both dataloggers can output data in computer-friendly formats such as Microsoft Excel ® (Microsoft, Redmond, WA, USA) CSV files, and both of them provide data IDs and some summary statistics placed at the beginning of each file. Figures A1-A5 show the patterns of A_i in the datasets used for training and testing of the MLP, emphasizing the amplitude and magnitude differences due to the location of the datalogger and engine working regimes.
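As an illustration of Equation (1), the following minimal Python sketch computes the vector magnitude of triaxial samples. In the study this value was produced by the datalogger's dedicated software; the function name and the sample values below are hypothetical and serve only to show the arithmetic.

```python
import math

def euclidean_norm(x, y, z):
    """Vector magnitude of one triaxial acceleration sample, as in Equation (1)."""
    return math.sqrt(x * x + y * y + z * z)

# Hypothetical one-second samples (x, y, z) in g; values are illustrative only.
samples = [(0.0, 0.0, 1.0), (3.0, 4.0, 0.0), (1.0, 2.0, 2.0)]
magnitudes = [euclidean_norm(x, y, z) for x, y, z in samples]
print(magnitudes)  # [1.0, 5.0, 3.0]

# Swapping the axes leaves the magnitude unchanged, which is the sense in
# which the norm removes the dependence on a particular axis.
print(euclidean_norm(4.0, 0.0, 3.0) == euclidean_norm(3.0, 4.0, 0.0))  # True
```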

Data Pairing, Segmentation and Labelling
To ease the effort of labelling the training data, as well as to compare the multimodal responses collected by the two acceleration dataloggers placed on the same tool, data pairing procedures were applied to the first two datasets (TRAIN_E and TRAIN_S) based on their time labels. This procedure was necessary to be able to label both datasets at once. Data pairing was done in Microsoft Excel ® , and it retained those observations which were present in both datasets and shared the same time label, an issue which was approached, assessed and solved using logical functions. For example, if an observation from a given training dataset did not have a corresponding time-labelled observation in the other training dataset, then it was deleted. This process was also run vice versa, until reaching a double set of observations sharing their time labels.
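The pairing logic amounts to keeping the intersection of the two sets of time labels. The study did this with logical functions in Microsoft Excel; the sketch below is an equivalent formulation in Python, given only as an illustration. The function name, the time-label format and the sample values are assumptions.

```python
def pair_by_time(engine, shaft):
    """Keep only the observations whose time label occurs in both datasets."""
    shared = sorted(set(engine) & set(shaft))
    return [(t, engine[t], shaft[t]) for t in shared]

# Hypothetical records: time label -> Euclidean norm (g). The gaps mimic
# seconds that motion-detection logging did not record on one of the loggers.
train_e = {"10:00:01": 2.4, "10:00:02": 2.9, "10:00:04": 1.1}
train_s = {"10:00:02": 4.8, "10:00:03": 5.2, "10:00:04": 1.2}

paired = pair_by_time(train_e, train_s)
print(paired)  # [('10:00:02', 2.9, 4.8), ('10:00:04', 1.1, 1.2)]
```

Observations at 10:00:01 and 10:00:03 are dropped because each exists in only one dataset, which mirrors the deletion rule described above.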
Labelling of the training datasets was done by considering the responses recorded by the acceleration and sound pressure level dataloggers (Figure 2), based on known experience on their responses in terms of magnitude. Two states were documented by labelling and segmentation, namely the engine running (labelled in the database by the string code ON) and the engine turned off (labelled in the database by the string code OFF).

Figure 2.
A sample from the acceleration and sound pressure level datasets used jointly to label the data. Note: for convenience, the sound pressure level data was downscaled by a factor of 100 to help in data comparison and labelling tasks. Legend: TRAIN_E stands for the training dataset collected by the placement of acceleration datalogger on the tool's engine, TRAIN_S stands for the training dataset collected by the placement of acceleration datalogger on the tool's transmission shaft, LABELLING stands for the sound pressure level data downscaled by a factor of 100; Events: 1-engine off and no movement of the worker (labelled as OFF), 2-engine on and felling (labelled as ON), 3-engine on and not felling (labelled as ON), 4-engine off and movement of the worker (labelled OFF), 5-data segments which were transient (inter-class variability) between the two engine states (labelled as ON).
For instance, sound pressure levels close to those described by the manufacturer for the operation of the tool (ca. 100 dB(A)) indicated that the engine was on and throttled, and therefore that the worker was engaged in effective felling operations. Drops in the magnitude of the sound pressure level (as shown in Figure 2 by the data labelled with 3) were considered to be the events in which the engine was on but no felling was done (idle running); these events were labelled as ON. Moreover, acceleration responses in the range of 1.1-3.0 g were compared to the data on sound pressure level, generally leading to their classification as engine OFF events. Transient events (Figure 2, data labelled by 5) were included in the engine working category as well. However, due to the acceleration data collection mode (motion detection) and the pairing procedures used, which led to some missing data, the sound pressure level dataset was paired by making some adaptations in the time domain, such as removing some data or moving some data segments to pair them with the acceleration data. This was done for approximately 10% of the joint dataset, then the patterns generated by the magnitude of acceleration data were used for further labelling.
Based on the experience gained during the labelling and segmentation tasks done on the training dataset, the distributions of data in specific patterns were used as a condition to label the data in the rest of the datasets, which were used for MLP testing (Figures A2-A5). Prior to the labelling and segmentation tasks, these datasets were kept at their original number of observations, as they were outputted by the acceleration dataloggers. Therefore, the datasets shown in Figures A1-A5 contained the final number of one-second observations as described in Table 1, and each observation contained in them was the Euclidean Norm computed according to Equation (1).

Fusion of the Training Datasets
All the datasets used in this study were subjected to early data fusion by the computation of the Euclidean Norm. However, to simultaneously capture both the local dependencies over time and the spatial dependencies over modalities of collection, the approach was similar to that described in [30], and consisted of fusing the training datasets by a procedure referred to as vertical stacking [28]. In particular, it was assumed that a more accurate data representation in the trained model, which could be achieved by the inclusion of spatial dependencies over the modalities of collection, could be important for the evaluation of datasets coming from other experiments using a single modality for field data collection, enhancing the trained model's recognition capacity. In addition, the procedure was assumed to improve the data representation in the trained model by actually doubling the size of the training dataset.
Procedurally, data fusion was simple: the dataset collected on the engine was kept as it was, and the dataset collected on the transmission shaft was appended at its end, resulting in the fused dataset (Figure A1). Following data merging, the IDs of the observations were updated, and the resulting data vector was used as input for data normalization.
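Vertical stacking as described here can be sketched in a few lines of Python. This is an illustration under assumptions rather than the procedure's actual implementation: the row layout (value, label), the function name and the sample values are hypothetical.

```python
def stack_vertically(engine_rows, shaft_rows):
    """Append the shaft rows after the engine rows and renumber the IDs from 1."""
    stacked = list(engine_rows) + list(shaft_rows)
    return [(new_id, value, label)
            for new_id, (value, label) in enumerate(stacked, start=1)]

# Hypothetical labelled observations: (Euclidean norm in g, label).
engine_rows = [(2.9, "ON"), (1.1, "OFF")]
shaft_rows = [(4.8, "ON"), (1.2, "OFF")]

fused = stack_vertically(engine_rows, shaft_rows)
print(fused)
# [(1, 2.9, 'ON'), (2, 1.1, 'OFF'), (3, 4.8, 'ON'), (4, 1.2, 'OFF')]
```

The fused dataset has as many rows as the two paired inputs together, which is why the procedure doubles the size of the training set.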

Data Normalization
Data normalization is commonly done by transforming the original data, and it aims at giving all the attributes an equal weight; in MLP applications with backpropagation it also helps in speeding up the learning process [31]. A min-max normalization procedure was used in this study, according to Equation (2), which performs a linear transformation of the data, outputting values in the new range [0, 1], while preserving the relationships among the original data values [31]. Although there are many other procedures that may be used to scale the data, the choice of this normalization procedure was based on its simplicity and ease of use.
$$An_{ij} = \frac{A_{ij} - Amin_j}{Amax_j - Amin_j} \quad (2)$$
where An_ij is the normalized value of observation i coming from the dataset j (An_ij can take values between 0 and 1, inclusive), A_ij is the Euclidean Norm of the observation i coming from the dataset j, Amin_j is the minimum value of the Euclidean Norm in the dataset j, and Amax_j is the maximum value of the Euclidean Norm in the dataset j. The use of Equation (2) required the computation of the minimum and maximum values of A_i in each dataset j (j = 1, ..., 5), then it was applied to all observations from each dataset, using Microsoft Excel ® for this purpose. The transformed data was saved as new datasets, which were then used for training and testing the MLP model.
The software used for training and testing [32] provides functionalities for implementing a multi-layer perceptron with backpropagation. All the training and testing tasks were run on a computer that included the following features: system type-Alienware 17 R3, processor-Intel ® Core™ i7-6700HQ CPU, 2.60 GHz, 2592 MHz, 4 cores, 8 logical processors, installed physical memory (RAM)-16 GB, operating system-Microsoft Windows 10 Home.
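The min-max scaling of Equation (2) can be sketched as follows. The study performed this step in Microsoft Excel, separately for each dataset; the Python function name, the guard against a constant dataset and the sample values are assumptions added for illustration.

```python
def min_max_normalize(values):
    """Scale one dataset to [0, 1] using its own minimum and maximum (Equation (2))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant dataset cannot be scaled, so fail loudly.
        raise ValueError("cannot normalize a dataset with a single distinct value")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical Euclidean norms (g) from one dataset.
dataset = [1.0, 2.0, 3.0, 5.0]
normalized = min_max_normalize(dataset)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

Because each dataset is scaled with its own extremes, engine and shaft signals of different magnitudes are mapped onto the same [0, 1] range.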
The size of the MLP was set in advance of the training and testing tasks to the highest values of depth and width enabled by the software used, based on the author's experience, practical recommendations formulated by [33], and recent results showing the effect of the MLP's architecture on the classification performance for similar equipment [16]. Three hidden layers (depth) of 100 units each (width, as the number of neurons) were set for the MLP's architecture, and the number of iterations was set at 1,000,000. Training and scoring were done by cross-validation using a stratified approach and a number of folds set at 20. The recommendations of [33], as well as the information available in the recent literature, were used to choose the activation function and the optimization algorithm. One of the most popular activation functions is the rectified linear unit (ReLU), which is reported to provide high performance in solving complex, nonlinear problems [34,35], and it was chosen for this study. In simple words, an activation function takes the weighted inputs of a node (neuron), adds a bias and, based on the result, decides whether or not that node should be activated (fired); ReLU passes the result on unchanged when it is positive and outputs zero otherwise. The optimization algorithm chosen for the MLP architecture was Adam, a stochastic gradient descent-based optimizer, which is one of the recently developed and widely used solvers due to its low training costs [36].
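The behavior of a single ReLU unit, as described in simple words above, can be sketched in Python. This is only an illustration of the activation rule, not the trained network: the weights, bias and inputs are hypothetical, and the three hidden layers of 100 units are not reproduced here.

```python
def relu(z):
    """Rectified linear unit: returns z when positive, otherwise zero."""
    return max(0.0, z)

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)

# Hypothetical unit with one normalized input, weight 2.0 and bias -0.5.
fired = neuron_output([0.5], [2.0], -0.5)   # 2.0 * 0.5 - 0.5 = 0.5, unit fires
silent = neuron_output([0.1], [2.0], -0.5)  # 2.0 * 0.1 - 0.5 < 0, output is 0.0
print(fired, silent)  # 0.5 0.0
```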

Tuning and Error Metric Used to Evaluate the Generalization Ability
A manual tuning approach was taken to check the training and testing performance of the MLP, and it aimed at altering the α parameter of the regularization term (L2 penalty regularization) by trial and error. By doing so, the intention was to check what regularization strategy would reduce the generalization error [33] in combination with the architecture of the MLP and hyperparameters already set as described in Section 2.3.1. In MLP applications, the regularization term helps in avoiding overfitting by penalizing weights with large magnitudes; α is a parameter of the regularization term, whose increased values may fix high variance while decreased values may fix high bias [33,37]. Values of the α parameter were set successively at 0.0001, 0.001, 0.01, 0.1, 1 and 10, then MLPs were trained and tested over all four testing datasets, accounting each time for the training and generalization error. The error metric chosen for the evaluation of generalization ability was the binary cross-entropy (Equation (3)), which is commonly used in binary classification problems. A detailed worked example can be found in [38]. Its calculation is enabled by the software used and it works based on predicted probabilities assigned to the observations.
$$H_p(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ l_i \log(p(l_i)) + (1 - l_i) \log(1 - p(l_i)) \right] \quad (3)$$
where H_p(q) is the binary cross-entropy (log loss) function, N is the number of observations in a given dataset, l_i is the label of a given observation i (l_i = 0 or 1), and p(l_i) is the predicted probability of the observation i being ON. Note: the label ON received the value of 1 and the label OFF received the value of 0. For instance, if the label of an observation is ON, therefore l_i = 1, then Equation (3) will add log(p(l_i)) to the sum, which is the logarithm of the predicted probability of that instance being ON; if the label of an observation is OFF, therefore l_i = 0, then it will add log(1 − p(l_i)) to the sum, which is the logarithm of the predicted probability of that instance being OFF. Training and testing results of the binary cross-entropy function were used in conjunction to choose the best model in terms of training and testing generalization capacity. Since training and testing was run on a number of 5 models (1 for training and 4 for testing), the values of binary cross-entropy were plotted against those of the tuned α parameter. Then, minimum and maximum values of each repetition done for each α value were computed, and the range found at the minimum value was used as a criterion to keep the best performing models.
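A minimal Python sketch of Equation (3) is given below to make the computation concrete. In the study the log loss was computed by the software used for training and testing; the function name, the clipping constant `eps` (added here only to keep the logarithm finite) and the probability values are assumptions.

```python
import math

def binary_cross_entropy(labels, probs_on, eps=1e-15):
    """Mean log loss; labels are 1 (ON) or 0 (OFF), probs_on are predicted P(ON)."""
    total = 0.0
    for label, p in zip(labels, probs_on):
        p = min(max(p, eps), 1.0 - eps)  # assumption: clip to avoid log(0)
        total += label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return -total / len(labels)

# Hypothetical predictions for two ON and two OFF observations.
labels = [1, 1, 0, 0]
confident = binary_cross_entropy(labels, [0.99, 0.95, 0.02, 0.10])
hesitant = binary_cross_entropy(labels, [0.60, 0.55, 0.45, 0.40])
print(confident < hesitant)  # True
```

Both sets of predictions classify every observation correctly at a 0.5 threshold, yet the log loss is lower for the more confident one, which is why the metric complements classification accuracy when comparing α values.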

Classification Performance Metrics
In addition to the log loss function, the software used for training and testing enables the computation of the training and testing time, area under curve (AUC), classification accuracy (CA), F1 metric, precision (PREC), recall (REC) and specificity (SPEC). The meaning and the possible uses of these metrics are comprehensively described in papers such as [39,40], therefore their complete definitions and formulae are not given herein. While all of these metrics were computed at the class (ON, OFF) and overall (dataset) level, in both the training and testing phases, the focus was on the classification accuracy (CA) and recall (REC) metrics; in binary classification problems, the first one stands for the proportion of correctly classified observations (true positives and true negatives) in the total number of observations in a dataset, and the second one stands for the proportion of true positives classified as such in the total number of positives in a given dataset [39,40].
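For readers who want the standard definitions at hand, the sketch below derives CA, REC, PREC, SPEC and F1 from confusion-matrix counts. It uses the textbook formulas, not the internals of the software used in the study; the function name and the counts are hypothetical, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion counts (non-zero denominators assumed)."""
    total = tp + tn + fp + fn
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return {
        "CA": (tp + tn) / total,   # correctly classified / all observations
        "REC": rec,                # true positives / all actual positives
        "PREC": prec,              # true positives / all predicted positives
        "SPEC": tn / (tn + fp),    # true negatives / all actual negatives
        "F1": 2 * prec * rec / (prec + rec),
    }

# Hypothetical counts for a dataset of 100 observations.
metrics = classification_metrics(tp=90, tn=8, fp=1, fn=1)
print(metrics["CA"])  # 0.98
```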

Evaluation
The best performing models in terms of error rate minimization and generalization ability were retained as final and selected for an additional evaluation. The additional evaluation consisted of a more detailed description of the misclassifications in the training and testing datasets, as well as of developing plots to depict the predicted probabilities of the data. Misclassification issues were addressed by exporting the outputs of the training and testing phases into Microsoft Excel® files, followed by the application of logical functions to extract the number of correctly classified datapoints (true positives, TP, and true negatives, TN), false positives (FP) and false negatives (FN), based on a paired comparison of the ground truth against the predictions made on the training and testing datasets. These new data were summarized in the form of tables and plotted as graphs in the time domain, in the form of the Euclidian Norm (Equation (1)) against misclassifications. Probability plots were developed by mapping the original data, expressed as Euclidian Norms (Equation (1)), against their predicted probability of falling in either the ON or the OFF class.
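The paired comparison of ground truth against predictions can be sketched in a few lines. The study performed this step with logical functions in a spreadsheet; the Python version below is only an equivalent illustration, with hypothetical function names and example labels, and it treats ON as the positive class.

```python
def confusion_counts(truth, predicted, positive="ON"):
    """Count TP, TN, FP and FN from paired ground-truth and predicted labels."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(truth, predicted):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts


def misclassified_indices(truth, predicted):
    """Return the positions (e.g., seconds) where prediction and truth differ."""
    return [i for i, (t, p) in enumerate(zip(truth, predicted)) if t != p]


# Hypothetical five-second excerpt.
example_truth = ["ON", "ON", "OFF", "OFF", "ON"]
example_pred = ["ON", "OFF", "OFF", "ON", "ON"]
example_counts = confusion_counts(example_truth, example_pred)
example_errors = misclassified_indices(example_truth, example_pred)
```

The list of misclassified positions is what would be overlaid on the time-domain signal to see where in the record the errors occur.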

Description of the Labelled Datasets
The datasets used in this study accounted for a cumulative size of 107,276 s (ca. 30 h, Table 2), of which the fused dataset used for training (TRAIN) represented ca. 34%. The datasets used for testing accounted (in the order shown in Table 2) for ca. 25%, 21%, 11% and 9%, respectively. Excepting the dataset TEST_E2, the data distribution on classes showed different degrees of class imbalance. Irrespective of the dataset, more than 57% of the data was labelled as ON, a class that accounted for ca. 90% of the TEST_S1 dataset's size. While, from the perspective of developing robust MLPs, this is a common issue to be solved [28], from an operational point of view this kind of data distribution emulates very well the practice of motor-manual willow felling, where the effective felling itself dominates.

Model Selection and Classification Performance
Values returned by the binary cross-entropy error as a function of the regularization parameter's tuning are shown in Figure 3. Irrespective of the tuning strategy used, or the dataset in question, up to a value of α set at 1, the training and generalization errors were found to be less than 0.074 (7.4%, TEST_E2), showing, in general, a good generalization ability of the trained model. For values of α set from 0.0001 to 0.1, both the training (TRAIN) and generalization (TEST_E1, TEST_S1, TEST_E2, TEST_S2) errors were low, with the lowest ones found for α = 0.0001 and α = 0.1. Beyond this threshold (α = 0.1), the error started to increase noticeably for at least one of the testing datasets (Figure 3, TEST_E2). The lowest differences in terms of errors were found in the case of α = 0.0001 and α = 0.1, irrespective of the values compared (training and testing data or just testing data). For instance, when setting α at 0.0001, the value of the log loss for the training data was 0.005 (0.5%), and the maximum value was 0.036 (3.6%), found in the TEST_E2 dataset. The figures were similar for α = 0.1, for which the error found for the training data was 0.006 (0.6%), and the maximum value was 0.037 (3.7%), found in the same testing dataset (TEST_E2). In terms of errors, the TRAIN and TEST_E1 datasets returned similar values for α set between 0.0001 and 0.1. For the same range of α, the TEST_S1 and TEST_S2 datasets returned a similar pattern in terms of errors.
Figure 3. Binary cross-entropy (log loss) as a function of the regularization parameter α, computed according to Equation (3), based on the normalized data (Equation (2)); models retained for further analysis are bordered by green dashed lines.
Figure 4 shows a comparison of the classification accuracy (CA) metric for the training and testing datasets, reflecting the effect that the value set for the α term had on this metric. In the training phase, all of the attempts to tune the regularization parameter term (α) returned very high classification accuracies.
However, the classification accuracy of the training phase was preserved at the highest values (0.999, 99.9%) only for α set between 0.0001 and 0.01, and it started to decrease as the regularization parameter approached the value of 10. Moreover, the classification performance on the testing datasets was preserved at its highest values for α set between 0.0001 and 0.01. However, the models selected for further assessment were those having this parameter set at 0.0001 and 0.1, based on the results returned by the log loss function, which are shown in Figure 3. Tables A1-A3 show the detailed classification performance metrics at the overall (dataset) level, as well as by class (ON, OFF). Irrespective of the class, the minimum value of the classification accuracy (CA) metric was 0.944 (94%), indicating a high share of correct predictions for the worst prediction case. The minimum values of the F1 metric, which stands for the harmonic mean of precision (PREC) and recall (REC), were 0.944 (94%), 0.948 (95%) and 0.938 (94%) for the overall, ON and OFF data, respectively. In the same order, the minimum values of precision (PREC) were 0.950 (95%), 0.988 (99%) and 0.884 (88%), and those of recall (REC) were 0.903 (90%), 0.903 (90%) and 0.988 (99%), where precision stands for the fraction of true positives in the total of predicted positives (TP and FP) and recall stands for the fraction of true positives in the total of actual positives (TP and FN). Accordingly, these metrics returned high values for the worst prediction cases, with evident differences as a result of tuning the regularization parameter. The training time of the MLP varied between ca. 261 and 482 s, and it was ca. 261 and 443 s for the models trained for α = 0.0001 and 0.1, respectively. A more detailed comparison of the classification accuracy for the former models is given in Table 3, showing some of the highest values of the CA among the set of regularization terms used.
Excepting the TEST_S2 dataset, no significant differences were found in terms of classification accuracy as an effect of tuning the regularization parameter. In addition, classification performance was found to be very high in the case of most of the testing datasets, and in terms of classification accuracy (CA), its values ranged from 99.1% (TEST_S2, α = 0.0001) to 99.9% (TEST_E1), proving a high generalization ability of the trained models.
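The log loss criterion used to retain the two final models can be illustrated with the only numeric pairs reported in this section: the training log loss and the maximum testing log loss (TEST_E2) for α = 0.0001 and α = 0.1. The sketch below simply computes the gap between the two for each α; the variable names are hypothetical, and values for the other α settings are not reproduced because they are given only graphically in Figure 3.

```python
# Log loss values reported in the text for the two retained models:
# (training log loss, maximum testing log loss found for TEST_E2).
reported_log_loss = {
    0.0001: (0.005, 0.036),
    0.1: (0.006, 0.037),
}

# Gap between the maximum testing error and the training error per alpha,
# rounded to the three decimals used in the text.
log_loss_gap = {
    alpha: round(test_max - train, 3)
    for alpha, (train, test_max) in reported_log_loss.items()
}

for alpha, gap in sorted(log_loss_gap.items()):
    print(f"alpha = {alpha}: gap between training and worst testing error = {gap}")
```

Both settings yield the same gap at this precision, which is consistent with retaining both models rather than a single one.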

Misclassification and Probability Plots
The correctly classified observations in the training dataset (TRAIN, Table 4) were close in terms of relative frequency. In absolute numbers, however, the model using a regularization term set at 0.1 misclassified more (25 observations) than the model with α set at 0.0001. When checked for the testing datasets (TEST_E1, TEST_S1, TEST_E2, Table 4), the number of misclassifications was nearly the same for both values of the regularization parameter, excepting the last testing dataset (TEST_S2, Table 4), which returned a better performance for α set at 0.1. Figure 5 gives a representation of the misclassified datapoints in the training and testing datasets for a regularization parameter term set at 0.0001. Irrespective of the dataset, the misclassified datapoints shared a common feature, namely that, in terms of magnitude, they were located in the transient data segments characterizing interclass variability. These segments were mostly identified for operational events such as turning on or off the tool's engine, which were formally included in the ON class. However, no attempt was made to separate them into another class, given the results obtained on classification performance and error metrics (Figures 3 and 4, Tables 3 and 4), which were considered to be acceptable. In addition, the number of observations found to be misclassified due to their belonging to these events is typically low in applications such as that studied herein (Table 4). Figure 6 shows a selection of predicted probability plots in a comparative approach. The data shown stands for the dataset used for training (TRAIN), as well as for the datasets TEST_E2 and TEST_S1 used for testing. It compares the predicted probabilities of the datapoints from the abovementioned datasets of belonging to the classes ON and OFF, respectively, against the values of those datapoints computed according to Equation (1).
For a value of the regularization parameter term set at 0.0001, the minimum values of the Euclidian Norm found to be predicted as ON were close to 3 g in all the datasets (detailed statistics are not shown herein, and Figure 6 shows only a selection of predicted probability plots). Accordingly, the maximum values of the Euclidian Norm found to be predicted as OFF were close to 3 g in most of the datasets. In comparison, for a value of the regularization parameter term set at 0.1, the minimum and maximum threshold values (as described above) of the Euclidian Norm were close to 4 g in most of the datasets. These statistics can be followed quite easily in Figure 6, where in the left panels (α = 0.0001) the predicted probability data is split, at a probability of 0.5, by a value of the Euclidian Norm close to 3 g. Accordingly, the right panels of the figure (α = 0.1) split the predicted probability data, at the same probability threshold (0.5), by a value of the Euclidian Norm close to 4 g.
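The threshold statistics discussed above (the smallest Euclidian Norm predicted as ON and the largest one predicted as OFF at a probability cutoff of 0.5) could be extracted from exported predictions along the following lines. This is a minimal sketch; the function name and the example values are hypothetical and do not reproduce the study's data.

```python
def norm_thresholds(norms, probs_on, cutoff=0.5):
    """Return (min norm predicted ON, max norm predicted OFF) for a cutoff.

    norms    -- Euclidian Norm values of the datapoints (in g)
    probs_on -- predicted probabilities of the datapoints being ON
    """
    on_norms = [n for n, p in zip(norms, probs_on) if p >= cutoff]
    off_norms = [n for n, p in zip(norms, probs_on) if p < cutoff]
    min_on = min(on_norms) if on_norms else None
    max_off = max(off_norms) if off_norms else None
    return min_on, max_off


# Hypothetical datapoints ordered by increasing magnitude.
example_norms = [1.0, 2.5, 2.9, 3.1, 3.6, 8.0]
example_probs = [0.01, 0.20, 0.45, 0.55, 0.90, 0.99]
example_min_on, example_max_off = norm_thresholds(example_norms, example_probs)
```

When the two returned values are close to each other, as in this example, the model behaves approximately like a single magnitude threshold on the signal.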

Discussion
Monitoring operational performance is one of the common ways to get the data needed for sound decisions on running and improving the way that various businesses work. Many manufacturing industries are already collecting sensor-based data to improve their operations and to respond with informed decisions to various production anomalies and problems [41], enabling them to be more competitive, responsive and resilient. In forest and SRWC operations, getting monitoring data was traditionally based on observing workers, tools and machines by time-and-motion studies [12-14], which have evolved from pen-and-paper to various sensing-based techniques; the latter often implement an external rather than a built-in sensor system (e.g., [7,10,15,17,18,20,23,42-46]), mainly because the purpose of collecting such data was often purely scientific. Although modern machines may incorporate production monitoring systems that may work in real time, there are still few options to collect and handle such data for hand-operated tools. Recent studies have shown that acceleration sensors may be successfully used to collect long-term operational monitoring data (e.g., [7,10,16,17,23,43,45,47]), including by the use of platforms such as smartphones [15]. In many cases, however, such data comes as modality-variant, unannotated sets, requiring significant resources to process and analyze [7,10,47]. In this regard, the merit of this study is that it developed models that are invariant to the data collection modality and able to automatically and accurately classify signals collected by triaxial accelerometers, making it possible to extend their applicability to newly collected datasets. As such, the implementation of the MLP can serve to automatically classify new data recorded by triaxial accelerometers, irrespective of the datalogger placement on the tool.
One of the relevant issues for discussion is the sensing modality itself. Dealing with sensing modalities is not a new approach brought by this paper, as it has been discussed [28] and used by other studies making use of sensors to measure various physical variables [15,17,30,47,48]. However, as there is no certainty that in follow-up field data collection activities the acceleration dataloggers will be placed at the same location each time, the developed models need to produce classifications that are invariant to such issues. By fusing the Euclidian Norm data collected on two of the most accessible parts of the tool, this study has aimed to make the models invariant to the data sensing location. This is supported by the results obtained on the testing datasets, which returned excellent classification results in all cases (Tables A1-A3), irrespective of the datalogger placement, operational variability or the individual handling of the tool. Moreover, the developed models were found to deal very well with the intra-class variability of the Euclidian Norm data (Figure 3, events labelled with 2 and 3 for a single sensing modality: engine (TRAIN_E) or transmission shaft (TRAIN_S)), which was mostly generated by the variation of operational behavior. In fact, intra-class variability may be generated by the same individual, or by several individuals, performing a given activity differently [49]. In this study, intra-class variability was the effect of operational behavior in relation to the crop layout, with some portions requiring walking with the engine running idle, as well as the effect of other issues such as changes in the operational behavior for similar operational conditions. For a comparison, the reader may consult, for instance, Figures A2 and A4.
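The idea of fusing Euclidian Norm data from the two sensing locations can be sketched as follows. The sketch rests on two assumptions that are not confirmed by this section: that the Euclidian Norm of Equation (1) is the usual square root of the summed squared axis readings, and that fusion amounts to pooling the labelled records of both locations into one training set without keeping a location identifier. All names and values are hypothetical.

```python
import math


def euclidian_norm(ax, ay, az):
    """Magnitude of a triaxial acceleration reading (assumed form of Equation (1))."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def fuse_locations(engine_records, shaft_records):
    """Pool labelled records from both locations into one training set.

    Each record is ((ax, ay, az), label). The sensing location is dropped on
    purpose (an assumption), so a model trained on the pooled data cannot
    rely on knowing where the datalogger was placed.
    """
    fused = []
    for records in (engine_records, shaft_records):
        for (ax, ay, az), label in records:
            fused.append((euclidian_norm(ax, ay, az), label))
    return fused


# Hypothetical readings in g from the engine and the transmission shaft.
example_engine = [((3.0, 4.0, 0.0), "ON"), ((0.0, 0.0, 1.0), "OFF")]
example_shaft = [((1.0, 2.0, 2.0), "ON")]
example_train = fuse_locations(example_engine, example_shaft)
```

Single-location datasets prepared with the same norm function can then be used as separate testing sets, mirroring the training and testing layout described in the study.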
However, it could be speculated that the use of vibration data sensed by direct contact with the tool has more potential to generate clearly separable events; hence, it could be a good approach to eliminate much of the intra-class variability that could be generated by different persons carrying out the same task. Due to the vibration characteristics of tools equipped with two-stroke engines, the models developed and tested in this study might also work well on data collected by placing sensors on chainsaws, to distinguish between engine working (ON) and non-working (OFF) states. For instance, the work of [16] has shown a similar data pattern and similar vibration magnitudes for engine working states. However, further studies are needed to check whether the models would work on tools from other classes that are characterized by contrasting constructive concepts.
Class imbalance [28,49] and inter-class similarity [49] are common issues causing classification problems in various applications of human activity recognition. On the one hand, class imbalance biases the prediction of conventional models toward the classes holding the majority of data [28]. On the other hand, experiments that are purely observational offer few, if any, ways to address this challenge [12,14], as the occurrence of the datapoints in given classes is imposed by the operational conditions. Class imbalance was a defining feature of the datasets used in this study, which have shown a data majority attributed to the ON class (Table 2). Given the results of classification performance, however, it seems that this characteristic had small effects on the datasets if compared, for instance, to inter-class similarity (transient events such as turning on and off the engine), which resulted in some misclassifications (Table 4, Figures 5 and 6).
Classification performance of the models was found to be very high, while keeping the error rates at a low level in both the training and testing datasets. For instance, classification accuracy was higher than 99% irrespective of the explored dataset, a value that is frequently termed as being very good [40]. However, there was a tradeoff between achieving high classification performance and keeping the generalization errors low, which in this study was evaluated by a trial-and-error approach that tuned the regularization parameter and led to a selection of two final best-performing models. The final models (for α set at 0.0001 and 0.1, respectively), which were retained based on their lowest generalization errors, shared similar classification accuracies, excepting that of the TEST_S2 dataset, a case in which the model trained for α = 0.1 performed better. Given the similarity of classification accuracy for the rest of the comparisons (Table 3), this outcome could be attributed to the functions and decision boundaries learned by the MLP model.
In addition to the hyperparameters' tuning, classification performance is affected by the architecture of the MLP (particularly its size) but also by the sensing modalities. Most often, the size of the MLP is selected based on rules of thumb [50,51]. However, the work of [16] has shown how an increasing depth (number of hidden layers) and width (number of neurons in a hidden layer) of the MLP may output increasingly accurate classification results for a case study run on triaxial acceleration data collected on a chainsaw. Based on that, as well as on the recommendations of [33], the size of the MLP was set to the maximum allowed by the software used. Sensing by two or more modalities may increase the classification performance. For instance, the work of [15] has found that the use of sound in addition to acceleration and gyroscope-collected data contributed to the performance increment of a Random Forest algorithm by decreasing the classification errors, while the work of [17] has found a better classification performance when fusing the data on acceleration and sound pressure level by horizontal stacking before feeding it into MLPs, concluding that the preservation of sensing location may be of high importance in developing more accurate classification models. By comparison, this study removes the problem of sensing location by training the MLPs on dual-sensed signals collected at two locations on the tool. While further studies would be needed to check it, by its specific learning characteristics, the developed MLP could be invariant to the sampling rate of newly collected data, because it makes its predictions based on the functions learned and not based on the sampling rate characteristics.
Collecting, processing, analyzing and interpreting large amounts of data is one of the approaches taken today to better understand ecological, social and technical systems, enabling better decision making on well-informed grounds. There are several approaches, techniques and technologies to the problem which have already been described as being opportune for general forestry [52]. Operational monitoring of SRWCs may benefit from sensor-based data collection approaches coupled with the techniques of artificial intelligence by removing the human error and the effort associated with traditional observation [53,54]. Moreover, approaches such as that described in this study could be used to prevent the safety issues associated with collecting data near dangerous machines and tools or in difficult outdoor conditions [12], being also less intrusive in applications that aim at observing people at work. At the time of purchasing the dataloggers used in this study, they were considered to be very small and useful for operational monitoring of motor-manual operations [47,48]. However, the technology for producing affordable miniaturized sensors is still advancing and has made significant progress, since smaller sensors have already been released on the market, facilitating the transition to sensor-based operational monitoring. This progress has been reflected positively in various forestry applications requiring close-range sensing [55], and there is a lot of unexplored potential for such techniques both in forest and in SRWC operations.
There are two main limitations of this study. The first one is related to the MLP's misclassifications, which were mainly found for those datapoints characterizing the so-called transient events of turning on and off the tool's engine (Table 4, Figure 5). The second one is related to the fact that in this study the data was segmented into only two classes; therefore, the engine idle time was included in the same class as the effective willow felling time. Both of these problems could be addressed by adding more context through the use of GPS units, an approach that has shown good results in previous studies on the topic [7,10]. In addition, in the phase of data interpretation, one can treat misclassifications as non-felling time, having in mind the knowledge gained by this study. However, further studies are needed to evaluate whether adding primary and derived GPS location data into the MLP would help in designing applications able to look more deeply into the underlying process of willow felling operations. For instance, GPS coordinates and speed were used to infer the location and operational behavior of a feller by a traditional human-assisted classification approach [7,10], and they could provide additional context for the design of a multiclass MLP.

Conclusions
For a binary classification problem which emulates the most important operational events in SRWCs' motor-manual felling by brush cutters, the developed MLPs have returned high classification accuracies (99.1% to 99.9%), which were invariant to the sensing modality judged by the sensors' location. While the study has addressed hyperparameter tuning by the modification of the regularization parameter term, two final models were retained as being able to (i) provide high classification accuracies, (ii) generalize well on the testing datasets collected by single modalities and (iii) retain low errors in both the training and testing phases. Given the obtained results, the developed models are assumed to be invariant to newly collected data, making them useful in classification applications and enabling the automation of most of the workflow typically implemented to collect, process, analyze and interpret large amounts of data. Further studies could bring new and valuable insights if focused on evaluating the classification performance obtained by adding more context to the developed MLPs. This could be achieved by fusing the triaxial acceleration data with that collected by miniaturized GPS units, to be able to classify and describe the operational tasks in more depth.
Institutional Review Board Statement: Ethical review and approval were waived for this study, due to the fact that the workers observed in the study agreed to participate based on an informed consent.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data supporting the study is available on request to the author.

Acknowledgments:
The author acknowledges the technical support of the Department of Forest Engineering, Forest Management Planning and Terrestrial Measurements, Faculty of Silviculture and Forest Engineering, Transilvania University of Brasov in designing and conducting this study. The author would like to thank Eng. Arpad Domokos and his employees for making this study possible, as well as Eng. Nicolae Talagai and Eng. Marius Cheţa for their help in field data collection activities and for providing the raw data for this study.