Supervised Distributed Multi-instance and Unsupervised Single-instance Autoencoder Machine Learning for Damage Diagnostics with High-dimensional Data - A Hybrid Approach and Comparison Study

Structural health monitoring (SHM) is a promising technique for the in-service inspection of technical structures in a broad field of applications in order to reduce maintenance efforts as well as the overall structural weight. SHM is basically an inverse problem deriving physical properties like damages or material inhomogeneities (target features) from sensor data. Often, models defining the relationship between predictable features and sensors are required but not available. The main objective of this work is the investigation of model-free Distributed Machine Learning (DML) for damage diagnostics under resource and failure constraints by using multi-instance ensemble and model fusion strategies, featuring improved scaling and stability compared with centralised single-instance approaches. The diagnostic system delivers two features: a binary damage classification (damaged or non-damaged) and an estimation of the spatial damage position in case of a damaged structure. The proposed damage diagnostics architecture should be usable in low-resource sensor networks with soft real-time capabilities. Two different machine learning methodologies and architectures are evaluated and compared, posing low- and high-resolution sensor processing for low- and high-resolution damage diagnostics, i.e., a dedicated supervised trained low-resource approach and an unsupervised trained high-resource deep learning approach, respectively. In both architectures, state-based recurrent Artificial Neural Networks are used that process spatially and time-resolved sensor data from experimental ultrasonic guided wave measurements of a hybrid material (carbon fibre laminate) plate with pseudo defects. Finally, both architectures can be fused into a hybrid architecture with improved damage detection accuracy and reliability.
An extensive evaluation of the damage prediction by both systems shows high reliability and accuracy of damage detection and localisation, even for the distributed multi-instance architecture, with a resolution in the order of the sensor distance.


Introduction and Related Work
Structural Health Monitoring (SHM) based on Lamb waves, a type of ultrasonic guided waves, is a promising technique for the in-service inspection of aircraft structures. The implementation of SHM systems into aircraft applications reduces maintenance efforts as well as the overall structural weight. Lamb waves are excited and received using a network of actuators and sensors which are permanently attached to the structure. Lamb waves are very sensitive and exhibit different wave interaction mechanisms with structural damages, such as attenuation, reflection, scattering, or mode conversion. By analysing the sensor signals, different kinds of structural damages can be detected and located [1], [2].
Automatic and reliable damage diagnostics using SHM systems is still a challenge, especially in the case of carbon fibre laminates due to their anisotropic material characteristics. Depending on the underlying measuring technique used to retrieve suitable sensor signals that show a sufficient correlation with damage or fatigue features, the recognition of the damage features requires complex analysis with expert knowledge and intervention [3]. Moreover, damage diagnostics can be an inherently distributed problem [4] using spatially distributed sensors [5] that are still processed by a central instance, leading to scaling and efficiency issues. Scaling is limited with such centralised architectures. But distributed data processing in sensor networks, especially addressing material-applied or material-integrated sensor networks, imposes strict resource constraints on the signal processors, both regarding the memory and the computational power of each unit.
Damage and structural health diagnostics is an inverse problem. A model M represents a measurement that maps a spatial and time-dependent environmental context p_e(x, t) together with a feature set f (e.g., damage class and location) of a device under test (DUT) onto sensor signal data s. The damage diagnostics system requires the inverse model M^-1 that maps the sensor data onto the requested features to be monitored (related to another measuring parameter set p_m):

M(x, t, p_e, f): (x, t, p_e, f) → s
M^-1(s, p_m): s → f    (1)

Besides numerical methods (e.g., inverse numerics [6]), Machine Learning (ML) can be utilised to derive the inverse model M^-1 from training example data mapping s onto f. Due to the highly non-linear model function, Artificial Neural Networks (ANN) are often used to implement a hypothesis of the required damage predictor function [7].
The task of Structural Health Monitoring systems is to detect and locate different kinds of damages from sensor data which are produced by sensors permanently applied to the structure. Related work can be classified into model-based [8] and model-free methods. Damage diagnostics with homogeneous and isotropic materials, e.g., aluminium or steel, can be handled with established methods. But when dealing with materials posing complex physical relationships between damages and sensor signals, e.g., by anisotropic and non-linear interaction behaviour, like in composite laminates, deriving suitable models that map sensor data onto state information is still a challenge or still not possible.
The main objective of this work is the investigation of model-free Distributed Machine Learning (DML) under resource and failure constraints (including sensor noise, drift, and fatigue) by using spatial model decomposition and global model fusion strategies. Distributed learning and feature inference have gained attraction in recent years to overcome scaling and reliability issues, originally applied in wireless structural health monitoring networks [5]. Additionally, state-based ML should process time-resolved sensor data, as already successfully applied in the field of SHM with guided waves [9]. The damage diagnostic system operates in the space and time dimensions.
Commonly, low-resource and low-resolution approaches used in damage monitoring that can be deployed on embedded computers typical for sensor networks pose only limited and operationally constrained damage diagnosis capabilities (with respect to classification and localisation of damages), whereas high-resolution approaches require high computational time and storage, often utilising Deep Learning and Computer Vision (CV) [10]. Deep learning of Artificial Neural Networks (ANN) is often used for SHM [11]. Base-line approaches either try to derive damage features by analysing differences between a non-damaged base-line experiment and a device under test or by using more advanced approaches together with ML. Autoencoders (AE) are candidates to detect anomalies in sensor signals related to damage features that can be derived with deep learning, basically by modelling multiple levels of visual abstraction (from low-level features to higher-order representations, i.e., features of features) from the sensor data [11]. In a first step, an AE approach encodes a signal. A second step reconstructs (decodes) the signal again. If the AE is trained with ground truth data only, it will not be able to reconstruct a signal containing differences due to anomalies (the feature to be detected), e.g., a damage that modifies a sensor signal. Comparing the reconstructed signal with the original signal enables damage detection without supervised training with labelled data.
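The reconstruction-error principle can be sketched as follows. This is a minimal illustration only: a moving-average smoother stands in for a decoder that can reproduce smooth (damage-free) baseline signals but not sharp anomalies; the actual deep autoencoder is described later.

```javascript
// Minimal sketch of reconstruction-error anomaly detection.
// A real autoencoder learns encode/decode from baseline data; here a
// moving-average smoother stands in for a decoder that can only
// reproduce smooth (damage-free) signals.
function reconstruct(signal, w = 5) {
  return signal.map((_, i) => {
    const lo = Math.max(0, i - w), hi = Math.min(signal.length, i + w + 1);
    let sum = 0;
    for (let j = lo; j < hi; j++) sum += signal[j];
    return sum / (hi - lo);
  });
}

// Root-mean-square reconstruction error as anomaly score.
function anomalyScore(signal) {
  const rec = reconstruct(signal);
  const mse = signal.reduce((s, v, i) => s + (v - rec[i]) ** 2, 0) / signal.length;
  return Math.sqrt(mse);
}

// Smooth baseline vs. the same signal with a sharp local anomaly.
const t = Array.from({ length: 200 }, (_, i) => i);
const baseline = t.map(i => Math.sin(2 * Math.PI * i / 50));
const damaged = baseline.map((v, i) => (i >= 100 && i < 105 ? v + 2 : v));

// The anomalous signal cannot be reconstructed as well -> higher score.
console.log(anomalyScore(baseline) < anomalyScore(damaged)); // true
```

A threshold on the anomaly score then yields the unsupervised damage/non-damage decision.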
The proposed ML architecture should be suitable for processing on low-resource embedded computers like material-integrated sensor nodes of a sensor network [12]. Ideally, a node of the sensor network processes only local sensor data and performs local damage diagnostics. The damage predictor functions have to be highly discriminative with respect to noise and varying operational and measuring conditions [13]. There are two levels of prediction, i.e., feature extraction from the sensor signals: 1. The classification of damage and non-damage cases; and 2. The prediction of the spatial properties. In principle, there is a third level that classifies the damage class and its cause. This level is not addressed in this work.
Typically, the predictor functions are derived by supervised learning and sensor data from experiments with a defined and fixed parameter set of specimen geometry, actuator configuration, and measurement set-up, i.e., excitation frequencies, frequency filters, sensor scan grid, as well as damage type and position [13]. Operational variance (e.g., temperature and humidity) has to be considered, too (contained in the parameter sets p_e and p_m). One major challenge is the training of the predictor functions with limited variance of training data, concerning the variance of experiments of a single set-up to cover typical measuring and specimen variations (i.e., repetitions of experiments under the same conditions with the same parameter set for the device under test, damage, and the measuring set-up) and the variance of experiments and the respective features (i.e., different damage cases, classes, positions, sensors, environmental conditions). This limited training data commonly results in a lack of the required generalisation of the prediction model, which cannot be transferred to a broader range of parameter sets and unknown specimen configurations.
Two different approaches are compared in this work, which are finally fused into a hybrid system: a multi-instance low-resolution and a single-instance high-resolution architecture, differing in resource requirements and the training class (supervised versus unsupervised learning, respectively).

Computers 2021 www.mdpi.com/journal/computers

Common to both approaches is the deployment of state-based recurrent ANN (RNN) processing time-resolved sensor signal data from a spatially bounded context (i.e., local sensor data processing). The low-resolution approach should be capable of being used in structurally applied (material-integrated) sensor networks (i.e., SHM at run-time), whereas the high-resolution approach can primarily be used for laboratory diagnosis or at service-time. On the one hand, the high-resolution approach delivers an assessment base for the low-resolution approach; on the other hand, the low-resolution approach can be used as a fast approximating region-of-interest feature marker for the high-resolution system.
Given recent advances in sensor technologies and micro-system integration, this article proposes that a robust, yet simpler, real-time capable and low-resource distributed machine learning approach is now available for accurately estimating damages in hybrid materials compared with conventional sensor analysis and deep learning approaches. SHM based on Lamb waves and ultrasonic measuring techniques enables the detection of different kinds of structural damages and their localisation [1], [2]. However, the presence of at least two Lamb wave modes (symmetric modes S0, S1, S2, ..., and anti-symmetric modes A0, A1, A2, ...) at any given frequency, their dispersive characteristics, and their interference at structural discontinuities produce complex wave propagation fields. Due to the complex wave fields, conventional algorithms reach their limits for robust damage detection and localisation in this application.
In order to develop new damage detection algorithms based on machine learning, the experimental air-coupled ultrasonic technique is used. With this technique, the Lamb wave propagation field can be measured at any position of the structure. The measured wave propagation at a given position is used as sensor data for damage detection. The machine learning approaches require a high amount of experimental data sets with different damage types and locations. Therefore, different removable pseudo defects were developed which can be applied at different locations of the structure and generate wave interactions comparable to those of real structural damages.
In the next sections, the basic requirements for signal data processing and ML are presented, including a description of the origin of the sensor data and the physics of wave propagation relevant to understanding damage detection. Furthermore, the present paper contains two main sections, one for each ML architecture and training approach. Finally, both approaches are compared and fused into a hybrid architecture (although more as an outlook).

Feature selection and extraction
Feature selection and extraction is the process of deriving meaningful information related to a target variable y from sensor data related to the observation variable x. Therefore, any feature selection can be represented by a generalised function Ω(s): s → f. Input data features and target variable features have to be distinguished.
The process of input feature selection is typically related to sensor data pre-processing that transforms and reduces the raw sensor data s to the relevant information s_f = x contained in the signal s with respect to the target variable y (defining the input vector x), e.g., using time-frequency transformation to get selected frequencies from the signal, the variance of the signal, or other signal features. In this work there are spatially and temporally relevant features that have to be selected to perform the final feature extraction that delivers the damage feature vector F = <D, P> (categorical damage classification D and estimation of the spatial position of the damage P), related to the target variable y.
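As a simple illustration, such a feature selection function Ω(s): s → x can be sketched as follows; the features used here (energy, peak amplitude, variance) are placeholder examples only, since the actual selection in this work is wavelet-based:

```javascript
// Sketch of a generalised feature selection Omega(s): s -> x.
// The features here (energy, peak amplitude, variance) are placeholder
// examples; the actual pipeline uses wavelet-derived features.
function omega(s) {
  const n = s.length;
  const mean = s.reduce((a, v) => a + v, 0) / n;
  const energy = s.reduce((a, v) => a + v * v, 0);
  const peak = s.reduce((a, v) => Math.max(a, Math.abs(v)), 0);
  const variance = s.reduce((a, v) => a + (v - mean) ** 2, 0) / n;
  return { energy, peak, variance }; // input vector x for the predictor
}

const s = [0, 1, 0, -1, 0, 2, 0];
console.log(omega(s)); // { energy: 6, peak: 2, variance: ... }
```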
The signal feature selection is performed in this work primarily by a wavelet analysis of the time-resolved sensor signals, discussed in Sec. 5.2.
The target output feature extraction (damage classification and estimation of the spatial position of the damage) is then performed by the model function M^-1 introduced at the beginning and derived by machine learning (using the pre-processed input data features).

Taxonomy of Architectures
It is assumed that there is a sensor network SN represented by a graph G = <S, C> that consists of numbered (i, j) and identifiable sensor nodes S_i,j(p) ∈ S, each at a different spatial position p = (x, y), providing at least one time-resolved sensor signal s(t). The sensor nodes can communicate with each other via a network structure (edges of the SN graph) with connections com_ij,kl ∈ C.
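A minimal sketch of such a sensor network graph G = <S, C> as a data structure (an 8 × 8 grid with 4-neighbour connections is assumed here for illustration, matching the virtual network used later):

```javascript
// Build a grid sensor network G = <S, C>: nodes S(i,j) with spatial
// positions p = (x, y) and communication links com between 4-neighbours.
function buildSensorNetwork(rows, cols, spacing) {
  const S = [], C = [];
  for (let i = 0; i < rows; i++)
    for (let j = 0; j < cols; j++)
      S.push({ i, j, p: { x: j * spacing, y: i * spacing }, s: [] /* s(t) */ });
  for (let i = 0; i < rows; i++)
    for (let j = 0; j < cols; j++) {
      if (j + 1 < cols) C.push({ from: [i, j], to: [i, j + 1] }); // horizontal link
      if (i + 1 < rows) C.push({ from: [i, j], to: [i + 1, j] }); // vertical link
    }
  return { S, C };
}

const G = buildSensorNetwork(8, 8, 62.5); // node spacing assumed from plate size
console.log(G.S.length, G.C.length); // 64 112
```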
There are basically two main strategies of sensor data aggregation and sensor network architectures for machine learning that can be deployed to derive a feature vector F of the device under test (DUT) from the sensor data matrix D, i.e., the target variable y related to the global state ST of the DUT:

Global Learning
Inference by and training of a single predictive model instance M using a spatially collected data record series D(t) sampled at a certain time t (or averaged over a time interval) that is processed by a central processing instance (one processing node).The single instance directly delivers the global feature vector F related to the global state ST.

Local Learning with Global Fusion
A single sensor node processes only its local sensor data and passes the pre-processed data to its local learning instance, which predicts only the local state (e.g., a damage nearby), as discussed in the next sections. There is no sensor interaction in terms of communication, but the distributed sensor signals are correlated by the wave propagation. Global fusion of a predictor function estimating the global state from an ensemble of local prediction or classification models related to a local state can be performed by probabilistic methods, negotiation, majority election, and in the simplest case by spatial averaging. The fusion strategies used in this work are discussed in Sections 4 and 5. Both architectures are compared in Fig. 1.
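The simplest fusion strategy, spatial averaging, can be sketched as a score-weighted centroid of the node positions. This is a hypothetical stand-in for the fusion operators detailed in Sections 4 and 5:

```javascript
// Fuse local damage scores d_ij in [0,1] from nodes at positions (x, y)
// into a global feature vector F = <D, P>: binary classification D and
// estimated damage position P as the score-weighted centroid.
function fuse(nodes, threshold = 0.5) {
  const total = nodes.reduce((a, n) => a + n.score, 0);
  const D = nodes.some(n => n.score >= threshold); // any node sees a damage?
  if (!D || total === 0) return { D: false, P: null };
  const P = {
    x: nodes.reduce((a, n) => a + n.score * n.x, 0) / total,
    y: nodes.reduce((a, n) => a + n.score * n.y, 0) / total,
  };
  return { D, P };
}

// Two nodes respond strongly; the position estimate lands between them.
const local = [
  { x: 0, y: 0, score: 0.0 }, { x: 100, y: 0, score: 0.8 },
  { x: 200, y: 0, score: 0.8 }, { x: 300, y: 0, score: 0.0 },
];
console.log(fuse(local)); // D: true, P.x ≈ 150
```

The weighted centroid is what enables position interpolation below the grid resolution of the sensor network.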

Generalization
With respect to supervised learning, training data T = <D, Y> is used for learning (sensor data with target variable association). Testing a trained model is performed by statistical error analysis of:

Sensor Processing
The sensor data processing and data flow consist of the following processing stages, shown in Fig. 2, finally delivering the damage prediction results:
1. Recording of the experimental sensor data (laboratory) and uploading of the raw data to a file server;
2. Decoding of the raw data and storing of the raw measurement data in a SQL database (numeric format with hierarchical record tables).
The central part is an advanced SQL database server. The SQL database stores all experimental and computed data, including ML models. The SQL database can be accessed via a SQLJSON Remote Procedure Call interface. SQLJSON-RPC provides request-reply communication (e.g., SQL queries) via a JSON code and data format. SQLJSON-RPC supports micro-code execution for complex operations sent by the requesting client and executed by the server. The SQLJSON-RPC API is an overlay software layer on top of a generic SQLite database API.
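As an illustration, a SQLJSON-RPC style request-reply exchange might look as follows; the exact message schema of the SQLJSON service is not specified here, so all field names and the query are hypothetical:

```javascript
// Hypothetical SQLJSON-RPC request (field names and query assumed,
// not the actual wire format of the SQLJSON service).
const request = {
  type: "request",
  id: 1,
  sql: "SELECT position, amplitude FROM measurements WHERE experiment = ?",
  params: ["plate-500x500-80kHz"],
};

// Request and reply travel as JSON text over the RPC channel.
const wire = JSON.stringify(request);
const decoded = JSON.parse(wire);
console.log(decoded.sql === request.sql); // true
```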

Finally, the SQLJSON service provides a virtual file system layer that maps files and directories containing data and meta data files on tables, which can be requested like any other SQL table.
To support typical data sets in numerical matrix format, hierarchical data set tables were added (organising data sets similar to the HDF5 structure with data types, data spaces, meta data, and automatic matrix type conversion). There is a dedicated SQLDS API with support for packed arrays on top of SQLJSON to support access of data sets (and mapping onto generic SQL tables).
Fig. 2. Sensor processing and data flow in the SHM system in this work. The central part is the computation of virtual sensors arranged in a spatially two-dimensional sensor network. The dotted arrows show the relation of physical sensors and their sensor positions with virtual sensor nodes (a set of physical sensors is mapped onto a virtual sensor).

Computational complexity and resources
Dealing with large volumes of data is a challenge with respect to the spatial and temporal dimension. Even a single guided wave measurement produces a significant data volume due to the time-resolved recording of the sensor signals. The spatial dimension of the sensor data primarily determines the storage requirements, whereas the temporal dimension determines the computational time. Computing time and storage requirements differ significantly between the stages and phases of data processing and predictor model function training and inference:
• Phase I. Acquisition and processing of sensor data: Computational time is dominated by communication time; storage is mainly related to the original sensor data size and the communication network;
• Phase II. Pre-processing of the data (feature selection): Computational time is medium and closely related to the feature selection and transformation algorithms; storage still depends on the original data size;
• Phase III. Generation and training of predictive models (with partial testing of model quality): Computational time and storage depend significantly on the used model implementation and its structure (function, directed graph/tree, neural networks); computational time additionally depends on the training algorithms and the processing of the training data instances (single vs. batch vs. monolithic instance processing);
• Phase IV. Testing of the trained models: Computational time depends on the model size/structure, its functional complexity, and the number of data instances, but there is no significant increase in storage;
• Phase V. Inference/application to unknown data (incorporating Phases I/II, too): Same as Phase IV.
Parallelisation of distributed multi-instance learning (MM or MS class) at process or node level (basically on the control path) is possible:
• All local learning instances are independent, posing high computational effort that can be parallelised;
• Synchronisation and merging of local data are required only for global model fusion, which is a simple task with low computational effort;
• Parallelisation can be applied on one central computer as well as in the distributed sensor network;
• Speed-up S ≤ 15 with a central computer (2 CPU sym. NUMA, 8 cores/CPU, L3 cache ≥ 15 MB);
• Speed-up S < N with N distributed sensor nodes (sensor data kept locally).
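These speed-up bounds follow from the fact that only the global fusion step is serial. Under an Amdahl's-law assumption (with the fusion work as the serial fraction, a value chosen here purely for illustration), a rough estimate is:

```javascript
// Amdahl's-law sketch for multi-instance learning: the local learners
// parallelise perfectly, only the global fusion (serial fraction f)
// limits the speed-up S(N) = 1 / (f + (1 - f) / N).
function speedup(N, serialFraction) {
  return 1 / (serialFraction + (1 - serialFraction) / N);
}

// With a cheap fusion step (e.g., 1% of total work, an assumed value),
// 16 cores give a speed-up close to the S <= 15 bound stated above.
console.log(speedup(16, 0.01).toFixed(1)); // "13.9"
```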
Parallelisation of single-instance learning is partially possible if there is a globally trained and generalised model that can be applied to locally bounded regions of data or, in the best case, to single sensors (SM class), i.e., a predictor function that marks and amplifies local features in the sensor signal (posing high computational effort). Global fusion by object and pattern recognition (e.g., point density computations) or by negotiation and consensus algorithms falls into the low and mid computational classes and can commonly be neglected.
The resource and computational time requirements of both diagnosis architectures are evaluated in the respective sections.

Overview
An experimental database is used to evaluate both approaches. The experiments were performed to get raw time-resolved sensor signal data D ∈ R³ featuring the following physical key facts:
• Specimen under test:
  • Material: CFRP (Carbon Fibre Reinforced Plastic);
  • Shape: Plate;
  • Physical dimensions: 500 × 500 mm;
• Measurement method: Air-coupled contactless ultrasonic 2D scan with a grid spacing of 2 mm;
• Sensor: Different air-coupled ultrasonic probes;
• Measuring wave excitation by a surface-bonded piezoelectric actuator;
• Excitation: Rectangular burst signal with 3 pulses at frequencies from 40 to 200 kHz;
• Defect type: Pseudo defects consisting of a round steel plate (diameter 20 mm) which is attached to the plate and provides realistic wave interactions compared with real defects;
• Defect variance: Pseudo defect bonded with methyl methacrylate adhesive (MMA) or stuck with vacuum sealant tape (VST) to the surface of the plate;
• Defect position variance: Experimental variations with 9/11/14 different defect positions on the plate (regularly and irregularly spaced);
• Spatial measuring resolution (raw data): 250 × 250 measuring points (2 mm spacing);
• Temporal measuring resolution: About 4000 sample points per record (amplitude).
The data was delivered in a proprietary data format and was decoded and stored in a SQL database with hierarchical data set tables, providing a close binding of measuring data and meta data (experimental parameter sets), as described in Sec. 2.4.

Experimental Data Sets of Lamb Wave Propagation Fields

Air-coupled Ultrasonic Technique
In order to produce the high amount of experimental data sets required as input for the machine learning approaches, the air-coupled ultrasonic technique is used. This technique is a well-established, contactless method for the measurement of Lamb wave propagation fields.
To excite Lamb waves, a piezoceramic actuator is applied to the plate structure. The actuator is made in the so-called piezocomposite technology (DuraAct, PI Ceramics GmbH) and consists of a round piezoceramic (diameter: 10 mm, thickness: 0.2 mm) to excite homogeneous, almost circular wave propagation fields. The actuator is bonded to the plate structure with a two-component epoxy adhesive in a vacuum process (Henkel AG, Loctite Hysol 9455). The transducer is driven with a rectangular burst signal with 3 pulses. The plate structure is a quasi-isotropic CFRP (Carbon Fibre Reinforced Plastic) laminate with 7 plies. The layup as well as the mechanical material properties are described in [14]. The plate dimensions are 500 × 500 × 2 mm. The plate is installed on spikes inside a frame with mechanical stops to reduce the wave interactions with the mechanical mounting and to avoid deviations in positioning during assembly and disassembly. On the sensor side, different ultrasonic sensors, which measure the out-of-plane displacement of the Lamb wave field, are used. In order to investigate the wave interaction of different Lamb wave modes (A0 and S0 mode) with high amplitudes (signal-to-noise ratio), the wave propagation field is measured at various frequencies (40, 80, 120, 200 kHz). At the lower frequencies of 40 to 80 kHz the A0 mode exhibits high amplitudes, whereas the S0 mode shows high amplitudes at the higher frequencies of 120 and 200 kHz. The further analogue signal processing, data conversion, and scanner controls are provided by the ultrasonic system USPC 4000 AirTech (Hillger NDT GmbH). In addition, the ultrasonic system controls a portal scanner that moves the scanning sensor. The scanning sensor moves in a meander course over the plate and measures the wave field in a 2 mm grid. This leads to a spatial measuring resolution of 250 × 250 measuring points over the whole plate. Fig. 3 shows the experimental set-up for the measurement of Lamb wave fields as well as the positions of the actuator and the pseudo damages. The output of the measurements are data files which consist of an amplitude-over-time signal for each measuring point within the 2D scanning grid. These so-called 3D volume data sets build the input for the following damage detection approaches. In a first step of the experimental data recording, the wave propagation fields are measured at the different frequencies without any pseudo damage applied to the plate. These measurements are used as a reference for the damage detection.

Removable Pseudo Damages
The machine learning approaches require a high amount of experimental data sets with different damage types and locations. To reduce the number of manufactured CFRP plates, removable pseudo damages were developed. The pseudo damages can be applied at different locations of the damage-free plate and cause wave interactions comparable to those of real damages (e.g., delaminations). After each measurement set, the pseudo damages can be removed without any residues. It was defined to use round pseudo damages with a diameter of 20 mm, which is smaller than the required size of damages (25 mm) to be detected by today's SHM systems [15]. To produce realistic wave interactions, like absorption, reflection, scattering, or mode conversion, two different types of pseudo damages were developed. The first type of pseudo damage consists of a round steel plate (thickness: 10 mm) and is applied with hand pressure to the CFRP plate using vacuum sealant tape (GS-213, Airtech Europe SARL), which will be referred to in the following as VST. This pseudo damage can be removed by simply peeling it off. Due to the absorption characteristics of the relatively thick sealant tape (thickness: 3 mm), this pseudo damage only absorbs the wave energy and reduces the amplitudes on the sensor side.
The second type of pseudo damage consists of the same round steel plate and is bonded to the plate using methyl methacrylate adhesive (MMA). The advantage of this adhesive is that it cures within a short period of time (typ. 20 minutes), which reduces the time span of each measurement set. Furthermore, it produces a relatively thin bonding layer due to its low viscosity and exhibits a high Young's modulus, which results in a relatively rigid bonding layer and a high stiffness change in the plate structure. The stiffness change causes reflections, scattering, and mode conversions within the Lamb wave propagation field. This pseudo damage can be removed by a small lateral knock with a hammer. Due to the fact that the CFRP plate has a smooth surface, no residues of the adhesive remain on the plate. Fig. 4 shows an example of the Lamb wave interaction at 80 kHz with the pseudo damage applied with vacuum sealant tape. In general, the pseudo damage applied with vacuum sealant tape exhibits an attenuation of the A0 mode at 40 to 120 kHz and a phase shift at 40 to 80 kHz. Wave interactions of the S0 mode with this pseudo damage are not observed in the investigated frequency range. Fig. 5 shows an example of the Lamb wave interaction at 80 kHz with the pseudo damage applied with methyl methacrylate adhesive. This pseudo damage produces mode conversion from the S0 into the A0 mode at frequencies of 40 to 200 kHz. The A0 mode exhibits reflections and scattering at 40 to 80 kHz. Wave interactions of the A0 mode at higher frequencies cannot be observed because the S0 mode dominates the wave propagation field with its higher amplitudes. Furthermore, local phase shifts and attenuation (behind the pseudo damage) of the S0 mode from 120 to 200 kHz and of the A0 mode from 40 to 80 kHz can be detected. It can be summarised that the pseudo damage with methyl methacrylate adhesive produces more wave interactions compared with the pseudo damage with vacuum sealant tape. Therefore, this pseudo damage can be detected better by the damage detection algorithms.
The various observed wave interactions with the pseudo damages show up in the sensor signals in different ways. The mode conversion and reflection/scattering produce new wave packets which appear in the sensor signals at specific times of flight. The time of flight of the new wave packets depends on the distance between the pseudo damage and the sensor. Therefore, the new wave packets can interfere with the originally excited wave packets (S0 and A0 mode) if their times of flight match. Otherwise, the new wave packets appear clearly in the sensor signal if their times of flight differ from those of the originally excited wave packets. The other wave interactions, such as phase shift and attenuation, influence only the originally excited wave packets in the form of phase shifts and amplitude reductions. In summary, the feature selection of the machine learning algorithms should be able to identify the different wave interactions by selecting specific time frames within the sensor signals. Within the experimental data sets, the two types of pseudo damages are applied one after the other at 21 different positions, as shown in Fig.

Signal Features and Damage-Wave Interaction
As outlined in the previous section, damages or, more generally, material inhomogeneities have an influence on the wave propagation with respect to:
• Amplitude modification, i.e., damping and interference;
• Reflection;
• Frequency and mode conversion.
Therefore, relevant features of the measured temporally and spatially resolved sensor signals are related to amplitude, phase, and frequency properties. But the time-resolved sensor signal at a specific measuring position consists of different segments. Typically, only the first segments contain damage-relevant features, whereas the later segments do not. There are basically three different approaches for extracting relevant signal features (besides statistical properties):
• Time-frequency transformation (Fourier transform) of the entire signal record;
• Time-shifted window frequency transformation;
• Wavelet transform and decomposition of the signal record [16], [17].
The frequency transformations are bound to the time-frequency uncertainty principle: a windowing approach increases the time resolution but decreases the frequency resolution. The wavelet transform (among other wave decomposition methods not discussed here) can be considered a method preserving both the time and the frequency properties of the input signal [16].
The relevant damage features are contained in the sensor signal. But the temporal position and extent of the region of interest containing the relevant features depend on the wave propagation, the wave interaction, and the relative positions of the sensor, damage, and actuator (i.e., forming a spatial graph), shown in Fig. 6 (b). The sensor signal is basically divided into three segments of initially unknown length: the pre-, feature, and post-signal segments, illustrated in Fig. 6 (a). Even the comparison with a base-line (damage-free) signal does not expose the relevant features without pre- and post-processing.

Multi-instance Learning with Multi-instance Prediction (MTMP class)
In this section, the first low-resolution and low-resource approach using multi-instance learning of a damage predictor function is introduced. In a first step, the raw sensor data is processed by a virtual sensor network on a generic computer. The results can be mapped directly onto a real sensor network.

Concept
The damage diagnostics processing the raw sensor data uses the following key methods:
• Supervised multi-instance learning by a virtual sensor network (8 × 8 nodes) processing local time-resolved sensor data derived from the experimental measuring data;
• The output of the local supervised learning is a predictor function that has to detect a damage in the near region around a sensor (continuous output in the range [0,1] with binary threshold classification);
• The predictor function is implemented by a state-based recurrent ANN with LSTM cells using a JavaScript ML framework integrating an improved Neataptic ANN [18]; the network configuration is [4,6,1], i.e., 4 input neurons, 6 LSTM cells, and one output neuron;
• The global fusion of all local damage predictor function outputs approximates the spatial damage position (if any) within the boundary of the sensor network, supporting position interpolation.
The principal experimental and data analysis set-up is shown in Fig. 7. All computations were performed in JavaScript, either by a WEB browser (SpiderMonkey VM) or by using node.js (V8 VM). The feature selection process and the basic ANN architecture are shown in Fig. 8. Typically, the levels 3-5 contain relevant signal features. Each level of the DWT consists of a low- and high-pass filter providing the approximation and details of the signal, respectively, providing a good time-frequency analysis [17]. The approximation is the input signal of the following level; the detail signal is the input for the ANN. Each level of the DWT reduces the sampling frequency by a factor of two (down sampling), i.e., at the output of the DWT filter a sampling expander (up sampling) is required for each level to equalise the sequence lengths of the input signals.
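The filter-bank scheme described above can be sketched as follows. This is an illustrative Python sketch (the paper's implementation is JavaScript); the Haar wavelet is assumed here only for simplicity, and all function names are hypothetical:

```python
# Sketch of the DWT filter-bank feature pipeline: each level splits
# the signal into approximation (low-pass) and detail (high-pass),
# down-samples by two, and the detail is expanded back to the
# original rate before driving one ANN input.

def haar_level(x):
    """One DWT level using the Haar wavelet: returns the approximation
    and detail sequences, both down-sampled by a factor of two."""
    s = 2 ** -0.5
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return approx, detail

def upsample(x, factor):
    """Zero-order-hold expander equalising the sequence length of each
    detail signal before it is fed to the ANN inputs."""
    return [v for v in x for _ in range(factor)]

def dwt_features(signal, levels=4):
    """Detail signals of the first `levels` decomposition levels, each
    expanded back to the original sampling rate."""
    feats, approx = [], list(signal)
    for lvl in range(1, levels + 1):
        approx, detail = haar_level(approx)
        feats.append(upsample(detail, 2 ** lvl))
    return feats

# Four equal-length detail sequences, one per input neuron of the
# [4,6,1] recurrent network, activated sample by sample.
feats = dwt_features([0.0, 1.0, 0.0, -1.0] * 8, levels=4)
```

A constant signal carries no detail energy, so all its detail outputs are zero; damage-induced transients show up in the detail sequences of specific levels.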
The recurrent state-based ANN structure consists of n input neurons with sigmoid transfer functions (one for each DWT decomposition level used), a hidden layer of Long short-term memory (LSTM) cells, and one output neuron with an output range of [0,1] (also with a sigmoid transfer function). A value near 1 represents the detection of a damage in the surrounding region around a sensor.

Computers 2021 www.mdpi.com/journal/computers
There are many different LSTM cell architectures in use. We use the LSTM cell implementation from the Neataptic ML framework [18], shown on the right side of Fig. 8. The central part is a state cell (C_t) surrounded by different gates (input i_t and output o_t gates) controlling the forward and feedback paths of the cell, and the memory history (by the forget gate f_t). Depending on the particular configuration, the LSTM cells of one layer can be interconnected (memory-to-memory connections).
The DWT for the j-th level can generally be defined by the detail and approximation functions D and A related to the high- and low-pass filters, respectively:

D_j(k) = \sum_{i=0}^{N-1} x(i)\,\psi_{j,k}(i), \qquad A_j(k) = \sum_{i=0}^{N-1} x(i)\,\phi_{j,k}(i),

with N data points of the original time series x(i), i=0,1,..,N-1, j=0,1,..,J-1, k=0,1,..,2^J-1, J=log2(N). The functions ψ and φ are related to the mother wavelet function and its mirror function, respectively. Details regarding the DWT can be found in [17].

Target Variable Computation for Labelling
The individual sensor nodes should detect damages/defects within a local area. A simplified assumption is that damage detection is possible in a circular area around a sensor, i.e., isotropic sensitivity (which is not strictly true; the sensitive area is rather elliptically shaped along the actuator-damage-sensor axis!). The Euclidean distance between damage and sensor is used as an indicator for the damage/non-damage classification, i.e., a specification of the expected prediction value of the ML model, with p as the sensor position. The target variable estimation is only required for the first, supervised learning approach; the second, unsupervised approach does not rely on labelling for training.

Unweighted Centre of Mass.
This algorithm applies a threshold filter to all local prediction results with a binary decision mapping (damage activation). All activated discrete node positions are added to a point cloud. Finally, an unweighted centre of mass (COM) computation is applied to this point cloud, interpolating the damage position, where SN is the full set of sensor nodes of the network, s_o is the output of the prediction function of the node at position (s_x, s_y) in the range [0,1], and t is the threshold for binary classification.
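The unweighted COM fusion described above can be sketched in a few lines. This is an illustrative Python sketch, not the paper's JavaScript implementation; node positions, outputs, and the threshold value are made up for the example:

```python
# Minimal sketch of the unweighted centre-of-mass (COM) fusion:
# threshold all local predictor outputs, collect the activated node
# positions, and average them to interpolate the damage position.

def unweighted_com(outputs, t=0.5):
    """outputs maps a node position (sx, sy) to its predictor output
    in [0,1]; returns the interpolated damage position or None."""
    cloud = [(sx, sy) for (sx, sy), so in outputs.items() if so > t]
    if not cloud:
        return None  # no node activated: no damage detected
    n = len(cloud)
    return (sum(p[0] for p in cloud) / n, sum(p[1] for p in cloud) / n)

# Three activated nodes around a damage near (120, 60), one silent node:
outputs = {(60, 60): 0.9, (120, 60): 0.8, (180, 60): 0.7, (300, 300): 0.1}
pos = unweighted_com(outputs, t=0.5)   # -> (120.0, 60.0)
```

The binary mapping discards the prediction strength, which is why the weighted variants below generally localise better.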

Weighted Centre of Mass.
This algorithm applies a threshold filter to all local prediction results with a binary decision mapping (damage activation). All activated discrete node positions are added to a point cloud together with a weight derived from the predictor function output. Finally, a weighted centre of mass computation is applied to this point cloud, interpolating the damage position.

Fully weighted Centre of Mass.
This algorithm creates a point cloud with all discrete node positions together with the weights derived from the predictor function outputs. Finally, a weighted centre of mass computation is applied to this point cloud, interpolating the damage position.

Density-based Clustering and Centre of Mass.
Prior to the weighted centre of mass computation, density-based clustering using the DBSCAN algorithm [20] is applied to the point cloud consisting of node positions with a predictor function output above a given threshold. The largest cluster is selected for the COM computation. This approach is proposed to discriminate clusters of true-positive predictions from clusters of false-positive predictions, as evaluated and discussed in the following results section. The DBSCAN algorithm uses a global density parameter; an advanced approach uses a local density parameter for clustering [21].
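The clustering-before-COM idea can be sketched as follows. A simple grid connected-component grouping stands in here for DBSCAN (which the paper actually uses); the activations, weights, and node spacing are illustrative:

```python
# Sketch: select the largest cluster of activated nodes, then apply
# a weighted centre-of-mass (COM) computation to it. The clustering
# is a simplified stand-in for DBSCAN on a regular sensor grid.

def largest_cluster(points, spacing):
    """Group activated node positions whose grid distance is at most
    `spacing` (8-neighbourhood) and return the largest group."""
    points = set(points)
    clusters = []
    while points:
        stack, cluster = [points.pop()], []
        while stack:
            p = stack.pop()
            cluster.append(p)
            near = {q for q in points
                    if abs(q[0] - p[0]) <= spacing and abs(q[1] - p[1]) <= spacing}
            points -= near
            stack.extend(near)
        clusters.append(cluster)
    return max(clusters, key=len)

def weighted_com(cluster, weights):
    """Weighted centre of mass of the selected cluster."""
    w = sum(weights[p] for p in cluster)
    return (sum(p[0] * weights[p] for p in cluster) / w,
            sum(p[1] * weights[p] for p in cluster) / w)

# True-positive cluster of three nodes plus one isolated false positive:
weights = {(60, 60): 0.6, (120, 60): 0.9, (120, 120): 0.6, (420, 420): 0.8}
cluster = largest_cluster(weights.keys(), spacing=60)
pos = weighted_com(cluster, weights)
```

Selecting the largest cluster suppresses the isolated false-positive activation at (420, 420), which would otherwise pull the COM estimate away from the true damage region.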

Distributed Center of Mass (Cellular Automata).
The previous algorithms collect all node prediction results and perform the damage localisation on a dedicated centralised node. To avoid any centralised instances, for scalability and robustness reasons, a distributed COM algorithm is processed by the network nodes, i.e., an algorithm based on a cellular automata model with neighbourhood communication only. The algorithm is based on the fully weighted COM approach.
The basic concept of the distributed weighted COM (DCOM) is the propagation of partial sums along the rows and columns of the network, assuming a more or less regular logical grid communication architecture. The logical position of a node with respect to the sensor network has to satisfy an ordering constraint, i.e., an East neighbour is physically located on the right, a West neighbour on the left side, and so on.
The first upper left node of the CA network initiates the propagation of the partial sum calculation from left to right (horizontal axis) and downwards (only initiators of further row propagations). Each node at the end of a row propagates the row accumulation downwards. The last lower right node finally computes the approximated centre position of the damage. Each cell has a state, defined in Def. 1. Only the first node must be marked (always position (1,1)). All other nodes derive their position from the neighbouring nodes, i.e., a node does not have to know its absolute position in the network, only the relative neighbouring connectivity.
Assuming a regular mesh sensor network with N × M nodes the DCOM approach requires NM steps to compute the weighted damage position.
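The partial-sum propagation can be illustrated with a toy simulation. This is a hedged Python sketch, not the paper's CA implementation: the grid, weights, and node spacing are invented, and the row/column message passing is condensed into plain loops:

```python
# Toy simulation of the distributed weighted COM (DCOM): partial sums
# of (weight, weight*x, weight*y) travel East along each row; each
# end-of-row node hands its row accumulation South; the lower-right
# collector node finishes the division.

def dcom(grid, spacing=60):
    """grid[r][c] is the predictor output of node (r, c); returns the
    damage position estimated by the collector node."""
    row_states = []
    for r, row in enumerate(grid):
        sw = swx = swy = 0.0
        for c, w in enumerate(row):            # horizontal propagation (East)
            sw += w
            swx += w * (c * spacing)
            swy += w * (r * spacing)
        row_states.append((sw, swx, swy))      # end-of-row node sends South
    tw = sum(s[0] for s in row_states)
    return (sum(s[1] for s in row_states) / tw,
            sum(s[2] for s in row_states) / tw)

def central_com(grid, spacing=60):
    """Centralised fully weighted COM as a reference."""
    sw = swx = swy = 0.0
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            sw += w
            swx += w * (c * spacing)
            swy += w * (r * spacing)
    return (swx / sw, swy / sw)

grid = [[0.0, 0.1, 0.0],
        [0.1, 0.9, 0.2],
        [0.0, 0.1, 0.0]]
p, q = dcom(grid), central_com(grid)
assert abs(p[0] - q[0]) < 1e-9 and abs(p[1] - q[1]) < 1e-9
```

The distributed variant reaches the same estimate as the centralised computation without any node ever collecting the raw outputs of the whole network, which is the point of the CA formulation.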

Def. 1. Data structure of CA cell
The cell activity is shown in Alg. 1 (CA cell COM accumulation algorithm), and the principal right-down shift propagation of the weighted COM is shown in Fig. 10: the blue node is the initiator and the red node is the collector. Note that any edge node can become initiator or collector by rotating the node matrix by 90 degrees. All four configurations can be processed in an overlapped fashion, increasing redundancy in a technical network with node or communication failures.

Training
The training of the RNN is rather simple. All model instances associated with virtual sensor nodes are trained independently (although sequentially on a central computer). The feature-transformed input signals activate each RNN sequentially. The first four DWT decomposition levels are used. After the RNN is activated, the prediction error is computed: it is simply the difference between the target variable (binary damage label) and the last output value of the RNN (linearised in the interval [0,1]; 1: damage, 0: no damage). The desired target variable value (0/1) is passed to a gradient descent back-propagation algorithm adapting the weights of the network and the parameterisation of the LSTM cells (primarily internal edge weights and gating parameters).
The basic training algorithm for one node is shown in Alg. 2. The training is applied to all nodes with a randomly sequential selection of training instances. After the spatially averaged mean error falls below a threshold value, selected nodes with false-positive and/or false-negative predictions are re-trained, shown in Alg. 3. The false-positive rate for the non-damage case must be zero; the local false-positive and false-negative rates in damage experiments should be minimised. The core of the training loop (fragment of Alg. 2) is:

  sample := random.select(trainingData)
  ∀ node ∈ nodes do
    {error, error0, error1, state} := train(node, sample)
    errorT0 := 0.9 · errorT0 + 0.1 · error0
    errorT1 := 0.9 · errorT1 + 0.1 · error1

In Fig. 11, some results of the distributed sensor network activation and damage prediction are shown for the training set consisting of 9 damage positions (MMA) and one base-line experiment. Tab. 1 and the bar plot in Fig. 12 show the prediction accuracy of the trained LSTM model using DWT features of the time-resolved sensor signal. The position errors of the weighted centre point calculation of a predicted damage (pseudo defect) are given in mm and must be evaluated with respect to the overall DUT plate dimension of 500 × 500 mm and the sensor node spacing of 60 mm. The prediction accuracy is averaged over all data sets. The first data set was used for the ANN training and for the test evaluation. The mean position accuracy is about 60 mm averaged over all experiments and data sets, i.e., in the order of the sensor node spacing distance (60 mm); for the training data experiments only it is about 20 mm, i.e., 1/3 of the sensor node spacing. In Fig. 12 five different global fusion algorithms are compared (see Sec. 4.4 for details). In most damage cases the fully weighted COM approach shows the best average accuracy. Some damage cases still show good average position accuracy but with larger variance, and in a few cases with a large maximal error (e.g., D375-250), i.e., an extended error boundary, another important statistical feature of an SHM system. This shows the dependence of the damage position estimation on the spatial sensor-actuator-damage triangle and its position relative to the edges and sides of the plate. At the edges there are significant wave distortion effects, like edge reflections, with a significant impact on the damage prediction. Fortunately, due to the spatial specialisation of the trained predictor functions, the sensor nodes near the edges and sides of the plate are able to discriminate these wave distortions sufficiently.
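The randomly sequential training with exponentially smoothed error tracking can be sketched as follows. This is an illustrative Python sketch, with a hypothetical `decaying_step` standing in for one RNN back-propagation pass; only the 0.9/0.1 moving-average scheme is taken from Alg. 2:

```python
# Sketch of the multi-instance training loop: draw samples randomly,
# train every node instance, and track exponentially smoothed errors
# for the no-damage (err0) and damage (err1) target classes.
import random

def train_all(nodes, training_data, train_step, threshold=0.05, max_epochs=1000):
    """Stop once both smoothed class errors fall below the threshold."""
    err0 = err1 = 1.0
    for _ in range(max_epochs):
        sample = random.choice(training_data)   # randomly sequential selection
        for node in nodes:
            e0, e1 = train_step(node, sample)
            err0 = 0.9 * err0 + 0.1 * e0        # smoothed error (Alg. 2 style)
            err1 = 0.9 * err1 + 0.1 * e1
        if err0 < threshold and err1 < threshold:
            break
    return err0, err1

# Hypothetical stand-in for one gradient-descent pass: the node's
# residual error simply decays with each training step.
def decaying_step(node, sample):
    node["err"] *= 0.9
    return node["err"], node["err"]

nodes = [{"err": 1.0} for _ in range(4)]
e0, e1 = train_all(nodes, [{"signal": None, "label": 0}], decaying_step)
```

A second, selective pass (as in Alg. 3) would then re-apply `train_all` only to the nodes that still produce false-positive or false-negative predictions.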
In Fig. 13 some typical network activation patterns with local false-positive activation clusters are shown. The density-based clustering approach can lower the average damage localisation error, but it increases the maximal error boundary. This increase of the maximal error is a result of (1) wrong cluster discrimination (selecting the cluster with the highest number of points), and (2) false-positive predictions compensating the position estimation of a geometrically distorted (non-separable) true-positive cluster. The binary unweighted COM approach using threshold discrimination produces lower localisation errors only in some cases. The fully weighted COM approach shows mostly the best results. The distributed approach with a CA model shows comparable results and is fully suitable to approximate the damage position.

Single-instance Learning of an Auto-encoder with Multi-instance Prediction (STMP class)
In this section, the second high-resolution approach using unsupervised, generalised single-instance learning of a signal auto-encoder is introduced. The output of the trained auto-encoder is used to predict the damage position (pseudo defect) by using weighted point density (WPD) analysis.

Concept
In contrast to the MTMP approach, which directly predicts damage features, the second STMP approach consists of two stages:
1. Anomaly feature marking by an RNN detecting differences of the sensor data from a non-damage base-line experiment;
2. Damage feature extraction using the output from stage 1.
A predictor function is trained using data only from a damage-free base-line experiment. Any non-conformity to the base-line data (features) is detected by the predictor function with a "damage" classification. The challenge is to derive a generalised predictor function (independent of the spatial location of sensor, actuator, and damage) which discriminates damages from other signal non-conformities, i.e., noise, variance in the measuring configuration, reflection of waves at edges, and many more non-damage-related artifacts. It can be assumed that there are commonly sufficient training data sets with varying damage-free sample instances, i.e., with a variance in operational and measuring conditions.
One unsupervised method to detect differences to a base-line signal is using an auto-encoder and decoder to code and reconstruct (decode) the sensor signal. If the auto-encoder function is trained only with damage-free sensor signals, it is not able to reconstruct a signal resulting from wave interactions near a damage. Comparing the reconstructed signal with the actually measured signal yields a binary damage classifier by applying a threshold function to the mean average error between the reconstructed and the originally measured signal. The basic signal processing architecture is shown in Fig. 14.
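The reconstruction-error classification can be sketched compactly. This is an illustrative Python sketch: the auto-encoder is replaced by a hypothetical `reconstruct` stub that always decodes to the baseline shape, and the signals and threshold are invented for the example:

```python
# Sketch of the reconstruction-error damage classifier: a model
# trained only on damage-free signals cannot reproduce damage-induced
# wave packets, so a large mean average error (MAE) flags damage.
import math

def mae(a, b):
    """Mean absolute error between two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def classify(measured, reconstruct, t=0.1):
    """Binary damage classification via thresholded reconstruction MAE."""
    return mae(measured, reconstruct(measured)) > t

# Hypothetical stand-in: the "trained" model reproduces a plain sine
# burst (baseline) but not an extra, damage-induced wave packet.
baseline = [math.sin(0.2 * i) for i in range(100)]
reconstruct = lambda signal: baseline          # always decodes to baseline shape
damaged = [s + (0.8 if 40 <= i < 60 else 0.0) for i, s in enumerate(baseline)]

assert classify(baseline, reconstruct, t=0.1) is False
assert classify(damaged, reconstruct, t=0.1) is True
```

In the paper's architecture, `reconstruct` is the trained LSTM encoder-decoder and the threshold is applied per sensor position to build the binary damage image.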
The auto-encoder and decoder were implemented with a state-based recurrent ANN and LSTM cells using the TensorFlow ML framework [22]. Computation (training and inference) was primarily performed on a GPU. The network configuration is [64,32,32,64,2], i.e., layers of 64 and 32 LSTM cells implementing the encoder, layers of 32 and 64 LSTM cells implementing the decoder, and two output neurons.

Network architecture
The network is based on LSTM cells arranged in an encoder-decoder setting. Both the encoder and the decoder consist of 2 layers of LSTM cells with a decreasing and increasing number of units, respectively. This arrangement serves as a bottleneck where only the most essential information from the input features is kept. The compressed information is then decoded back to its original form. The network therefore is an LSTM-based auto-encoder. This also means that prior labelling becomes unnecessary, as it is an unsupervised learning technique. By training the network with global data of undamaged CFK plates, it learns to accurately compress and decompress undamaged input data at any local position individually. However, supplying the network with signal data that includes damage information, e.g., wave reflections, results in a much greater error, because the network by design never saw damage information during training. The reconstruction error of the network is therefore an indication of a possible defect.

Post processing
The mean averaged error derived from the decoder output is then classified into damage or no-damage features using a simple threshold function. Because of the globally trained network, this procedure can be repeated for sensors at different locations on the CFK plate, which, applied iteratively, results in a binary image of spatially resolved damage/no-damage feature classifications of any resolution. This image can then be used as the input for a weighted point density analysis using DBSCAN to estimate the damage location.
Typical examples of the post-processed images are shown in Fig. 16. The dependency of the position accuracy on the sensor-actuator-damage configuration is shown in Fig. 17. Damages near the actuator (nearly in the plate centre) cannot be detected accurately. There are feature activations near the edges and corners of the plate due to wave reflections, interferences, and mode conversions (conversion of one Lamb wave mode into another mode). These artifacts disturb the damage prediction and localisation. Moreover, in this work a specimen structure consisting of only one composite material is considered. Hybrid structures in terms of combined section regions of different materials, e.g., with intermediate stringers, will produce similar artifacts.

Results
Results of the base-line approach using the auto-encoder output and density clustering feature extraction are shown in Fig. 18. The accuracy measures are derived from all three data sets by Monte Carlo simulation adding Gaussian noise to the originally measured data sets. The training was only performed with the sensor data from the defect-free experiment. In contrast to the first, local multi-instance learning, the global single-instance learning shows some false-negative predictions, i.e., no defect (position) was detected in case of an existing defect, indicated by black bars in the plot (typically 10-20% of the samples of one experiment). The maximal prediction error occurs for a defect placed in the centre of the plate near the actuator. The mean position error is averaged over all sets, neglecting the three high-error cases (D125-250, D250-...). Single points are outliers.

Comparison of both Methods
The damage diagnostics of both approaches presented in this work consist of the binary damage classification (i.e., whether there is a damage in the specimen or not) and the spatial damage localisation. The distributed multi-instance approach with global centre-of-mass fusion provides high reliability with a true-positive and true-negative rate of 100%. The AE-based single-instance approach is affected by feature selection artifacts that result in a false-negative rate of about 20% in some damage cases (depending on the geometrical triangle sensor-actuator-damage with respect to the specimen boundaries). The averaged position accuracy is about 60 mm in the multi-instance and 20 mm in the single-instance approach, summarised in Tab. 2.
The computational performance of both architectures is compared in Tab. 2. The computational performance is relevant for the deployment in distributed and embedded sensor networks as well as for real-time capability. The real-time capability is defined by the overall measuring time (e.g., 1 s) and the deadline for a result (which can range from seconds to minutes).
The distributed multi-instance approach scales nearly linearly with the number of sensor nodes, hence only one node is considered here. The single-instance approach was processed primarily on a GPU system. Even on an embedded computer like the Raspberry Pi 3 with an ARM Cortex CPU the inference time is below 100 ms for one measurement and is suitable for real-time analysis. Per sensor, the multi-instance approach shows a comparable computational time on an embedded computer using a VM compared with the native-code GPU-based algorithms. But the single-instance approach requires a higher sensor density for damage recognition and position estimation (at least 50 × 50 sensors).

Hybrid Architecture
The unsupervised trained auto-encoder-based, spatially generalised method offers high accuracy on the one hand, but increased false-negative rates and, in some damage cases, low accuracy on the other. The supervised trained distributed multi-instance approach shows lower but reasonable accuracy with zero false-negative and false-positive rates. Both methods can be fused into a hybrid architecture with improved performance:
• The MTMP approach is used for a first approximation of the damage location (region-of-interest marking, ROI) and a proper damage/no-damage classification;
• The STMP approach uses the ROI and damage classification from the MTMP to discriminate inaccurate and wrong damage predictions (selective inference and feature extraction).

Conclusion
Two different ML architectures were introduced that predict damages of a carbon fibre laminate plate with high accuracy and reliability. Both approaches deliver a binary damage classification and an estimation of the damage location relative to the plate boundaries. The first is a low-resolution, the second a high-resolution method with respect to sensor density and accuracy.
The first approach is a distributed multi-instance architecture with supervised training, suitable for deployment in sensor networks. The sensor density is sparse (here 8 × 8 sensors). Each trained model instance is capable of predicting a damage in the neighbouring region around its sensor node. Global fusion finally approximates the spatial position of the damage, achieving an average accuracy in the order of the sensor node distance (60 mm). The distributed approach showed 100% true-positive and 0% false-positive/negative damage classifications in all test data instances. The spatial graph of sensor, actuator, and damage and its position relative to the plate boundaries has an impact on the location prediction accuracy. The multi-instance models are bound to the spatial region where they are trained, thus they pose no spatial generalisation.
The second approach is a spatially generalised single-instance architecture with unsupervised training based on base-line anomaly prediction using an auto-encoder. The single model instance can be replicated, supporting multi-instance prediction. The sensor density is high (here 250 × 250 sensors). This approach showed an improved average accuracy in the order of ten times the sensor distance (20 mm). It is not suitable for processing on embedded nodes of a sensor network due to high computational time and resource requirements (e.g., one GPU) and is considered a laboratory diagnostics system and a reference analysis method. The main advantage of this approach is the unsupervised training method compared to the supervised first approach, avoiding labelling difficulties, and a higher degree of generalisation (with respect to spatial, temporal, and environmental parameters).
Common to both architectures is a state-based recurrent ANN using Long short-term memory cells processing feature-transformed time-series data. Discrete wavelet decomposition is used as the primary feature transformation (the distributed multi-instance approach uses the first to fourth level, the auto-encoder approach the third and fourth decomposition level). The high-resolution approach delivered about 5% false-negative and 0% false-positive predictions. The false-negative rate can be reduced to zero by fusing and coupling both architectures: the binary damage classification is taken from the first system, the high-resolution position estimation from the second, or from the first if the second system cannot find a damage.
There are still many open questions and evaluations to be done:
• Measurement and processing of more reference data with a broader range of different damage locations, mounting technologies, and environmental variations;
• Considering experiments with more than one damage (training and inference);
• Enhancing data augmentation beyond Monte Carlo simulation;
• Applying the methods to carbon fibre laminate plates with real impact damages;
• More rigorous investigation of the influence of sensor density, sensor failure, and sensor variations on prediction results;
• Implementing the distributed MTMP approach on a real sensor network with embedded low-resource computers.

Fig. 1 .
Fig. 1. Spatial vs. temporal dimensions of the sensor data and centralised vs. distributed sensor processing and ML. (Top) Centralised data sampling and one global model M (Bottom) Decentralised data sampling with local models μ

Fig. 3 .
Fig. 3. Experimental set-up for measuring Lamb wave propagation fields with an air coupled ultrasonic sensor and a scanning aperture (left) and position of actuator and pseudo damages (right), dimensions in mm

Fig. 4 .
Fig. 4. Intensity images of two-dimensional Lamb wave interaction with a pseudo damage (sealant tape) at a frequency of 80 kHz, wave propagation field at 233 μs (left) and at 330 μs (right) with highlighted region of phase shift and attenuation

Fig. 5 .
Fig. 5. Intensity images of two-dimensional Lamb wave interaction with pseudo damage (MMA) at a frequency of 80 kHz, wave propagation field at 230 μs (left) and at 326 μs (right) with highlighted regions showing mode conversion

Fig. 6 .
Fig. 6.(a) The time-resolved sensor signal (amplitude) can be divided into three segments with only the middle segment containing relevant damage features (b) Spatial dependencies of the spatial sensor-damage-actuator configuration with effect on wave propagation and damage detection

Fig. 7 .
Fig. 7. Measuring and data processing architecture computing a virtual sensor network with 8 × 8 nodes and a spatial sensor distance of 60mm from the original measured physical sensor data

Fig. 8 .
Fig. 8. Feature selection with DWT decomposition from the time-resolved and discretised sensor signal s(n) as the input for the damage predictor function implemented by a recurrent ANN using LSTM cells (LSTM cell drawing from [18]). HP: high-pass filter, LP: low-pass filter, ↓: down sampling, ↑: up sampling, o_t: output gate, f_t: forget gate, i_t: input gate, C_t: memory cell

Fig. 9 .
Fig. 9. Labelling of training data (assigning target output variable outcomes) by simple neighbourhood detection of damages in the range 2R around a sensor node centre position.Shown is a part of the sensor network (red colour: damage within radius R, blue colour: no damage)


Fig. 11 .
Fig. 11. Examples of local predictor function activations (red colour, binary categorisation by a threshold function) for each sensor node for one base-line experiment and 9 experiments with different damage positions. Each square shows in the upper triangle the damage label marking and in the lower triangle the actual damage prediction from sensor data.

Fig. 12 .
Fig. 12. Accumulated prediction error statistics for all defect positions and all data series (21 pseudo defect positions and two mounting technologies VST/MMA) with MC simulation adding 10% multiplicative Gaussian sensor noise [label is defect position D<px>-<py> in mm]. Five different damage localisation algorithms applied to the local prediction results are compared. The grey lines indicate the standard deviation interval 2σ and the minimal and maximal errors in a set

Fig. 13 .
Fig. 13. Examples of sensor network activation patterns with local false-positive activations (X: damage position, grey dotted rectangle: cluster of false-positive activations). Shown is the raw numerical output in the range [0,1] of the two-dimensional prediction matrix of the sensor network with 8 × 8 nodes.

Fig. 15 .
Fig. 15. LSTM network architecture consisting of layers of LSTM cells: (Bottom) encoder (Top) decoder; time-unrolled and replicated presentation to illustrate the data flow on successive samples

Fig. 16 .
Fig. 16. Typical examples of auto-encoder-based damage feature extraction. Shown are intensity images of the AE-processed DWT features (i.e., each pixel of the image has a binary value and yellow colour indicates a detected anomaly); x- and y-axis in pixel coordinates; the defect index is numbered from left to right and top to bottom (9 defect positions in total).

Fig. 17 .
Fig. 17. Spatial accuracy of the auto-encoder-based damage feature extraction with real damage position markings (ground truth, blue cross) and the damage position approximated by DBSCAN (red circle); x- and y-axis in pixel coordinates.

Fig. 19 .
Fig. 19. Statistical analysis of the dependency of the position error on an artificial noise level added to the sensor signal. (a) Averaged over all data sets (b) Averaged over a specific data set. Single points are outliers.