This paper presents a monitoring framework to detect drifts and faults in the behavior of the central processing unit (CPU)-graphics processing unit (GPU) chips that power embedded systems. To construct the framework, we propose an incremental model and a fault detection and isolation (FDI) algorithm. The reference model is composed of a set of interconnected, exchangeable subsystems, which allows it to be adapted to changes in the structure or operating modes of the system by replacing or extending its components. It estimates a set of variables characterizing the operating state of the chip from only two global inputs. Then, through analytical redundancy, the FDI module compares the estimated variables with the measured outputs of the system and generates alarms when faults or drifts are present. Furthermore, the interconnected nature of the model allows any detected abnormality to be localized and isolated directly. Implementing the proposed framework requires no additional instrumentation, because all the variables it uses are already measured by the system. Finally, we validate our approach on multiple experimental setups, which also shows that it can be applied to most existing embedded systems.
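As a rough illustration of the analytical-redundancy idea summarized above, the following Python sketch chains exchangeable subsystem blocks, computes a residual (measured minus estimated) for each one, and raises an alarm when a residual exceeds a threshold. It is not the authors' model: the variable names (`u1`, `u2`, `power`, `temperature`), the linear block equations, and the thresholds are all hypothetical. Each block is fed with the measured values of its inputs, so an alarm points at a single subsystem, which is one simple way interconnection can enable direct isolation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subsystem:
    """One exchangeable block of a reference model (hypothetical structure)."""
    name: str                    # variable this block estimates
    inputs: List[str]            # global inputs or upstream variables it uses
    model: Callable[..., float]  # maps input values to the estimated variable
    threshold: float             # alarm threshold on |residual| (assumed value)


def detect_and_isolate(blocks: List[Subsystem],
                       measured: Dict[str, float]) -> Dict[str, float]:
    """Return {variable: residual} for every block whose residual exceeds its threshold.

    Each block is driven by the *measured* values of its inputs, so its residual
    reacts only to a mismatch inside that block. An alarm therefore points
    directly at one subsystem.
    """
    alarms: Dict[str, float] = {}
    for block in blocks:
        estimate = block.model(*(measured[v] for v in block.inputs))
        residual = measured[block.name] - estimate
        if abs(residual) > block.threshold:
            alarms[block.name] = residual
    return alarms


# Hypothetical two-input chain: global inputs u1, u2 -> power -> temperature.
blocks = [
    Subsystem("power", ["u1", "u2"], lambda u1, u2: 2.0 * u1 + 0.5 * u2, 1.0),
    Subsystem("temperature", ["power"], lambda p: 30.0 + 0.4 * p, 2.0),
]

# Drift in the thermal behavior only: power matches its estimate (22.0),
# while temperature sits about 6.2 above its estimate (about 38.8).
print(detect_and_isolate(blocks, {"u1": 10, "u2": 4, "power": 22.0, "temperature": 45.0}))

# Fault in the power block only: temperature is consistent with the measured power.
print(detect_and_isolate(blocks, {"u1": 10, "u2": 4, "power": 30.0, "temperature": 42.0}))
```

Because each block is an independent entry in the list, replacing one block's `model` (for example, after a change of operating mode) leaves the others untouched, which loosely mirrors the exchangeable-subsystem design described in the abstract.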
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.