Predictive Maintenance of Bus Fleet by Intelligent Smart Electronic Board Implementing Artificial Intelligence

Abstract: This paper focuses on the design and development of a smart and compact electronic control unit (ECU) for monitoring a bus fleet. The ECU system extracts all vehicle data through the on-board diagnostics II (OBD-II) and SAE J1939 standards. The integrated Internet of Things (IoT) system is interconnected in the cloud with an artificial intelligence engine implementing a multilayer perceptron artificial neural network (MLP-ANN), and predicts the maintenance of each vehicle by classifying driver behavior. The key performance indicator (KPI) of driver behavior is estimated with the k-means data mining algorithm. The MLP-ANN model has been tested on a dataset found in the literature, which allowed the correct choice of the calculation parameters. A low mean squared error (MSE), of the order of $10^{-3}$, is obtained, supporting the correct use of the MLP-ANN. Based on the analysis of the results, methodologies for key performance indicators (KPIs) are defined that correlate driver behavior with engine stress and set the criteria of the bus maintenance plan. All results are collected in a cloud platform showing fleet efficiency dashboards. The work has been developed within the framework of an industry research project in collaboration with a company managing a bus fleet.


Introduction
The controller area network (CAN) bus and the on-board diagnostics II (OBD-II) communication interface are standards typically used to extract information from a vehicle [1][2][3], to check the vehicle conditions, and to detect anomalies by accessing the electronic control unit (ECU). The CAN system is characterized by a relatively low cost per node when compared with other automotive bus systems [4]. The CAN standard can be integrated with mobile information systems [5] and with data mining algorithms such as artificial neural networks (ANN) [6]. In [7], the authors provide important indications about the procedures for the diagnosis and prognosis of the vehicle starting from the analysis of the ECU data, thus suggesting the use of vehicle data for the maintenance plan. In [8], the integrability of the sensors in a complete diagnostic network has been demonstrated, thus suggesting the design of a smart and compact unit adaptable to different types of vehicles by using specific connectors. In this scenario, the OBD-II connector has been adopted in [9,10] to acquire data on the fuel consumption trend, which can indirectly provide indications about driver behavior and, to some extent, predict vehicle wear (fuel consumption is a function of the acceleration and of the vehicle speed). The analysis, mainly derived from the speed and consumption indications of a bus, can also be carried out on the data acquired by global positioning systems (GPS) [11][12][13][14][15][16][17][18][19], which track the vehicle route in real time and estimate the vehicle speed. GPS technology can be integrated with the global system for mobile communications (GSM) [11,12], providing additional support for data transmission, for anti-theft systems [13], and for signaling emergency situations or engine failures [14]. GPS tracking is also strategic for logistics [16].
Machine learning (ML) algorithms are good candidates to support data analysis, especially for the prediction of engine failures [20]. In addition, data mining can support decision-making in fleet management, estimating economical maintenance by using the k-means algorithm [21]. Other unsupervised algorithms have been adopted for predictive maintenance in the automotive sector [22]. Moreover, camera-based bus surveillance systems can provide further information about violators and traffic [23], thus supporting drivers. Recently, Internet of Things (IoT) and microcontroller technologies have enabled automatic systems predicting fleet health and maintenance [24]. A full architecture implementing all fleet management facilities, including the use of external sensors and microcontrollers, has been analyzed in [25]. Different frameworks have been proposed for decision-making processes in bus fleet maintenance [26]. Concerning driver behavior, clustering techniques are suitable to classify characteristics such as safe and ecological driving [27]. Driver behavior features can also be extracted by deep learning (DL) approaches analyzing road complexity [28]. Following the state of the art, the main specifications of an industry project are formulated, concerning the development of a software and hardware platform oriented to predictive maintenance and driver behavior estimation. Specifically, the research project concerns the design and development of an engineered system for the acquisition of bus fleet data and for the management of fleet maintenance by means of predictive analysis. The project also provides the monitoring of each individual bus by means of the GPS system and the integration of a surveillance system.
Specifically, the main specifications of the project include:
• the use of data acquisition interfaces (electronic interfaces) implementing the on-board diagnostics II (OBD-II) communication standard (the data are extracted from the control units by means of scheduled procedures);
• the use of a central database (MySQL) for the collection of data, which are processed by data mining and artificial intelligence algorithms (e.g., clustering, artificial neural networks) supporting the formulation of the fleet maintenance plan. The plan is based on the wear prediction derived from the analysis of vehicle data such as revolutions per minute (RPM), accelerations (throttle position), stops, refueling, fuel consumed, inconsistencies between loaded values and actually consumed volumes, etc.;
• the creation of dashboards indicating the wear level of each bus and the predictive scheduling of the maintenance plan based on the outputs of the artificial intelligence algorithms;
• a camera module: two night vision cameras are mounted on each vehicle (front and rear view) for video streaming and recording. The cameras, linked to the GPS signal, can also be used to verify the driving style of the driver and to analyze the status of the itineraries that could influence vehicle wear over a long time period (intense traffic conditions, roads with potholes, etc.);
• a GPS monitoring module tracking all the movements and activities of each vehicle and monitoring any inefficiencies (for example, excessive consumption due to an inappropriate driving style, or risky driving styles due to speed limits not being respected). GPS data can be processed by the data mining engine to define the reliability and efficiency indices of the driver, to map the activities of each individual vehicle, and to support the prediction of maintenance procedures.
Figure 1 illustrates the electronic architecture of the on-board data acquisition: the ECU is connected to the OBD-II port and transfers data to a Raspberry Pi board; the board, through an internet key, transmits data and images to the cloud. Data and images are then processed by a server where an artificial intelligence engine is implemented. The artificial intelligence algorithms provide as output the driver key performance indicators (KPIs) and the predictive maintenance procedures (see Figure 2).
Figure 1. Architecture of the on-board data acquisition system.
The software design (artificial intelligence engine) integrates the following modules:
• a multilayer perceptron (MLP) artificial neural network (ANN) model providing predictions about vehicle wear;
• a k-means algorithm providing driver clusters that indicate correct and inappropriate behaviors.
The outputs of both algorithms are combined to update the predictive maintenance procedure.
The paper is structured as follows:
• Bus protocol description;

Methodological Approaches and Experiments
The OBD-II communication standard is applied to an Iveco Crossway bus. The standards used for bus communication are SAE J1962 (hardware connector) and SAE J1939 (PGN), which are useful for the design and development of the IoT system retrieving the parameters from the ECU. The OBD port has been designed to communicate with different transmission protocols, and therefore with different ECU models, which over time have been replaced with CAN. The OBD-II technology allows direct access to the data of the engine control unit (ECU) by means of the SAE J1939 standard.
The SAE J1939 standard uses the extended frame type, which provides a 29-bit identification (ID) field given by the sum of two sub-fields […]. The maximum payload size is 8 bytes. The protocol is shown in […]. In the SAE J1939 standard, messages are identified as a "parameter group" (PG), corresponding to a set of quantities belonging to the same topic or subsystem. Each PG is assigned a "parameter group number" (PGN), which identifies that set of information.
Two types of PGN are possible:
• Global PGN: it identifies a PG that is sent to all devices (broadcast). Here the Protocol Data Unit (PDU) format, the PDU specific, the data page, and the extended data page are used to identify the corresponding PG. Global PGNs occur when the PDU format value is greater than or equal to 240; in this case, the PDU specific corresponds to the group extension. The PDU format used for these data is the second.
• Specific PGN: it identifies a PG transmitted to a particular device (peer-to-peer). Here the PDU format, the data page, and the extended data page are used to identify the corresponding PG. The PDU format assumes a value less than 240 and the PDU specific is set to zero in the PGN. The PDU format used for these data is therefore the first.
More details about SAE J1939 protocol are provided in Appendix A.
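The distinction between global and specific PGNs can be illustrated with a short Python sketch that splits a 29-bit identifier into its fields. The bit layout used here (3-bit priority, extended data page, data page, 8-bit PDU format, 8-bit PDU specific, 8-bit source address) is the commonly documented J1939 layout and is stated as an assumption, since the sub-field description is not reproduced in this section. The names `J1939Id` and `decode_j1939_id` are hypothetical.

```python
from typing import NamedTuple, Optional


class J1939Id(NamedTuple):
    priority: int
    pgn: int
    source_address: int
    destination_address: Optional[int]  # None for global (broadcast) PGNs


def decode_j1939_id(can_id: int) -> J1939Id:
    """Split a 29-bit extended CAN identifier into J1939 fields.

    Assumed layout (most significant bit first): priority (3 bits),
    extended data page (1), data page (1), PDU format (8),
    PDU specific (8), source address (8).
    """
    if not 0 <= can_id < (1 << 29):
        raise ValueError("J1939 uses 29-bit identifiers")

    priority = (can_id >> 26) & 0x7
    extended_data_page = (can_id >> 25) & 0x1
    data_page = (can_id >> 24) & 0x1
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    source_address = can_id & 0xFF

    page_bits = (extended_data_page << 17) | (data_page << 16)
    if pdu_format >= 240:
        # Global PGN (second PDU format): PDU specific is the group extension.
        pgn = page_bits | (pdu_format << 8) | pdu_specific
        destination = None
    else:
        # Specific PGN (first PDU format): PDU specific carries the
        # destination address and is set to zero inside the PGN.
        pgn = page_bits | (pdu_format << 8)
        destination = pdu_specific
    return J1939Id(priority, pgn, source_address, destination)
```

For example, `decode_j1939_id(0x0CF00400)` has a PDU format of 240 and is therefore decoded as a global PGN, while `decode_j1939_id(0x18EA00F9)` has a PDU format of 234 and is decoded as a specific PGN with destination address 0.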
The predictive maintenance algorithm is based on a multilayer perceptron (MLP) neural network implemented with the Konstanz Information Miner (KNIME) open source tool, which uses graphical user interfaces (GUIs) made of blocks that enable data processing [29,30]. The MLP is a feed-forward artificial neural network (ANN) model that maps sets of input vehicle data to an output wear prediction. The MLP network is constituted by multiple nodes linked in different layers. Each node behaves as a computing cell, named neuron, which processes data by means of a properly defined activation function. The MLP approach is trained by backpropagation, which computes the gradient of the loss function with respect to the weights of the network for a single input-output sample.
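A minimal Python sketch of the backpropagation step described above is given below for a network with one hidden layer. It is not the KNIME implementation: it uses plain stochastic gradient descent rather than RProp, sigmoid hidden units with a linear output, and a squared-error loss, all of which are assumptions made for illustration. The names `TinyMLP`, `train_sample`, and `mean_loss` are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer (sigmoid units) and one linear output neuron."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def predict(self, x) -> float:
        return self.forward(x)[1]

    def train_sample(self, x, target: float, lr: float = 0.1) -> float:
        """One backpropagation step on a single input-output sample."""
        hidden, output = self.forward(x)
        error = output - target  # derivative of 0.5 * error**2 w.r.t. output
        for j, h in enumerate(hidden):
            # Gradient reaching hidden unit j, computed before w2[j] changes.
            delta_hidden = error * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * error * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_hidden * xi
            self.b1[j] -= lr * delta_hidden
        self.b2 -= lr * error
        return 0.5 * error * error


def mean_loss(model: TinyMLP, samples) -> float:
    return sum(0.5 * (model.predict(x) - y) ** 2 for x, y in samples) / len(samples)
```

In this sketch the inputs are assumed to be already normalized to the unit interval; repeated calls to `train_sample` over the training partition reduce `mean_loss`.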
The approach used for predictive maintenance defines the training and testing datasets following the scheme of Figure 3: a first data partition is used for model training, and the last samples are used for model testing. The acquired data samples are stored in a MySQL table where each record contains the attributes to process.
The dataset partition used for the MLP-ANN calculation is 80% for training and 20% for testing. This proportion provides the best model for the analyzed dataset (low calculation error).
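The partitioning step can be sketched in Python as follows. The sketch assumes that the records are ordered in time and that the first 80% are taken for training and the remaining 20% for testing; the function name `chronological_split` is hypothetical and the exact partitioning mode used in the KNIME workflow is not stated in the text.

```python
def chronological_split(records, train_fraction=0.8):
    """Split ordered records into a leading training part and a trailing testing part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # round() avoids losing one record to floating point error (e.g. 0.29 * 100).
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]
```

With ten records, for instance, the first eight are used for training and the last two for testing.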
The MLP model has been tested using the dataset found in [31], where the following attributes were identified:
• GPS time (time acquired by the GPS module);
• Device time (internal clock time of the device);
• Longitude (longitude of the GPS coordinate);
• Latitude (latitude of the GPS coordinate);
• GPS speed (speed of the vehicle, measured in meters/second);
• Horizontal dilution of precision (horizontal error on the GPS position);
• Altitude (altitude acquired by the GPS module);
• Bearing (the horizontal angle between the direction and the north);
• G(x) (angular velocity, in degrees per second, on the X axis acquired with the gyroscope);
• G(y) (angular velocity, in degrees per second, on the Y axis acquired with the gyroscope);
• G(z) (angular velocity, in degrees per second, on the Z axis acquired with the gyroscope);
• G calibrated (gyro calibration error);
• Engine coolant temperature (temperature of the engine coolant liquid, in °C);
• Engine RPM (angular speed of the motor shaft, in rpm);
• Intake air temperature (temperature of the air entering the combustion chamber, in °C);
• Engine load % (percentage of the maximum power supplied by the engine);
• Mass air flow rate (flow rate of the air entering the combustion chamber of the engine, in g/s);
• Throttle position manifold % (percentage of the accelerator position pressed).
Figure 4 shows the statistical plots of the dataset adopted for the performance check of the MLP algorithm.
The MLP network has been implemented by the KNIME workflow of Figure 5, structured as follows:
- A data source "CSV Reader" block loading the bus data into a local repository (data extracted from the MySQL database);
- A data pre-processing block filtering the attributes to select for the data processing ("Column Filter" block);
- A data pre-processing block normalizing all numerical data of the filtered dataset (predictive model markup language (PMML) normalizer, "Normalizer (PMML)" block);
- A data pre-processing block partitioning the data for the training and testing processing;
- An "RProp MLP Learner" block constituting the training dataflow: the efficient RProp algorithm [32,33] is adopted for the training of the MLP;
- A "MultiLayerPerceptronPredictor" block modeling the MLP neural network and merging the training workflow with the testing one;
- A numeric scorer providing the mean squared error (MSE), defined as
$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y}_i)^2$
where $y_i$ is the measured value, $\bar{y}_i$ is the predicted one, and $n$ is the number of samples;
- An "Excel Writer (XLS)" block writing the scoring results in an Excel file.
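The normalization and scoring steps of the workflow above can be sketched in Python as follows. Min-max scaling to the unit interval is an assumption here, since the normalization mode selected in the "Normalizer (PMML)" block is not stated; the function names are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a numeric column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean_squared_error(measured, predicted):
    """MSE between measured and predicted values of equal, non-zero length."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(measured, predicted)) / len(measured)
```

Because the inputs are normalized before training, the MSE values are expressed on the normalized scale rather than in the physical units of the attribute.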
For the clustering results indicating driver behavior, the k-means algorithm [34,35] has been applied using the RapidMiner tool (see the related workflow in Figure 6). For the analysis of the correlations between the variables, the correlation matrix algorithm of the RapidMiner tool has been adopted [36] (see the related workflow in Figure 7).
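The clustering step can be illustrated with a compact Python sketch of the k-means iteration (assignment to the nearest centroid, followed by centroid update). This is not the RapidMiner operator: the deterministic farthest-point initialization, the Euclidean distance, and the function names are assumptions chosen to keep the example reproducible, and the points are assumed to be normalized pairs such as (GPS speed, engine RPM).

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Farthest-point initialization: start from the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k=3, max_iterations=100):
    """Return (labels, centroids) for a list of equal-length numeric tuples."""
    if len(points) < k:
        raise ValueError("need at least k points")
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids
```

The cluster indices returned by such a procedure are arbitrary, so the association between an index and a driving style has to be made afterwards by inspecting the centroids.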

Testing and Results
The system represented in Figure 1 and Figure 2 has been implemented by performing the following preliminary tests:
• Verification of the correct functioning of the accelerometer and of the GPS module (this check also implies the correct connection with the Raspberry input pins);
• Checking of the auto-starting operation with .desktop files;
• Firmware testing;
• Verification of the validity of the solution using the 4G WiFi router;
• Verification of server data receipt.
Figure 8a shows the testing circuit assembling the components of the architecture of Figure 1, and Figure 8b shows a photo of the server linking. The Raspberry is powered by the cigarette lighter socket through a car adapter with two 5 V USB output sockets. The preliminary tests of the accelerometer firmware are performed with the engine off, while the preliminary tests of the Bluetooth OBD reader are executed with the engine running. When the Raspberry board is turned on, it connects to the testing laptop through the WiFi router. The Raspberry board is remotely controlled through a remote desktop control application. The OBD-related test of the reading script proved the detection of the vehicle data. The data flow is enabled through the Bluetooth OBD reader, by timing and synchronizing the Raspberry acquisition every ten seconds. The command r = requests.post(URL_ODB_BUS, json=payload) returns the server connection response, which is used to check the correct data flow of the JSON package. Figure 9 illustrates a preliminary test of the acceleration data acquisition.
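The periodic acquisition and transmission described above can be sketched in Python as follows. The payload field names (`bus_id`, `timestamp`, `readings`), the helper names, and the success criterion (HTTP status 200) are hypothetical, since the actual JSON structure and server response are not reported. The `post` argument is expected to behave like `requests.post`, and it is passed in explicitly so that the loop can be exercised without a network connection.

```python
import time


def build_payload(bus_id, readings, timestamp=None):
    """Assemble one JSON-serializable sample (field names are assumptions)."""
    return {
        "bus_id": bus_id,
        "timestamp": time.time() if timestamp is None else timestamp,
        "readings": dict(readings),
    }


def send_sample(post, url, payload):
    """Send one sample; `post` is a callable with the signature of requests.post."""
    response = post(url, json=payload)
    return response.status_code == 200


def run_acquisition(read_obd, post, url, bus_id,
                    period_s=10.0, max_samples=None, sleep=time.sleep):
    """Read, send, and wait `period_s` seconds; returns the number of accepted samples.

    `read_obd` is a hypothetical callable returning a dict of vehicle readings.
    With max_samples=None the loop runs until interrupted.
    """
    attempts = 0
    accepted = 0
    while max_samples is None or attempts < max_samples:
        payload = build_payload(bus_id, read_obd())
        if send_sample(post, url, payload):
            accepted += 1
        attempts += 1
        sleep(period_s)
    return accepted
```

On the board, a call such as `run_acquisition(read_obd, requests.post, URL_ODB_BUS, bus_id)` would reproduce the ten-second acquisition cycle, under the assumptions listed above.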

The MLP model has been checked by obtaining the parameters listed in Table 2, which reports the number of hidden layers, the number of neurons per hidden layer, and the MSE. On the testing dataset, very low MSE values are obtained, of the order of $10^{-2}$.
The MSE results show a good error trend as the parameters vary (number of hidden layers and number of neurons per hidden layer), supporting the choice of the algorithm used for the prediction. An example of application of the KNIME MLP network is illustrated in Figure 7, where the predicted engine power (engine load) is higher than the measured engine load values, thus predicting accentuated engine wear. Moreover, as expected, Figure 10 illustrates a close correlation between engine RPM values and engine power.
To provide information about driver behavior, we executed the k-means algorithm with the number of clusters fixed to K = 3 (three main driver behaviors). Figure 11 illustrates the clusters obtained by grouping the GPS speed and engine RPM parameters: the cluster indicated in orange (cluster 0) is representative of drivers tending to travel at low speed and to accelerate slowly (prudent driving behavior).
The drivers of the cluster indicated in green (cluster 1) travel at low speed but force the engine (high average engine RPM values), denoting an inefficient driving style that could accelerate engine wear. The cluster represented in blue (cluster 2) denotes the drivers that mainly contribute to vehicle wear.
Figure 11. k-means analysis: clusters grouped by GPS speed and engine RPM parameters, and linear regression trend.
Figure 12 represents the same clusters and the linear regression trends, showing that cluster 0 is characterized by a low accelerator percentage (throttle position) and a low engine power (engine load). The drivers of cluster 1 do not press excessively on the accelerator, but the engine is forced, denoting that the gears of the vehicle are not changed often. Cluster 2 indicates engine stress due to strong pressure on the accelerator.
The plot of Figure 13 confirms the correlation between engine load and engine RPM as deduced by the MLP analysis: a similar trend is observed in Figure 8.

Discussion
Starting from the MLP-ANN prediction of the engine stress, it is possible to re-plan the maintenance schedule of each vehicle. The standard predictive maintenance plan can change based on the MLP-ANN prediction of the engine stress, which is a function of the driver behavior: the planned date of the bus maintenance can be brought forward when a high engine wear is predicted. Figure 14 illustrates a theoretical plot of this concept.
The driver style and behavior are represented by three cluster typologies obtained by analyzing the most significant parameters, namely the GPS speed, the engine RPM, the engine load, and the throttle position. Each cluster can be associated with a score (low, average, or high) used as a KPI for the driver velocity, the engine stress, and the driver caution (see Table 3). The driver velocity and the engine stress are also representative of the fuel consumption. The low, average, and high scores are denoted by the red, orange, and green colors, respectively.
The same scoring of Table 3 is deduced mainly from the regression line slopes of Figures 7, 8, and 9. We observe that the predicted results and the KPI can be normalized to unity, thus achieving an estimation scale (the results can be expressed in percentage). A full scenario is provided by the correlation matrix analysis, which highlights possible correlations between the most significant variables. Figure 15 reports the correlation matrix, highlighting the high correlation between throttle position and engine load and between throttle position and engine RPM. Additionally, a moderate correlation is observed between engine RPM and GPS speed, indicating the correct gear use of the drivers.
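As an illustration of the correlation matrix analysis, the following minimal sketch computes Pearson correlation coefficients between the four variables named above. The sample values are invented for the example and do not reproduce the measurements behind Figure 15. The use of the Pearson coefficient is itself an assumption, since the text does not state which coefficient was adopted.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_matrix(columns):
    """Return a nested dict {name_a: {name_b: r}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical samples (not taken from the bus dataset).
data = {
    "throttle_position": [10, 20, 30, 40, 50],
    "engine_load": [12, 25, 31, 44, 52],
    "engine_rpm": [900, 1300, 1500, 2000, 2300],
    "gps_speed": [15, 22, 20, 35, 30],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```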
The data are sampled every second. All data are grouped for a basic daily analysis. The k-means and MLP-ANN algorithms are also able to process all the collected data for monthly and yearly estimation, thus providing criteria for predictive maintenance.
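The grouping of per-second samples into daily values could be sketched as follows. The function name, the choice of the mean as the daily statistic, and the sample readings are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def daily_means(samples):
    """Group (timestamp, value) pairs by calendar day and average each group."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(buckets.items())}

# Hypothetical engine RPM readings, one per second, spanning midnight.
start = datetime(2020, 1, 1, 23, 59, 58)
readings = [1000, 1200, 1400, 1600]
samples = [(start + timedelta(seconds=i), rpm) for i, rpm in enumerate(readings)]

per_day = daily_means(samples)
for day, value in per_day.items():
    print(day, value)
```

The same grouping key can be changed to the month or the year to obtain the monthly and yearly estimations.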
The graphical dashboards can be automated by inserting delay blocks in the main workflow to time the readings [37], and by simply replacing the CSV reader blocks with a Python-based script enabling automated reading from the MySQL database through web services [38]. All the results are collected into a database system linked to a cloud platform with dashboards.
The platform provides online monitoring of the KPI, vehicle health status, fuel consumption efficiency, and driver efficiency. Figure 16 illustrates the implemented dashboard. The thresholds, expressed in percentage, defining the low, average and high KPI are: low, 0% ÷ 40%; average, 41% ÷ 60%; high, 61% ÷ 100%.
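The normalization of a KPI to a percentage and its mapping to the low, average and high classes can be sketched as below. The function names and the min-max normalization bounds are hypothetical. The handling of fractional values between two thresholds (for example 40.5%) is an assumption, since only integer bounds are stated.

```python
def to_percentage(value, lowest, highest):
    """Min-max normalization of a raw KPI value to the range 0..100 (assumed scheme)."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    percentage = 100.0 * (value - lowest) / (highest - lowest)
    return max(0.0, min(100.0, percentage))

def kpi_class(percentage):
    """Map a percentage to the classes low (0-40), average (41-60), high (61-100)."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percentage <= 40:
        return "low"
    if percentage <= 60:   # assumption: fractional values above 40 count as average
        return "average"
    return "high"          # assumption: fractional values above 60 count as high

# Colors used for the scores in the text: low is red, average is orange, high is green.
KPI_COLORS = {"low": "red", "average": "orange", "high": "green"}

score = to_percentage(1500, 1000, 2000)   # hypothetical raw value and bounds
label = kpi_class(score)
print(score, label, KPI_COLORS[label])
```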
More details about the SAE J1939 protocol and the IVECO vehicle parameters are reported in Appendix A and Appendix B, respectively.
Recent studies have addressed maintenance procedures by considering the programming approach [39], by classifying the state of the vehicles [40], or by applying artificial intelligence algorithms to the predictive maintenance of the engine part only [41]. The proposed research is oriented toward a new concept of predictive maintenance merging procedures, artificial intelligence prediction and driver KPI efficiencies, thus providing a methodology that takes into account multiple weighted factors potentially influencing vehicle maintenance.

Conclusions
The proposed work shows how it is possible to combine IoT devices detecting the bus status with data mining algorithms simultaneously estimating the engine status and the driver behavior. The compact electronic architecture can be applied to any vehicle supporting the OBD-II and SAE J1939 standards. Vehicle data are transmitted to the cloud, where a data mining engine computes the driver KPI by defining a score through k-means clustering analysis, and predicts the engine stress through the MLP-ANN algorithm. The proposed data mining models have been tested mainly with a stable dataset, providing a low MSE and thus confirming the model accuracy. The output of the data mining algorithms allowed the establishment of criteria for predictive maintenance, anticipating the maintenance in cases of predicted engine stress due also to incorrect driver behavior. The bus fleet efficiency has been estimated by considering the engine stress prediction and the driver KPI. The efficiency parameters are stored in a database system and remotely visualized by dashboards.
The perspectives of the proposed research are mainly oriented toward the automatic management of the maintenance of a large number of vehicles, and toward the possibility of dynamically choosing the drivers according to the KPI evaluation. The adopted scientific approach combines the predictive maintenance procedure, updated by the wear prediction, with the driver efficiency, balancing the assignment of vehicles and drivers. The adopted self-learning MLP-ANN network is stable and can be improved when a large number of vehicles and drivers are assigned, according to the project perspectives. The proposed electronic components are adaptable to different types of vehicles. The limitations of the on-board IoT solution are mainly due to the few connections available for other sensors or measuring devices.
In particular, the Raspberry board has an I2C port (to which the 3D accelerometer is connected), a serial universal asynchronous receiver-transmitter (UART) port (to which the GPS module is connected) and a serial peripheral interface (SPI) port. To overcome this limit, four USB ports have been added, two of which are used for the OBD device and for the internet key. To obtain further connection points, it is necessary to add other boards such as Raspberry and Arduino, the latter also providing analog inputs.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A
As already mentioned, SAE J1939 data frames have an extended structure with a 29-bit identifier (ID) field. The latter is essential since it provides information on the type of PGN, on the priority of the message, and on the addresses of the sending device and of the intended recipient. In particular, the structure includes the following fields:
• PDU format (PF) (8 bit): indicates the format of the data frame, since its value changes the structure of the frame.
  • PF < 240: the data frame refers to a specific device identified by its 8-bit address. If the address is encoded as (0xFF), the frame is transmitted to all connected devices, i.e., broadcast.
  • PF ≥ 240: the data frame corresponds to a broadcast message.
• PDU specific (PS) (8 bit): this field is specific to the data frame to be transmitted, as its meaning varies according to the value encoded in the PF.
  • PF < 240: the PDU specific field corresponds to the address of the device to which the data frame is sent.
  • PF ≥ 240: the PDU specific field becomes the "Group Extension" that forms the PGN of the transmitted PG.
• Source address (SA) (8 bit): the address of the device that is transmitting the CAN data frame.
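The field layout described above can be sketched as a small decoder. This is an illustrative sketch only: the function name and the returned dictionary keys are hypothetical. The bit positions follow the usual J1939 layout (3-bit priority, two page bits, then PF, PS and source address). The sample ID 0x18FEEE00 assumes priority 6 and source address 0x00 for a frame carrying the PGN 0x00FEEE.

```python
def decode_j1939_id(can_id):
    """Split a 29-bit J1939 identifier into its fields (illustrative sketch)."""
    if not 0 <= can_id < (1 << 29):
        raise ValueError("J1939 identifiers are 29 bits wide")
    priority = (can_id >> 26) & 0x7    # 3 most significant bits
    page_bits = (can_id >> 24) & 0x3   # reserved/extended data page and data page
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    source_address = can_id & 0xFF
    if pdu_format < 240:
        # PS is a destination address and is not part of the PGN.
        pgn = (page_bits << 16) | (pdu_format << 8)
        destination = pdu_specific
    else:
        # PS is the group extension and completes the PGN; the frame is broadcast.
        pgn = (page_bits << 16) | (pdu_format << 8) | pdu_specific
        destination = 0xFF
    return {
        "priority": priority,
        "pdu_format": pdu_format,
        "pdu_specific": pdu_specific,
        "pgn": pgn,
        "destination": destination,
        "source_address": source_address,
    }

# Assumed example ID: priority 6, PGN 0x00FEEE, source address 0x00.
fields = decode_j1939_id(0x18FEEE00)
print({name: hex(value) for name, value in fields.items()})
```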
The following example shows how CAN data frames are communicated and implemented with the SAE J1939 standard. Suppose we want to know the thermal parameters of the engine of the vehicle in question, which can be a bus or a road tractor. The group of parameters to be referred to corresponds to the PGN (0x00FEEE) defined in document J1939-71 of the same standard.
First of all, this PG must be requested by broadcasting the request data frame that carries the PGN. The structure of the ID is as follows:
The other fields of the data frame are:
• DLC: set to a value of three.
• Data field: the first three bytes correspond to the PGN to be requested, which in this case is (0x00FEEE).
Consequently, the addressed device, namely the engine ECU of the heavy vehicle, sends a response to the request with the related group of parameters (0x00FEEE). The structure of the ID is as follows:
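A minimal sketch of this request/response exchange is given below under several assumptions that do not come from the text: the request is sent with the J1939-21 Request parameter group (PGN 0x00EA00), both frames use priority 6, the requesting tool has the hypothetical source address 0xF9, the engine ECU answers from address 0x00, and the requested PGN is placed in the data field least significant byte first. The function names are illustrative.

```python
REQUEST_PGN = 0x00EA00     # J1939-21 "Request" parameter group (PF < 240)
GLOBAL_ADDRESS = 0xFF      # broadcast destination

def build_j1939_id(priority, pgn, source_address, destination=GLOBAL_ADDRESS):
    """Assemble a 29-bit identifier from priority, PGN and addresses."""
    pdu_format = (pgn >> 8) & 0xFF
    page_bits = (pgn >> 16) & 0x3
    # For PF < 240 the PS field holds the destination, otherwise the group extension.
    pdu_specific = destination if pdu_format < 240 else pgn & 0xFF
    return ((priority << 26) | (page_bits << 24) | (pdu_format << 16)
            | (pdu_specific << 8) | source_address)

def build_request(requested_pgn, source_address, priority=6):
    """Return (identifier, DLC, data) of a broadcast request for requested_pgn."""
    can_id = build_j1939_id(priority, REQUEST_PGN, source_address)
    data = requested_pgn.to_bytes(3, "little")   # three bytes, LSB first (assumed)
    return can_id, len(data), data

ENGINE_TEMPERATURE_PGN = 0x00FEEE   # PGN used in the example of this appendix
TESTER_ADDRESS = 0xF9               # hypothetical address of the requesting device
ENGINE_ECU_ADDRESS = 0x00           # assumed address of the engine ECU

request_id, dlc, payload = build_request(ENGINE_TEMPERATURE_PGN, TESTER_ADDRESS)
response_id = build_j1939_id(6, ENGINE_TEMPERATURE_PGN, ENGINE_ECU_ADDRESS)
print(hex(request_id), dlc, payload.hex(), hex(response_id))
```

With different priorities or addresses the identifiers change accordingly, while the three-byte data field of the request depends only on the requested PGN.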

Appendix B
Tables A4 and A5 explain the variables of the IVECO vehicle.

Variable: Description
Total_Fuel_Cons: By running a count for each service delivery, it is possible to monitor the total fuel consumption.
Fuel_Rate: It takes into account the driving style of the driver and any losses due to idling.
Inst_Fuel_Eco: High consumption indicates an incorrect driving style or the occurrence of a fault.
Pos_Valve: It allows monitoring of the status of the engine fuel system.
Fuel_Level: It is closely linked to the autonomy of travel or to a possible loss of fuel.
Engine_Hours: It is useful for monitoring ordinary maintenance actions based on a time scale.
Total_RPM: It is useful for monitoring ordinary maintenance actions based on the RPM scale.
Total_Distance: It verifies the ordinary maintenance actions based on a kilometric scale.
Speed: It is closely related to the driver's driving style.
RPM: A high value means more waste of fuel and an incorrect driving style.
Percent_Torque: It takes into account the performance of the engine with a view to safety and failures.
Percent_Engine: It decreases due to engine deterioration.
Temp_Cool: Index of correct engine operation for road safety and driving efficiency.
Temp_fuel: It is useful for fire risk estimation.
Temp_Oil: It is adopted to control the correct functioning of the motion transmission parts.
Volt_Batt: It is used to check possible outages following a continuous engine shutdown.

SPN: Index of any anomalies that may require roadside assistance.