Performance Evaluation of rPPG Approaches with and without the Region-of-Interest Localization Step

Traditionally, the first step in physiological measurements based on remote photoplethysmography (rPPG) is localizing the region of interest (ROI) that contains the desired pulsatile information. Recently, approaches that do not require this step have been proposed. The purpose of this study was to evaluate the performance of selected approaches with and without the ROI localization step in rPPG signal extraction. The Viola-Jones face detector and Kanade-Lucas-Tomasi tracker (VK) in combination with (a) ROI cropping, (b) facial landmarks, (c) skin-color segmentation, and (d) skin detection based on maximization of mutual information, as well as an approach without the ROI localization step (Full Video Pulse (FVP)), were studied. Final rPPG signals were extracted using selected model-based and data-driven rPPG algorithms. The performance of the approaches was tested on three publicly available data sets offering compressed and uncompressed video recordings covering various scenarios. The success rates of pulse waveform signal extraction range from 88.37% (VK with skin-color segmentation) to 100% (FVP). In challenging scenarios (skin tone, lighting conditions, exercise), there were no statistically significant differences between the studied approaches in terms of SNR. The best overall performance in terms of RMSE was achieved by a combination of VK with ROI cropping and the model-based rPPG algorithm. Results indicate that the selection of the ROI localization approach does not significantly affect rPPG measurements if combined with a robust algorithm for rPPG signal extraction.


Introduction
Remote photoplethysmography (rPPG) is an optical measurement technique for detecting minute blood volume variations in cutaneous microcirculation using a digital camera [1]. It allows non-contact measurements of various physiological parameters: pulse rate (PR) and its variability, blood pressure, pulse transit time, etc.
Traditionally, rPPG approaches first require localization of the region of interest (ROI) [2,3]. The most basic solution is a manual definition of the ROI [1,4]; however, automated ROI localization approaches are most commonly used, such as (1) a combination of face detection, tracking, and skin mask refinement (the latter being optional) and (2) living-skin detection [5,6]. In the first approach, the Viola-Jones frontal face detector was suggested [7] for detecting faces in video recordings and has been widely used ever since in rPPG studies [8]. The detected ROI is then tracked throughout the entire recording using a dedicated algorithm, most commonly the Kanade-Lucas-Tomasi (KLT) tracker. An optional step of skin mask refinement (i.e., ROI refinement) can be carried out in several ways, e.g., by resizing the bounding box containing a detected face [7,9], by skin-color segmentation in various color spaces (RGB [10], RGB-H-CbCr [11], YCbCr [12]), by extracting selected facial landmarks [13,14], or by applying the skin detector based on maximization of mutual information [15]. The described automated approach is, however, only applicable to facial video recordings, and its performance depends on the presence of motion artifacts. Zhao et al. [8] studied the performance of (1) landmarks detection on all frames and (2) landmarks detection on the first frame only in combination with (a) the KLT tracker, (b) the Circulant Structure with Kernels (CSK) tracker [28], and (c) the Sum of Template and Pixel-Wise Learners (STAPLE) tracker [29]. Two different publicly available data sets (PURE [30], UBFC-RPPG [6]) and one private data set (Self-RPPG) were used, and the Plane-Orthogonal-To-Skin (POS) [31] algorithm was applied for final rPPG signal extraction. The best overall result (in terms of SNR, MAE, and RMSE) for PURE was achieved by landmarks detection on each frame, for UBFC-RPPG by the approach relying on KLT tracking, and for Self-RPPG by the approach relying on CSK tracking. Recently, Woyczyk et al. 
[32] indirectly studied the performance of the following ROI localization approaches: (1) detecting a face on the first frame using the Viola-Jones frontal face detector and keeping the position of the bounding box fixed throughout the entire recording, (2) detecting a face on the first frame using the Viola-Jones frontal face detector and then tracking it using the KLT tracker, and (3) the same approach as the previous one with the addition of the statistical skin classifier proposed by Jones and Rehg [33]. Their performance in terms of accuracy and MAE was tested on two data sets containing a total of 1000 ten-second-long, lossy compressed video recordings made under challenging illumination conditions. The authors reported that, when the green channel signal was used as the final pulse waveform signal, the best overall result was achieved by the first approach (accuracy, defined as the ratio of true positives (i.e., PRs within ±5 BPM of the reference PR), of 0.1709 and MAE of 23.53 BPM); when the CHROM [21] rPPG algorithm was applied, by the third approach (accuracy of 0.3411 and MAE of 16.77 BPM); and when the POS [31] algorithm was applied, by the second approach (accuracy of 0.3455 and MAE of 15.95 BPM).
The purpose of this work is to study the effect of approaches with and without the ROI localization step (i.e., with and without the image processing front-end) on the (1) success rate of extracting rPPG signals from video recordings, (2) SNR, and (3) RMSE of average PRs estimated from video recordings. Performance evaluation will be carried out on publicly available data sets offering uncompressed and compressed (both lossy and lossless) videos covering various scenarios. Final rPPG signals will be extracted using two state-of-the-art rPPG algorithms relying on distinct principles and differing in sensitivity to the presence of non-skin pixels within the ROI: the model-based POS [31] and the non-model-based (i.e., data-driven) Spatial Subspace Rotation (2SR) [34].
PURE [30] consists of 59 lossless compressed one-minute-long facial video recordings (recorded at 30 fps and 640 × 480 pixels per frame) of ten subjects (eight males and two females) in a sitting position during six different controlled scenarios (resting with no intended head motion, talking, slow head translation at 7% of the face height per second, fast head translation at 14% of the face height per second, small head rotation of approximately 20 deg, and large head rotation of approximately 35 deg), together with reference pulse waveform signals recorded with a pulse oximeter at a sampling rate of 60 Hz. We used all video recordings from the data set and divided them into four groups: resting (10 recordings), talking (9 recordings), head translation (20 recordings), and head rotation (20 recordings).
LGI-PPGI-FVD [36] consists of 100 uncompressed video recordings (recorded at 25 fps and 640 × 480 pixels per frame) of 25 subjects (20 males and 5 females) during four different sessions (resting with no intended head motion, head and facial motion, exercising on a cycle ergometer in a gym with no set restrictions, and talking in a real-world urban environment), together with reference pulse waveform signals recorded with a pulse oximeter at a sampling rate of 60 Hz. The gym recordings are five minutes long, whereas the length of the other recordings is approximately one minute. In practice, however, only 24 recordings of six subjects are publicly available. We used only the recordings covering the resting (six recordings) and gym (six recordings) scenarios. By selecting the gym recordings, we covered a challenging scenario of uncontrolled motion in a real-world environment (uncontrolled lighting, presence of multiple subjects), which is not present in the other two data sets.
PBDTrPPG consists of lossy compressed (H.264 codec) facial video recordings (recorded at 30 fps and 1080 × 1920 pixels per frame) of three male subjects during three challenging scenarios (lighting variation in combination with different skin tones, motion scenarios, and resting after exercise, which causes significant changes in PR) and reference ECG recordings (recorded at 1024 Hz). Only the videos covering five different lighting conditions and three different skin tones (a total of 15 recordings; two additional lighting scenarios with only one recording each were excluded) were included in our study. We excluded the motion and resting-after-exercise scenarios because only three and one video(s) are available, respectively. In addition, the motion scenarios are equivalent to those from the PURE data set, so including these recordings would not represent a major contribution to our study.
With the selected data sets we included uncompressed and compressed videos, recordings made in controlled and uncontrolled environments and covering several different scenarios (rest, talking, different types of controlled and uncontrolled motion) and challenges (lighting variation and various skin tones).

Studied Approaches with and without ROI Localization Step
We applied five different approaches for PR extraction from video recordings: four with the ROI localization step and one without it. The basis for all approaches with an image processing front-end is a combination of the Viola-Jones frontal face detector [26] and the KLT tracker, owing to its predominant application in rPPG studies. The Viola-Jones frontal face detector was applied to the first frame of each recording. In case more than one face was detected, the largest bounding box was selected for further processing. From this step onwards, the methodology split into separate directions:

• In the VK approach, we reduced the width of the detected ROI to 60% of its original width [7].
• In VK-LMK, we identified facial landmarks within the original (unresized) ROI containing a detected face using the Discriminative Response Map Fitting approach [13]. From the identified 66 landmarks, we used nine to define a new ROI covering the cheeks, nose, and mouth area [38].
• In VK-RGBHCbCr and VK-Conaire, we kept the original size of the detected ROI.
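Two of the refinement steps above, the 60% width crop used in VK and a pixel-wise skin rule in the spirit of the RGB-H-CbCr model, can be sketched in a few lines (a NumPy sketch; the thresholds below are the widely cited uniform-daylight RGB rule that the RGB-H-CbCr model [11] builds on, not a full reimplementation of that model):

```python
import numpy as np

def crop_roi_width(bbox, keep=0.6):
    """Shrink a face bounding box (x, y, w, h) to `keep` of its
    original width, centered, as in the VK approach."""
    x, y, w, h = bbox
    new_w = int(round(w * keep))
    new_x = x + (w - new_w) // 2
    return (new_x, y, new_w, h)

def skin_mask_rgb(roi):
    """Illustrative pixel-wise skin rule on an RGB uint8 image
    (H x W x 3); thresholds follow the widely used uniform-daylight
    RGB rule on which the RGB-H-CbCr model builds."""
    r = roi[..., 0].astype(int)
    g = roi[..., 1].astype(int)
    b = roi[..., 2].astype(int)
    spread = roi.max(-1).astype(int) - roi.min(-1).astype(int)
    return ((r > 95) & (g > 40) & (b > 20) & (spread > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))
```

The mask returned by `skin_mask_rgb` would then select the pixels that are spatially averaged per frame.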
The KLT tracker was then initialized by identifying feature points within the original bounding box containing the detected face using the Good Features to Track method [37] and was then propagated throughout the entire recording. For a current frame, the tracker attempted to find the points from the previous frame and then estimated a transformation matrix consisting of the affinity parameters between the old points from the previous frame and the new points from the current frame. Estimation of the transformation matrix was done using the MSAC algorithm [39], a variant of the RANSAC algorithm, which is, in general, used for robust estimation of the parameters of a selected mathematical model. Once the transformation matrix was defined, it was used to transform the edges of the ROIs defined in the list above from the previous frame to the current one. In the case of VK and VK-LMK, the newly transformed ROI represented the final ROI used for further processing, whereas in the case of VK-RGBHCbCr and VK-Conaire, the ROI was further refined using the RGB-H-CbCr skin-color segmentation model [11] and the skin detector based on maximization of mutual information [15], respectively. The presented steps were applied to each of the following frames.
The only studied approach without the ROI localization step was FVP [16]. Its implementation followed that of its authors [16], with the exception that we did not remove non-pulsatile components from the rPPG signals in order to ensure easier comparison with the other studied approaches. The selection of the studied approaches with an image processing front-end was based on the frequency of their use in research: Zhao et al. reported that the most cited detection and tracking algorithms in rPPG research are the Viola-Jones face detector, facial landmarks, KLT, and skin detection [8].
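The frame-to-frame ROI propagation described above can be illustrated as follows (a simplified NumPy sketch: a plain least-squares affine fit stands in for the robust MSAC estimator, and the tracked feature point pairs are assumed to be given by the KLT tracker):

```python
import numpy as np

def estimate_affine(old_pts, new_pts):
    """Least-squares affine transform mapping old_pts -> new_pts
    (each an (N, 2) array, N >= 3). MSAC adds robustness against
    outlier tracks on top of this; here all points are fitted."""
    n = old_pts.shape[0]
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = old_pts   # rows for x' = a11*x + a12*y + tx
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = old_pts   # rows for y' = a21*x + a22*y + ty
    A[1::2, 5] = 1.0
    b = new_pts.reshape(-1)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p.reshape(2, 3)   # 2x3 affine matrix [A | t]

def warp_roi_corners(corners, T):
    """Transform ROI corner coordinates ((N, 2) array) with the
    2x3 affine matrix, as done frame-to-frame for the ROI edges."""
    return corners @ T[:, :2].T + T[:, 2]
```

In practice, the transform estimated from the surviving feature points is applied to the four ROI corners to obtain the ROI on the current frame.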

Applied rPPG Algorithms for Pulse Waveform Signal Extraction
We selected two state-of-the-art rPPG algorithms, POS [31] and 2SR [34], to extract final pulse waveform signals from the information extracted from video recordings using the approaches presented in the previous subsection. In POS, raw RGB signals extracted from the ROIs on each frame are projected onto a plane that is orthogonal to the temporally normalized skin tone and, therefore, exhibits large pulsatile and low specular strength [31]. Since POS is built on physiological reasoning, it works well even when the skin mask is noisy. In 2SR, the spatial subspace of the pixels within the ROIs is defined, and the measurement of the temporal rotation of this subspace is used for extracting a pulse waveform signal. In contrast to POS, the SNR of 2SR drops significantly when the percentage of non-skin pixels reaches 10 to 30%. The selection of the presented algorithms was based on their sensitivity to skin mask noise and their performance in comparison with other state-of-the-art rPPG algorithms [31,34].
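As an illustration of the model-based algorithm, the core of POS can be sketched compactly (a NumPy sketch following the published algorithm of Wang et al. [31]; the input is assumed to be the per-frame spatially averaged RGB trace over the skin mask):

```python
import numpy as np

def pos(rgb, win_len=128):
    """Plane-Orthogonal-to-Skin (POS) pulse extraction from
    spatially averaged RGB traces (N x 3 array), following the
    algorithm of Wang et al. [31] with overlap-adding windows."""
    n = rgb.shape[0]
    h = np.zeros(n)
    proj = np.array([[0.0, 1.0, -1.0],
                     [-2.0, 1.0, 1.0]])   # plane orthogonal to skin tone
    for t in range(n - win_len + 1):
        c = rgb[t:t + win_len]
        cn = c / c.mean(axis=0)           # temporal normalization
        s = cn @ proj.T                   # projection onto the POS plane
        p = s[:, 0] + (s[:, 0].std() / s[:, 1].std()) * s[:, 1]  # alpha tuning
        h[t:t + win_len] += p - p.mean()  # overlap-adding
    return h
```

Feeding the sketch a synthetic RGB trace with a known pulsatile frequency recovers a pulse waveform whose spectral peak sits at that frequency.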

Performance Evaluation of the Studied Approaches with and without ROI Localization Step
Performance of the studied approaches was assessed by three quality metrics: (1) success rate, (2) the modified SNR metric proposed by de Haan and Jeanne [21], and (3) the RMSE of the difference between reference- and rPPG-derived PRs.
Success rate was defined as a percentage of successfully processed video recordings among the total number of recordings for each studied approach. Whenever an approach failed (regardless of the processing stage at which a failure occurred), further processing was stopped and the processing of a given video recording was marked as unsuccessful.
SNR was calculated as the ratio between the energy within 5 bins around the first harmonic (i.e., the fundamental frequency) plus 11 bins around the second harmonic and the remaining energy within the frequency band of expected PRs in humans (i.e., [30, 240] BPM); de Haan and Jeanne [21] took 10 bins around the second harmonic, but it is unclear how the authors positioned this window relative to the second harmonic:

$$\mathrm{SNR} = 10\log_{10}\!\left(\frac{\sum_{f=30}^{240} w_t(f)\,S(f)}{\sum_{f=30}^{240}\left(1-w_t(f)\right)S(f)}\right),\tag{1}$$

where f denotes the PR frequency (expressed in BPM), w_t the applied binary window, and S(f) the power spectrum of the rPPG pulse waveform signal. The fundamental frequency was defined as the peak frequency within the defined frequency band in the power spectrum of the reference pulse waveform signals (obtained by the Fast Fourier Transform). In case a window around the second harmonic fell outside the set PR band, we extended the spectrum accordingly (this is the second modification that we applied to the SNR metric proposed by de Haan and Jeanne [21]).
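A minimal NumPy sketch of the metric described above (with two simplifying assumptions of ours: both harmonic windows are centered on the harmonics, and windows exceeding the band are simply clipped rather than the spectrum being extended):

```python
import numpy as np

def snr_db(sig, fs, f_ref_bpm, band=(30, 240)):
    """SNR of an rPPG signal: energy in 5 bins around the fundamental
    plus 11 bins around the second harmonic, against the remaining
    in-band energy, expressed in dB."""
    n = len(sig)
    spec = np.abs(np.fft.rfft(sig - np.mean(sig))) ** 2   # power spectrum
    f_bpm = np.fft.rfftfreq(n, 1 / fs) * 60.0
    in_band = (f_bpm >= band[0]) & (f_bpm <= band[1])
    w = np.zeros(len(spec), dtype=bool)                   # binary window w_t
    for harm, nbins in ((f_ref_bpm, 5), (2 * f_ref_bpm, 11)):
        center = int(np.argmin(np.abs(f_bpm - harm)))
        half = nbins // 2
        w[max(center - half, 0):center + half + 1] = True
    signal_e = spec[in_band & w].sum()
    noise_e = spec[in_band & ~w].sum()
    return 10.0 * np.log10(signal_e / noise_e)
```

For a signal dominated by a sinusoid at the reference PR, the metric is high; for broadband noise it falls toward or below 0 dB.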
Before calculating the SNR, the reference signals were pre-processed by (1) applying a first-order Butterworth bandpass filter with lower and upper cut-off frequencies corresponding to 40 and 180 BPM, respectively, and (2) resampling to a sampling rate that matched the frame rate of the accompanying video recordings. The cut-off frequencies differ from the lower and upper limits of the frequency band of expected PRs in humans because we adapted the values to the actual data in order to obtain cleaner reference signals. SNR values were calculated for windowed signals (a 512-frame-long sliding window was applied). The results are visually presented in the form of boxplots if the size of each group within the studied scenario is larger than or equal to five; strip plots are used otherwise.
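The reference pre-processing can be sketched with SciPy (the filter order and cut-offs follow the text; zero-phase filtering via `filtfilt` and polyphase resampling are our assumptions, and integer sampling rates are assumed):

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

def preprocess_reference(ppg, fs_in, fs_out):
    """Pre-process a reference pulse waveform: (1) first-order
    Butterworth bandpass at [40, 180] BPM, (2) resample to the
    video frame rate. fs_in and fs_out are in Hz."""
    b, a = butter(1, [40 / 60, 180 / 60], btype="bandpass", fs=fs_in)
    filtered = filtfilt(b, a, ppg)               # zero-phase filtering
    return resample_poly(filtered, int(fs_out), int(fs_in))
```

A 72 BPM sinusoid sampled at 60 Hz passes nearly unattenuated and comes out at half the length when resampled to 30 Hz, while slow baseline drift is suppressed.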
RMSE was defined as the square root of the average of the squared differences between the estimated (PR_i^rPPG) and actual, i.e., reference, PR values (PR_i^ref):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{PR}_i^{\mathrm{rPPG}} - \mathrm{PR}_i^{\mathrm{ref}}\right)^2},\tag{2}$$

where n denotes the total number of estimated PRs. PR values were measured using a sliding window of 512 frames; we thus took into account that, in general, the PR is not constant throughout the entire recording, not even in a resting condition. Statistical evaluation of the results was carried out using one-way analysis of variance (ANOVA). ANOVA was used when the number of samples in each studied group was larger than or equal to five.
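The windowed PR estimation and the RMSE computation can be sketched as follows (a NumPy sketch; estimating the PR of each 512-frame window as the in-band spectral peak is our assumption, consistent with how the fundamental frequency is defined above):

```python
import numpy as np

def window_pr_bpm(sig, fs, win=512, band=(30, 240)):
    """Estimate the PR in each sliding window as the peak in-band
    frequency of the window's amplitude spectrum (in BPM)."""
    prs = []
    f_bpm = np.fft.rfftfreq(win, 1 / fs) * 60.0
    m = (f_bpm >= band[0]) & (f_bpm <= band[1])
    for t in range(len(sig) - win + 1):
        seg = sig[t:t + win] - np.mean(sig[t:t + win])
        spec = np.abs(np.fft.rfft(seg))
        prs.append(f_bpm[m][np.argmax(spec[m])])
    return np.array(prs)

def rmse_bpm(pr_rppg, pr_ref):
    """RMSE between rPPG-derived and reference PR estimates (BPM),
    one value per sliding-window position, as in Equation (2)."""
    d = np.asarray(pr_rppg, dtype=float) - np.asarray(pr_ref, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))
```

Note that a 512-frame window at 30 fps gives a spectral resolution of about 3.5 BPM, which bounds the accuracy of each windowed estimate.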
We set the length of the sliding processing window to 128 frames for POS, 60 frames for 2SR, and 128 frames for the FVP approach. The window lengths were partially adjusted to each algorithm to ensure reliable performance: FVP tends to perform worse at shorter window lengths and is more sensitive to window size than the approaches with an image processing front-end [16]; in 2SR, the optimal window length should include at least half of a cardiac cycle, so its selection depends on the camera frame rate and the PR of the subject [34]; in POS, similarly as in 2SR, the window length should capture at least one cardiac cycle (shorter window lengths are preferred so that instantaneous distortions can be suppressed as quickly as possible [31]).

Results
Studied approaches failed several times due to the following reasons: (1) no face was detected on the first frame of the recording, (2) the KLT tracker lost all the feature points, (3) no skin pixels were detected using the RGB-H-CbCr skin-color detector, and (4) no skin pixels were detected using the skin detector based on maximization of mutual information. Table 1 lists the recordings in which one of the presented issues occurred. It can be seen that the Viola-Jones frontal face detector failed in some gym recordings; the KLT tracker failed (i.e., lost all tracked points) in one gym recording (in this particular recording, the subject was moving his head very fast and at some point his face was occluded); skin-color-based segmentation failed in at least one of the videos from each studied data set; and the skin detector based on maximization of mutual information failed in one gym recording only. Unsuccessful skin detection can be attributed to the limited training sets used for determining the threshold values in both skin detection algorithms, whereas the Viola-Jones frontal face detector failed because of the appearance of the faces on the first frames. For each occurrence of the presented issues, we discarded the obtained results and marked the processing of the particular video recording as unsuccessful. Based on the number of unsuccessfully processed recordings, we defined the success rates of the studied approaches: 88.37% for VK-RGBHCbCr, 94.19% for VK-Conaire, 95.35% for VK and VK-LMK, and 100% for FVP. Figure 1 shows examples of successfully localized ROIs for a selected recording from each of the studied data sets using the four different approaches with an image-processing front-end; the corresponding ROI sizes are summarized in Table 2.
Mean absolute ROI sizes are, in general, the largest for videos from LGI-PPGI-FVD and the smallest for videos from PURE (the differences in sizes are more than two-fold); mean relative ROI sizes range from 4.9 to 13.2% of the total frame size for LGI-PPGI-FVD recordings, from 3.0 to 6.5% for PURE, and from 0.6 to 1.4% for PBDTrPPG. The differences arise from the different distances between the subjects and the camera, as well as the different video frame sizes in the three studied data sets. Note that FVP does not rely on a common ROI localization step; it is therefore not included in Table 2.
Figure 2 shows the performance of the studied approaches in combination with the two rPPG algorithms (POS and 2SR) in extracting rPPG signals from the PURE video recordings in terms of SNR. Figure 3 shows the SNR results of the studied approaches tested on PBDTrPPG. In the case of LGI-PPGI-FVD, FVP exhibits the widest interquartile range. Figure 4 shows the performance of the studied methods for extracting rPPG signals from the LGI-PPGI-FVD video recordings in terms of SNR. There are no significant differences in the SNR values among the studied approaches in the resting scenario, whereas in the gym recordings, FVP performed worse than the other approaches. However, it should be noted that, in the case of the approaches with the image-processing front-end, four of the video recordings were not processed successfully; the SNR values of the rPPG signals extracted from these recordings are the ones that lower the mean SNR value for FVP.
Table 3 shows the results of the RMSE analysis. The results follow those of the SNR, but the differences between the studied approaches appear more prominent. On average, the best performance is achieved by a combination of the VK approach and the model-based POS rPPG algorithm. As expected, the RMSE is the largest in the case of low lighting conditions (LC1) and a dark skin tone and the lowest in the case of resting conditions, regardless of the applied approach.
As with the SNR values, the worst performance was achieved by FVP.

Discussion
FVP is the only approach that allowed continuous PR measurement in all studied video recordings. It would, however, fail in the case of video recordings of multiple subjects, and its performance would deteriorate in the case of significant background changes. In the other approaches, the face detector or the KLT tracker failed in at least one of the video recordings. In general, approaches relying on the Viola-Jones frontal face detector would produce false positives in the case of mannequins and would fail completely if the video recordings covered body parts other than the face.
Mean SNR and RMSE values indicate that, in most of the recording scenarios, the performance of the studied approaches is comparable. This means that (1) the obtained ROIs were large enough to cancel out the camera quantization noise and (2) the presence of non-skin pixels did not significantly affect the performance of the studied approaches. When comparing the results of the approaches combined with POS with those combined with 2SR, the POS-based solutions seem to outperform the 2SR-based ones, especially in the more challenging scenarios. This confirms the innate sensitivity of 2SR to a noisy skin mask.
In addition, there are some observable differences between the studied approaches in certain recording scenarios. In the talking-scenario recordings from PURE (Figure 2), the performance of all approaches decreases in comparison with the resting scenario due to noisier ROIs and varying intensities of the specular and diffuse reflection components caused by the motion of the facial pixels during talking. The worst performance is achieved by VK-LMK-2SR due to the largest ratio of non-skin to skin pixels (the ROIs in VK-LMK were the smallest of all), which arises when the mouth is open during talking. A similar result is achieved in the head rotation scenario, in which we can also observe that 2SR performs worse than POS. In the case of PBDTrPPG (see Figure 3), which contains lossy compressed video recordings, the average SNRs are the lowest of all studied data sets. This is most likely due to the partial loss of pulsatile information or the introduction of additional artifacts into the recordings, as suggested by Wang et al. [31]. The achieved SNR values are comparable, and the selection of the approach with or without an image processing front-end, together with an rPPG algorithm, does not affect the SNR significantly. The results show that SNR values increase as light intensity increases (Figure 3a) and decrease as skin tone darkens (Figure 3b), which is expected based on the knowledge underlying the formation of the rPPG signal. It should be noted that the results for different skin tones are not directly comparable because of the variable distance between the subjects and the camera. In addition, the intermediate and dark skin tones would, based on our perception, lie close to each other on the Fitzpatrick scale, which could explain the similar SNR values achieved. Interestingly, the best performance in the case of a dark skin tone was achieved by FVP, which might be due to the fact that it relies not only on the mean but also on the variance for spatial pixel combination [31].
Lastly, the SNR results for LGI-PPGI-FVD (Figure 4) indicate similar performance of the studied approaches in both the resting and gym scenarios (except for FVP on the gym recordings). In the gym scenario, the average SNR values are greatly reduced, and only up to two video recordings from the set were successfully processed by the studied approaches with an image processing front-end, whereas FVP allowed processing of all videos. It should be noted that the individual SNR values lying below the mean SNR value for FVP correspond to the videos in which the other approaches failed; if we removed these results, the SNR values of FVP would be comparable to those of the other approaches. Provided that the relative ROI sizes for the LGI-PPGI-FVD recordings were the largest, unevenly distributed pulsations [40] might also affect the results.
Our SNR values of the rPPG signals are lower than those provided by Zhao et al. [8], whereas our RMSE values are lower in the resting, talking, and head translation scenarios. The authors studied the performance of several tracking algorithms (the tracked ROI was a rectangular bounding box whose corners were defined by selected facial landmark feature points). The results potentially indicate differences between the approaches for ROI localization, although it is important to emphasize that differences may also occur due to different calculations of the SNR. Although most studies rely on the SNR metric proposed by de Haan and Jeanne [21], there are differences in its actual calculation. Zhao et al. [8], for example, chose a frequency band of [48, 300] min−1 (instead of the proposed [30, 240] min−1), included the energy around the third harmonic in the energy spectrum of the pulse signal (in the original implementation, only the energy around the first and second harmonics is included), and used the same spectral window length for calculating the energies around the first, second, and third harmonics (in [21], 5- and 10-bin-long windows around the first and second harmonics were used, respectively). In the case of calculating the RMSE, the length of the processing window plays an important role when interpreting the results (differences of up to 10% may occur for window lengths from 0 to 60 s [41]). The described issue results from an already exposed drawback of rPPG research, i.e., the lack of a standardized methodology [42]. There have even been attempts to define standardized reporting procedures for assessing rPPG PR measurements [43] with the goal of ensuring direct comparison of the results of different studies, but there seems to be no general agreement on this issue.
In other similar studies [18,20], the worst performance in terms of SNR and RMSE was achieved by the Viola-Jones face detector combined with the KLT tracker, which indicates that the simple ROI width reduction applied in our study does improve the skin mask. On the other hand, Fouad et al. [19] applied an approach similar to ours, yet the RMSE they obtained was more than two times lower than in the other studied approaches.
The key advantages of our study are (1) the inclusion of lossy compressed, lossless compressed, and uncompressed videos from publicly available data sets covering various recording scenarios and (2) the inclusion of an approach without the image-processing front-end. These advantages also define the innovative part of our study: the proposed methodology for assessing the performance of approaches with and without the ROI localization step, which combines publicly available video recordings of various quality, capturing subjects in different challenging conditions, with an approach without the image processing front-end that has, to the best of our knowledge, never before been used in evaluating the performance of ROI localization steps.
The disadvantages are related to (1) the limited number of analysed approaches, (2) the absence of a time complexity analysis of the studied approaches, and (3) the small number of studied video recordings covering the motion and skin tone scenarios. These disadvantages stem from the facts that (1) we wanted to focus on the most widely used approaches (with the addition of the less known FVP approach, which relies on a completely different principle), (2) a time complexity analysis would first require optimization of the programming code behind the studied approaches, and (3) the remaining 19 sets of video recordings from LGI-PPGI-FVD are not publicly available due to the limited storage space on the server that the data set's creators use, whereas PBDTrPPG offers only a small number of videos covering the skin tone challenge. It should also be noted that the success rate and the ROI size are the only parameters that directly assess the performance of the studied approaches. However, ROI sizes by themselves do not tell anything about the quality of the skin mask (in terms of the ratio between skin and non-skin pixels). In the SNR and RMSE metrics, potential differences between the studied approaches are masked, especially in the case of POS, which has been shown to exhibit the best overall performance [31] in extracting rPPG signals from video recordings.
The first future challenge is the optimization of the studied algorithms to allow their best performance. For example, (1) in VK-RGBHCbCr and especially in VK-Conaire, a sequence of morphological operations (erosion and dilation) could be used to further refine the ROIs (see Figure 1d,h,l for the noisy ROIs identified by VK-Conaire); (2) in all approaches with image processing front-ends, the identified ROIs could be divided into multiple ROIs (multiple ROIs would also enable measurements of additional physiological parameters [44]); and (3) in all approaches, the processing window lengths could be optimized. Additionally, we could test other skin classification approaches, such as a one-class support vector machine-based classifier [45] or an approach based on active contours and Gaussian mixture models [32], which do not rely on simple pixel-wise segmentation; it should be noted that both methods require robust face detection. The second challenge is related to the analysis of the performance of the studied approaches when subjects are wearing masks. Because masks have become part of everyday life amid the ongoing COVID-19 pandemic, this challenge is, in our opinion, relevant for the rPPG research community. Recently, Speth et al. [46] created a publicly available data set containing recordings of 61 masked subjects, together with an algorithm that adds a synthetic mask to a face in a selected video recording, which is, in our opinion, a valuable contribution to the rPPG community. The last challenge is related to the evaluation of various approaches for extracting rPPG signals from video recordings covering parts of the body other than the face; such body parts have already been used for assessing peripheral hemodynamics [47,48].
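The morphological refinement suggested in point (1) can be sketched with SciPy (an illustrative sketch; the structuring element size and the closing-then-opening order are our assumptions):

```python
import numpy as np
from scipy.ndimage import binary_closing, binary_opening

def refine_skin_mask(mask, size=3):
    """Refine a binary skin mask: closing (dilation then erosion)
    fills small holes, after which opening (erosion then dilation)
    removes isolated false-positive skin pixels."""
    k = np.ones((size, size), dtype=bool)
    return binary_opening(binary_closing(mask, structure=k), structure=k)
```

Applied to a noisy mask, this fills pinholes inside the skin region while discarding speckle outside it.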

Conclusions
In rPPG measurements, the selection of the approach for extracting pulsatile information from video recordings does not seem to significantly affect the extracted rPPG signal in terms of SNR, provided that a proper rPPG algorithm is selected for combining the extracted information into a single pulse waveform signal. On the other hand, the RMSE and especially the success rate are more affected by the selection of the ROI localization approach. Therefore, when designing software for an rPPG measurement system, one should adapt the software solution to the actual application to ensure that the rPPG measurement system performs as robustly as possible.
Funding: This research was co-funded by Slovenian Research Agency (ARRS) (ARRS Programme code P2-0270 (C)). The APC was funded by ARRS.
Data Availability Statement: All data used in our study are made publicly available by their owners.

Acknowledgments:
The authors acknowledge the financial support from ARRS.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: