Remote Monitoring of Vital Signs in Diverse Non-Clinical and Clinical Scenarios Using Computer Vision Systems: A Review



Introduction
The monitoring of human vital signs, for example respiratory rate (RR), blood oxygen saturation (SpO2), heart rate (HR), heart rate variability (HRV) and blood pressure (BP), plays a significant role in the modern clinical care of patients in hospitals and at home [1]. Applications include medical diagnosis, training programs, fitness assessment, lie detection and stress measurement [2]. Various instruments are used to measure these vital signs, such as electrocardiograms (ECGs), pulse oximeters, nasal thermocouples, respiratory belt transducers and piezoelectric transducers [3]. These instruments require direct physical contact with the human body because they rely on contact-based sensor modalities, straps, probes or electrodes [4]. They may cause skin infection, injury or harmful reactions, especially in premature babies, elderly people or burns victims, who have fragile skin [5]. Moreover, infants who are attached to monitors by wires and leads are at risk of entanglement or strangulation [6]. Contact instruments are also not appropriate for long-term monitoring as they may cause discomfort, irritation and a cumulative risk of fungal and bacterial infection [7]. In addition, a reduced amplitude of chest wall expansion can affect the RR input signal from the impedance lead [8]. Furthermore, cost is an important issue as the monitoring electrodes and leads are only intended and certified for a single use, followed by disposal [9]. Placing sensors with self-adhesive pads is difficult on wet, oily, dirty or hairy subjects, which limits these technologies in emergency situations [10]. Accuracy is another issue with conventional contact methods since they are sensitive to artefacts produced by the subject's movement [11]. Therefore, to minimise these limitations, there is a need for an alternative method in which vital signs can be measured without any physical contact.
As presented in Figure 1, several noncontact approaches, based on magnetic induction, the Doppler effect, thermal imaging and video camera imaging, can be effective alternatives for monitoring vital signs with acceptable reliability and accuracy [12]. These methods depend on the observation of physical and physiological variations including skin colour, temperature, impedance changes, head motion, arterial pulse motion and, importantly, thoracic and abdominal motion due to the activity of both the respiratory and cardiovascular systems. Magnetic induction-based methods can detect the impedance changes caused by blood and air volume variations due to the mechanical action of the heart, diaphragm and thorax. The basic principle is to induce eddy currents in the tissue and to measure the re-induced magnetic field externally; the impedance changes can then be observed remotely to extract vital signs [13]. The method uses a simple arrangement based on multiple coils [14] or a single coil [15] integrated into a mattress [16], bed [17] or seat [18]. However, the method is highly susceptible to relative movements between coil and body. The Doppler effect underlies an active noncontact method that can detect subtle chest movements due to cardiorespiratory activity. In this method, vital signs are extracted using a Doppler radar [19,20] or laser sensors [21] together with digital signal processing (DSP) techniques [22], in which the phase shift between the transmitted waves and the waves reflected from a region of interest (ROI) is calculated. There are three types of Doppler-based methods: Doppler with electromagnetics [23,24], lasers [25,26] and ultrasonics [27][28][29].
Thermal imaging [30,31] is a passive noncontact method that uses a thermal camera to detect the radiation emitted from particular parts of the human body in the infrared (IR) range of the electromagnetic spectrum and thereby measure the physiological signal [32][33][34]. Thermal imaging-based methods extract vital signs by measuring temperature changes around the nostril area [35][36][37][38] as well as heat differences due to pulsating blood flow in the main superficial arteries at various regions such as the carotid artery in the neck [39,40] and the temporal artery in the forehead [41,42]. However, both Doppler- and thermal imaging-based approaches are susceptible to noise and motion artefacts and constrain the movement of the subjects. The high cost of the sensors also prevents saturation sampling of the environment. Their relatively low resolution limits the detection range and restricts them to one subject. Moreover, these methods need an exposed ROI and specialised hardware, making them costly [4]. They are also constrained to short-term monitoring of a single subject at a time. Additionally, Doppler-based methods may have biological effects on humans [43], with unknown future population risks if broadly adopted.
Digital cameras offer high resolution in the spatial (number of pixels per degree), temporal (number of frames per second), intensity (number of bits per pixel) and spectral (at least 3 visible channels, with hyperspectral options increasingly common) domains, all due to consumer market demand. Furthermore, a large base of research assets exists for processing imagery, much of it free to use, for example OpenCV [44]. Flexibility in visible light optical design, with panoramic, microscopic and telescopic solutions available in well-integrated commercial product families, allows diverse measurement scenarios. Tailored fields of view allow analysis of multiple ROIs in parallel, or in series based on availability. The mass market has led to low cost [4,12] and affordable optics that can be used in almost any conceivable application scenario [9].
Video camera imaging is a passive contactless method in which video cameras are used to extract different physiological signals from several regions of the human body, exploiting two principles. The first principle relies on skin colour variations due to cardiorespiratory activity, known as photoplethysmography (PPG). Vital signs are measured by exploiting variations in the reflectance properties of human skin, which cause variations in brightness values across sequences of images. The second principle depends on cyclic body motion owing to cardiorespiratory activity, in techniques that can be broadly characterised as motion-based methods. These methods use motion in the head, the arterial pulse, and the thoracic and abdominal regions. For noncontact physiological assessment, camera imaging-based methods seem to be a promising approach since they are robust, reliable, safe, cost-effective and suitable for long-distance and long-term monitoring as well as for simultaneous monitoring of multiple persons [12].
Camera imaging-based methods have been attracting increasing attention in the literature. This paper aims to explore the progress of video camera imaging-based technology from controlled clinical scenarios with fixed monitoring installations and controlled lighting, towards uncontrolled environments, crowds and moving sensor platforms. We focus on the diversity of applications and scenarios being studied in this topic. We emphasise visible light sensing since these cameras represent the largest installed base, the lowest costs, the highest rate of improvement and the greatest opportunity to insert new capability into existing devices. First, we discuss studies of motion- and colour-based methods. Then, we discuss the considerations and scenarios appropriate to colour-based methods, for example, the presence of motion artefacts, illumination variation, different sensors, different subjects, different vital signs, multiple ROIs, long distance and multiple persons. Additionally, potential applications of imaging photoplethysmography (iPPG) in both clinical and non-clinical sectors are described. We then consider research gaps and challenges of existing studies that may inform researchers who wish to further progress the techniques and applications.
Several review papers based on video camera imaging have been published in recent years. McDuff et al. [45] presented a review of the work on remote PPG imaging using digital cameras. Sun et al. [7] introduced PPG measurement techniques from contact to noncontact and from points to images. Sikdar et al. [46] conducted a methodological review of contactless vision-guided pulse rate estimation. Hassan et al. [47] investigated both iPPG and ballistocardiography (BCG) estimation based on digital cameras. Rouast et al. [5] conducted a technical literature review on remote heart rate measurement using low-cost RGB face video. Al-Naji et al. [12] provided a broad literature survey of remote cardiorespiratory monitoring, including the Doppler effect, thermal imaging and video camera imaging. Zaunseder et al. [48] reviewed the technical background of cardiovascular assessment using iPPG. A thorough review of iPPG in diverse scenarios is therefore warranted, covering issues such as motion artefacts, illumination variations, alternate sensor modalities, different subjects, different vital signs, multiple ROIs, long distance and multiple persons. Moreover, there is no review paper that focuses on applications of iPPG considering both clinical and non-clinical sectors.
To fill these gaps, this paper provides a comprehensive review of the recent advances of iPPG studies focusing on diverse and non-clinical scenarios. We compare different techniques to give a clear summary of the state of the field. We describe potential applications of iPPG in both clinical and non-clinical sectors separately to show the value of the iPPG technique in real-world applications. Finally, we present several issues and scenarios for future studies.

Basic Framework
With video camera imaging, a series of image and signal processing techniques is required to extract vital signs from the images (Figure 2).

Data Acquisition
First, the data, i.e., video of a skin area of the human body, are collected using an imaging sensor such as a digital camera, webcam, smartphone, Microsoft Kinect or a camera mounted on an unmanned aerial vehicle (UAV), as depicted in Figure 3. A dedicated light or ambient light can be used as the light source. Frame rates range from as low as 10 fps up to 60 fps.

ROI Detection
After collecting video, regions of interest (ROI) such as the face, forehead, chest and palm within the video frames are detected either manually or automatically.

Raw Signal Extraction
Then, the raw signals are extracted from the selected ROI by calculating the spatial average of the pixel values in the ROI for each frame using Equation (1). This is an intensity-based method that uses spatial averaging. Its purpose is to average out the camera noise contained in each single pixel, thus improving the signal-to-noise ratio.

$$ i_R(t),\; i_G(t),\; i_B(t) = \frac{\sum_{(x,y) \in ROI} I(x,y,t)}{|ROI|} \qquad (1) $$

where $i_R(t)$, $i_G(t)$ and $i_B(t)$ are the three source signals from the red, green and blue components, respectively (the average is evaluated separately for each colour channel), $I(x,y,t)$ is the brightness pixel value at image location $(x,y)$ at time $t$, and $|ROI|$ is the size of the selected ROI.
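The spatial averaging of Equation (1) can be illustrated with a minimal sketch. It assumes a frame is a nested list of (R, G, B) tuples and that the ROI is a rectangle given as (x0, y0, width, height); both layouts, and the function names, are hypothetical choices for this illustration. A practical pipeline would more likely operate on arrays delivered by an imaging library.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]     # (R, G, B) intensities of one pixel
Frame = List[List[Pixel]]        # frame[y][x] is the pixel at column x, row y
Roi = Tuple[int, int, int, int]  # hypothetical layout: (x0, y0, width, height)


def roi_mean_rgb(frame: Frame, roi: Roi) -> Tuple[float, float, float]:
    """Spatially average each colour channel over a rectangular ROI."""
    x0, y0, width, height = roi
    sums = [0.0, 0.0, 0.0]
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            for c in range(3):
                sums[c] += frame[y][x][c]
    size = width * height  # |ROI|, the number of pixels in the ROI
    return (sums[0] / size, sums[1] / size, sums[2] / size)


def raw_traces(frames: Sequence[Frame], roi: Roi) -> List[Tuple[float, float, float]]:
    """Return one (i_R, i_G, i_B) sample per frame."""
    return [roi_mean_rgb(frame, roi) for frame in frames]
```

Calling `raw_traces(frames, roi)` on a video yields three time series, one per colour channel, which are the inputs to the later noise removal and vital sign extraction stages.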

Noise Artefact Removal
The raw signal may contain unwanted noise due to factors such as subject movement, illumination changes, camera movement and skin tone. To remove unwanted noise from the raw signal, various signal processing techniques may be applied, such as low-pass filtering, bandpass filtering, adaptive bandpass filtering, signal decomposition, blind source separation (BSS) and model-based methods.
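As a rough illustration of band-limiting a raw trace, the sketch below subtracts a long moving average (removing slow drift) and then applies a short moving average (damping frame-to-frame noise). This is a deliberately crude stand-in for the bandpass filters used in the literature, written with the standard library only; the window lengths and function names are assumptions made for this example, not values taken from any cited study.

```python
import math


def moving_average(signal, window):
    """Centred moving average; the window is truncated at the signal edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def crude_bandpass(signal, fps, trend_seconds=1.5, smooth_seconds=0.1):
    """Remove slow drift, then damp frame-to-frame noise."""
    trend_window = max(3, int(trend_seconds * fps)) | 1    # force an odd length
    smooth_window = max(1, int(smooth_seconds * fps)) | 1
    trend = moving_average(signal, trend_window)
    detrended = [s - t for s, t in zip(signal, trend)]      # high-pass step
    return moving_average(detrended, smooth_window)         # low-pass step


# Synthetic raw trace: offset + slow linear drift + a 1.2 Hz "pulse" (72 bpm).
fps = 30
raw = [100 + 0.5 * (n / fps) + math.sin(2 * math.pi * 1.2 * n / fps)
       for n in range(10 * fps)]
clean = crude_bandpass(raw, fps)
```

Away from the signal edges, `clean` oscillates around zero while `raw` sits near 100 and drifts upwards. A designed filter (for example a Butterworth bandpass) would give much sharper control over the passband than this moving-average approximation.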

Vital Sign Extraction
Finally, vital signs are extracted by using frequency analysis or peak detection. For frequency analysis, a signal that contains a distinct periodicity is converted to the frequency domain using a discrete Fourier transform. The fast Fourier transform (FFT) is generally applied to calculate the corresponding frequency, $F_s$. The discrete cosine transform (DCT), Welch's method or the short-time Fourier transform (STFT) can also be used. When using a peak detection algorithm, the number of peaks, $N_s$, is counted during the processing period $T$ (s). Heart rate and respiratory rate per minute can be calculated as follows:

$$ \text{HR or RR} = 60 \times F_s \qquad (2) $$

$$ \text{HR or RR} = 60 \times \frac{N_s}{T} \qquad (3) $$
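Both routes to a rate, the dominant frequency (60 × F_s) and the peak count (60 × N_s / T), can be sketched in a few lines. The example below uses a naive discrete Fourier transform instead of an FFT so that it needs only the standard library; the 0.7 to 4.0 Hz search band and the function names are assumptions chosen for illustration.

```python
import cmath
import math


def dominant_frequency(signal, fps, f_lo=0.7, f_hi=4.0):
    """Return the frequency (Hz) with the highest spectral power in [f_lo, f_hi]."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if not (f_lo <= freq <= f_hi):
            continue
        coeff = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq


def count_peaks(signal):
    """Count local maxima (a sample at least as large as both neighbours)."""
    return sum(1 for i in range(1, len(signal) - 1)
               if signal[i - 1] < signal[i] >= signal[i + 1])


def rate_from_frequency(f_s):
    """Rate per minute from the dominant frequency F_s in Hz."""
    return 60.0 * f_s


def rate_from_peaks(n_s, period_s):
    """Rate per minute from N_s peaks counted over period_s seconds."""
    return 60.0 * n_s / period_s


# Synthetic 10 s pulse signal at 1.2 Hz, sampled at 30 fps.
fps, duration = 30, 10
pulse = [math.sin(2 * math.pi * 1.2 * n / fps) for n in range(fps * duration)]
hr_from_spectrum = rate_from_frequency(dominant_frequency(pulse, fps))
hr_from_peaks = rate_from_peaks(count_peaks(pulse), duration)
```

For this clean synthetic input both estimates come out at 72 beats per minute. On real traces the peak-counting route is more sensitive to residual noise, because every spurious local maximum is counted as a beat.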

Motion Based Methods
Cardiorespiratory activity causes subtle motion in various parts of the human body that can be measured from video to extract vital signs. Numerous researchers have introduced different techniques to monitor vital signs using information extracted from motion. Nakazima et al. [49] proposed a contactless method based on optical flow analysis to extract RR from video of whole-body motion captured by a CCD (charge-coupled device) camera. Another optical flow analysis-based respiratory monitoring system was introduced by Frigola et al. [50], using videos of a subject's chest movement recorded by a video camera. Optical flow-based techniques have been found to be susceptible to motion artefacts caused by the subject's movement, ambient light and unclear ROI. Optical flow-based analysis is also computationally demanding compared to other processing options.
Several studies have been published based on blind source separation (BSS) using either principal component analysis (PCA) or independent component analysis (ICA). Balakrishnan et al. [51] proposed a novel noncontact method to measure heart rate based on head motion in facial video recorded by a digital camera. In their method, the Viola-Jones (V-J) face detector [52] and the Kanade-Lucas-Tomasi (KLT) [53] tracking algorithm were used to detect the face, extract the ROI and track the ROI feature points based on the good feature tracking (GFT) method. Then, the tracked points were temporally filtered using a Butterworth filter and PCA was applied to remove artefacts and measure the physiological signal. Finally, the pulse rate was extracted by means of a simple peak detection algorithm followed by a fast Fourier transform (FFT). Shan et al. [54], on the other hand, monitored HR with another noncontact method using head motion captured by a smartphone camera, based on ICA rather than PCA. These methods are susceptible to motion artefacts as they considered only stationary subjects without any internal or external movement. To mitigate these limitations, Irani et al. [55] designed an improved head motion-based method to extract HR using a webcam, based on the discrete cosine transform (DCT) and a moving average filter, considering various facial expressions and head poses. Haque et al. [56] further improved this technique by integrating both the GFT method and the supervised descent method (SDM), with training against the MAHNOB-HCI (human-computer interaction) database with moving objects. Lomazia et al. [57] proposed a method based on tracking both background and facial features to solve the hand-shake problem with a smartphone camera. This method was further enhanced in [58] by exploiting a system with two cameras in a smartphone, where the front camera tracked facial features and the rear camera tracked background features.

Some researchers have used video magnification to extract vital signs from motion. He et al. [59] introduced a contactless method combining Eulerian video magnification (EVM) [60] and 2-Gaussian curve modelling to extract pulse signals from a subject's wrist using a digital camera. A study by Al-Naji et al. [61] used video magnification and a frame subtraction method to remotely measure cardiac activity (heart rate, pulse width and total cycle length) from head motion. Another study by Al-Naji et al. [62] presented a noncontact respiratory monitoring scheme that uses a video camera to detect and measure respiratory rates as well as respiratory cycle timing parameters from the movement of the chest, or of a blanket draped over an infant subject, in various sleeping postures, based on two processing techniques, as shown in Figure 4. The first technique magnified the motion resulting from chest movement based on 8-level wavelet pyramid decomposition and 5th-order temporal elliptical filtering. The second technique measured respiratory rates by motion detection, exploiting frame subtraction, local contrast enhancement, binarisation, morphological and masked filtering, white area detection and logical matrix calculation, where the mean distance between ones in the logical matrix was used to calculate RR. Although the system was an effective solution to the unclear ROI problem, it still has some limitations. It considered only a single vital sign, i.e., RR, and a single infant, which limits the real applicability of the system.

In [63], a contactless real-time monitoring system was introduced for measuring respiratory rates and detecting apnoea by exploiting a Microsoft Kinect v2 sensor, as presented in Figure 3c, from the movement of the thorax and abdomen in different sleeping positions and various lighting conditions, including dark environments. The authors used an improved motion magnification scheme based on the Lanczos resampling method, wavelet pyramid decomposition, temporal bandpass filtering and image denoising to magnify the input signal. Frame subtraction was used to calculate respiratory rate. Nevertheless, the system suffered from motion artefacts, short range and a limited number of viable ROIs. Furthermore, it only considered respiratory activity and apnoea, and did not consider other vital signs. To mitigate this limitation, an improved contactless system was presented to measure HR and RR and to sense irregular cardiopulmonary functions such as bradycardia, tachycardia, bradypnoea, tachypnoea and central apnoea by using the signal from the thoracic-abdominal region, based on image sequences taken by the Microsoft Kinect v2 sensor, with processing that considered unclear ROI, various illumination conditions and different sleeping postures [64]. The basic block diagram of the proposed system is presented in Figure 5. An efficient motion magnification technique (EMMS) [65] was used to magnify the input data to make the movement apparent. Then, to calculate HR, an intensity-based technique was used, including signal decomposition, blind source separation, spectral analysis, filtering and peak detection. To calculate RR, the authors used a frame subtraction technique, binarisation, filtering, white area detection and binary matrix calculation. If any irregularity was detected, an alarm would be sent to a carer. However, the detection range was limited to 4.5 m and the number of ROIs was also limited. Furthermore, system failure may occur in the case of a fully covered subject or if the subject lies on their stomach.

The motion-based method was beneficial in this case, since it is robust to illumination variance, skin tone and unclear ROI. However, the main limitation of this method is its dependency on motion features to extract the physiological signals. As a result, the technique may be highly vulnerable to voluntary motion such as different facial expressions, walking and talking by the subject. Consequently, voluntary motion may reduce the consistency and reliability of the extracted vital signs, and this remains a research problem to be overcome.
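The frame subtraction step used by several of the respiratory monitoring systems above can be sketched as follows. The example assumes greyscale frames stored as nested lists and a hypothetical binarisation threshold of 15 grey levels; it reproduces only the subtraction, binarisation and white-area counting, and not the contrast enhancement, morphological filtering or logical-matrix stages of the cited systems.

```python
def white_area(prev_frame, next_frame, threshold=15):
    """Count pixels whose grey level changed by more than `threshold`."""
    count = 0
    for row_prev, row_next in zip(prev_frame, next_frame):
        for a, b in zip(row_prev, row_next):
            if abs(a - b) > threshold:  # binarisation of the difference image
                count += 1
    return count


def motion_signal(frames, threshold=15):
    """White-area count for each pair of consecutive frames."""
    return [white_area(frames[i - 1], frames[i], threshold)
            for i in range(1, len(frames))]
```

The resulting sequence rises while the chest (or blanket) is moving and falls back towards zero at the turning points of the breathing cycle, so its periodicity can be analysed with the frequency or peak-based methods described earlier.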

Colour-Based Method
Photoplethysmography (PPG) is a non-invasive, low-cost, passive, optical technique first proposed by Hertzman et al. in 1937 [66]. It can monitor three vital signs: HR, RR and SpO2. PPG is used extensively in biomedical, clinical and non-clinical fields due to its simple design and relatively low cost. PPG measures variations in the optical properties of light transmitted through or reflected from human skin, which are caused by blood volume changes during cardiorespiratory activity. Because of its haemoglobin content, blood absorbs more light than the adjacent tissue, so changes in blood volume cause subtle variations in the optical properties of the skin. Generally, PPG can function in either a transmissive or a reflective mode; however, the transmissive mode is limited to regions such as the ear lobes and fingertips. In the PPG technique, a dedicated light source is required to illuminate a part of the body, and an optical sensor or photodetector attached to the skin is needed to sense the optical properties of the skin. Imaging photoplethysmography (iPPG) is essentially remote or non-contact PPG: a camera takes the place of the contact sensor and illuminator used in skin-contact PPG. To detect PPG signals remotely, many researchers have considered a video camera as the optical sensor. Some studies have used a dedicated light source and others have relied on ambient light alone. However, a dedicated light source complicates the hardware setup of experiments and can become intrusive for the participant, particularly infants.
As shown in Figure 6, the iPPG reflection model mainly consists of three parts: a light source, a patch of human skin containing pulsating blood and a video camera. The light source can be a dedicated light source or ambient light. When a light source illuminates human skin, subtle colour changes can be observed in the videos captured by the camera. In the absence of illumination variations and motion artefacts, these colour changes denote the blood volume changes in the microvascular tissue bed under the skin, because pulsatile blood flow varies with each cardiac cycle. However, illumination variations and motion artefacts can also cause variations in intensity and spectral composition. The skin area observed by the camera therefore shows colour variation due to motion-induced and illumination-induced intensity/specular changes as well as pulse-induced subtle colour changes. It is assumed that the spectral composition of the light is fixed. The variation of light intensity depends on the distance from the light source to the camera and to the skin, and on the specific geometry of the situation. Based on the dichromatic model [67], the light reflected from the skin can be expressed as follows:

$$ P_k(t) = I(t) \cdot \left( R_s(t) + R_d(t) \right) + R_n(t) $$

where $P_k(t)$ represents the RGB channels of the $k$th skin pixel; $I(t)$ denotes the illumination intensity level; $R_s(t)$ signifies the specular reflection and $R_d(t)$ represents the diffuse reflection. $I(t)$ is modulated by both specular and diffuse reflection. $R_n(t)$ signifies the quantisation noise of the camera sensor.
The specular reflection of light from the skin surface is like light reflected from a mirror. As specularly reflected light does not penetrate skin tissues, it does not carry any information about the physiological signal. Additionally, the spectral composition of specularly reflected light is identical to that of the light source [67]. Diffusely reflected light, on the other hand, penetrates the skin, where it is absorbed and scattered inside the skin tissues before being reflected. Useful information about the physiological signal can be extracted from the diffusely reflected light.
Even though the slight colour variations of the skin are invisible to human eyes, they can be detected by video cameras. From the frame sequences of the video captured by a camera, the PPG signal can be extracted by image and signal processing techniques.
Over the past several years, numerous iPPG techniques have been proposed under well-controlled conditions, such as stationary subjects and stable illumination, as listed in Table 1. Verkruysse et al. [68] used normal ambient light as the light source to measure HR and RR from the human face using a digital camera. In their method, the ROI was selected manually, and the raw PPG signal was calculated per frame by an intensity method: spatially averaging the pixel values of the red, green and blue (RGB) colour channels. The fast Fourier transform (FFT) algorithm was used to calculate the power spectral density of the signal to extract HR. They showed that under these circumstances, the green channel has the highest signal-to-noise ratio (SNR) and contains the strongest plethysmographic signal, because oxygenated haemoglobin absorbs green light more than blue and red. In addition to HR and RR monitoring, they showed that PPG imaging can be used to characterise regions of high and low pulsatility on facial port wine stains (PWS). They used amplitude and phase maps to show the difference between normal skin and PWS skin. In normal skin, HR pulse amplitudes (G channel) were typically 2 to 4 times higher than in adjacent PWS skin, e.g., 0.75 and 0.25 PV, respectively. However, this method is strongly affected by motion artefacts caused by the subject's movement, as it did not include any noise removal, and it relied on manual ROI selection.
To overcome the limitations of the previous technique, Poh et al. [69] proposed a novel method to measure HR, based on blind source separation (BSS), using a webcam and considering three subjects at a time. Moreover, they used the OpenCV face detection algorithm, which is automatic and based on the Viola and Jones (V-J) [52] method. The facial ROI was defined as a rectangular bounding box. However, the proposed approach only considered very small movements and did not consider illumination variations. Additionally, Poh et al. always took the second component produced by ICA as the PPG signal. Moreover, they only measured HR in that study, a drawback that was addressed in [70], where they presented an enhanced method to measure RR and heart rate variability (HRV) along with HR. Another ICA-based method was introduced by Purche et al. [71], who used two algorithms for extracting the pulse rate, a peak detection algorithm and a power-spectrum analysis algorithm, under both relaxed and active conditions, and found that power-spectrum analysis performed better than peak detection. However, this method is highly susceptible to noise artefacts caused by illumination variations, subject movement and facial expressions. Lewandowska et al. [72] proposed another BSS method using principal component analysis (PCA) and reported that PCA is faster than ICA in terms of computation time and can be a good alternative if only HR is extracted. However, in this method, only the stationary case and HR were considered.

Motion Artefacts
Several studies treat motion artefacts as a serious issue and have proposed various methods to suppress them based on blind source separation (BSS) or model-based methods, as listed in Table 2.
Poh et al. [69] proposed an automatic and robust method to measure HR using a webcam, considering three subjects at a time. They were the first to introduce a method for removing motion artefacts using BSS based on independent component analysis (ICA). In their method, the region of interest (ROI) is first detected automatically using a face tracker. They used the OpenCV face detection algorithm, which is based on the Viola and Jones (V-J) [52] method. The facial ROI is defined as a rectangular bounding box. Then, the ROI is decomposed into the RGB channels and spatially averaged to obtain the raw RGB traces. After that, ICA is applied to the normalised RGB traces to recover three independent source signals.
To show how ICA [73] works as BSS, let us consider source signals $\mathbf{s}(t) = [s_1(t), s_2(t), \ldots, s_K(t)]^T$ that are independently transmitted by $K$ sources. The signals observed by the sensors $i = 1, 2, \ldots, K$, $\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_K(t)]^T$, can be written as follows:

$$ \mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t) + \mathbf{n}(t) = \sum_{i=1}^{K} \mathbf{a}_i\, s_i(t) + \mathbf{n}(t) $$

where the mixing matrix $\mathbf{A}$ with column vectors $\mathbf{a}_i$ is unknown and $n_i(t)$, the $i$th element of $\mathbf{n}(t)$, is the additive noise. The observations, $\mathbf{x}(t)$, are mixtures or linear combinations of the source signals, $s_i(t)$. To estimate $\mathbf{A}$ and $s_i(t)$, it is assumed that all source signals are statistically independent and non-Gaussian. To reconstruct the source signals, $u_i(t)$, the following expression can be used:

$$ \mathbf{u}(t) = \mathbf{W}\,\mathbf{x}(t) $$

where $\mathbf{W}$ is the demixing matrix, which is the inverse of the matrix $\mathbf{A}$.
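The mixing and demixing relations can be made concrete with a small numerical sketch. It is not an ICA implementation: the mixing matrix is chosen by hand (a hypothetical 2 × 2 example with K = 2 and no noise) and the demixing matrix is simply its inverse, whereas a real BSS algorithm such as JADE has to estimate the demixing matrix from the observations alone.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def inverse_2x2(matrix):
    """Invert a 2 x 2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


fps = 30
# Two independent synthetic sources: a 1.2 Hz "pulse" and a 0.3 Hz disturbance.
sources = [[math.sin(2 * math.pi * 1.2 * n / fps),
            math.sin(2 * math.pi * 0.3 * n / fps)] for n in range(90)]

A = [[1.0, 0.5],   # hypothetical mixing matrix, assumed known here
     [0.3, 1.0]]   # purely to illustrate the algebra
observed = [mat_vec(A, s) for s in sources]     # x(t) = A s(t), noise omitted

W = inverse_2x2(A)                              # demixing matrix W = A^-1
recovered = [mat_vec(W, x) for x in observed]   # u(t) = W x(t)
```

Here `recovered` matches `sources` up to rounding error. In practice ICA recovers the sources only up to permutation and scaling, which is why selecting the component that carries the pulse is a separate design decision.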
The joint approximate diagonalization of eigenmatrices (JADE) algorithm is then used to perform ICA. From the three recovered independent source signals, the second component is always chosen as the desired source signal. Finally, the FFT is applied to the selected source signal to obtain the power spectrum. The pulse frequency is designated as the frequency that corresponds to the highest power in the spectrum within an operational frequency band. To conduct their experiment, Poh et al. considered 12 participants both remaining still and engaging in natural movement. Using Bland-Altman and correlation analysis, they compared the HR extracted from videos recorded by a basic webcam to that of an FDA-approved finger blood volume pulse (BVP) sensor and achieved high accuracy and correlation even in the presence of movement artefacts. They also measured the HR of multiple persons. The study considered only small movements and did not consider illumination variations. Additionally, the second component produced by ICA was assumed to be the PPG signal in every case.
In [74], a motion-compensated method was designed based on single-channel independent component analysis (SCICA) to extract both HR and RR using a CMOS video camera during exercise. The authors also implemented image stabilisation via 2D cross-correlation and used time-frequency representation (TFR) for spectral analysis. Still, the motion artefacts could not be totally removed using this method. Moreover, the physiological waveforms in the extracted components differed between situations. Feng et al. [75] introduced an improved motion-tolerant technique to measure both average and instantaneous HR via webcam, considering a substantial amount of movement by the body as well as the head. After detecting a face by the V-J method, they used the speeded up robust feature (SURF) detection algorithm to find trackable interest points in the facial region and exploited the Kanade-Lucas-Tomasi (KLT) algorithm for tracking the faces. To extract the signal, they used an adaptive bandpass filter and an automatic sorting algorithm that sorts the ICA output components by taking a sine function as a reference signal. Nevertheless, this method is not suitable if the subject walks or engages in other difficult movements, and the range tested was also very short, with limited ROI options. Using a digital camera to remotely measure heart rate, Qi et al. [76] proposed a new technique to improve the photoplethysmography signal by combining facial sub-region landmark localisation and joint blind source separation (JBSS) methods to extract the physiological signal.
However, blind source separation-based approaches may have limitations when exposed to other incidental periodic signals, including illuminance or motion variations. To overcome this limitation, some researchers have considered techniques other than BSS to suppress motion artefacts. For instance, a continuous wavelet transform-based method was presented in [77] to extract both instantaneous heart and respiratory rates using a webcam with normal head movements. However, the motion considered was limited to head movements, and illumination variation and other movements were not considered. De Haan et al. [78] proposed a robust technique based on chrominance (CHROM) to extract HR from CCD camera video during exercise. In chrominance-based methods, the colour difference signals are combined linearly considering a standardised skin tone. However, in this case they considered two fitness devices, a stationary bike and a stepping device, with only one subject. Their method was also affected by skin colour, especially dark skin. This method was further improved in [79] by using the blood volume pulse vector and five different exercise devices. Here, the blood volume pulse (BVP) was considered as a signature of the various reflection spectra of skin for explicitly distinguishing the physiological signal from noise caused by motion.

Another study by Feng et al. [80] presented a motion-robust system that uses adaptive colour variation between the red and green channels as well as an adaptive bandpass filter (ABF) to measure HR from moving subjects with a webcam. Using a CCD camera, Wang et al. [81] introduced a new robust technique exploiting spatial subspace rotation (2SR), where a spatial subspace of skin pixels was estimated and its temporal rotation was measured in the image domain to measure HR. Nevertheless, as 2SR is a completely data-driven algorithm, noise or a poorly selected skin mask might cause inaccurate results that go undetected. Another algorithm to measure HR, based on a plane orthogonal to the skin (POS), was presented in [67], where normalised RGB channels were combined into two new channels which were then merged by weighting into the desired signal. Later, to allow the independent reduction of various motion frequencies by exploiting sub-band (SB) decomposition, Wang et al. [82] enhanced the POS method to measure continuous HR in different fitness applications. However, this technique cannot suppress motion artefacts when the motion and the pulse have the same frequency. Moreover, this method is susceptible to degradation with skin tone and illumination variation.

Wu et al. introduced a time-frequency analysis method using the continuous wavelet transform (CWT) [83] and a motion-resistant spectral peak tracking (MRSPT) method [84] to measure HR via webcam, considering seven motion circumstances in which subjects were driving, running and engaged in fitness training. Nevertheless, these methods considered very short ranges of around 1.5 m. For monitoring precise HR using a video camera, Xie et al. [85] presented another method using singular spectrum analysis (SSA), in which SSA based on singular value decomposition (SVD) was applied to the measured motion signal to rectify motion artefacts during treadmill exercise. However, they only considered treadmill exercise and a small number of subjects. Moreover, only the first three leading decomposed components of the motion signal were removed from the BVP, which may affect the performance of the proposed method. McDuff et al. [86] designed a computationally efficient method based on a linear transformation, exploiting parameters obtained from tissue-like models of the skin, to extract HR and HRV accurately in the presence of various head motions using a colour camera. However, the apparatus needed to be calibrated via a known colour grid and only systematic head motions were considered. Fallet et al. [87] introduced a signal quality index (SQI) and verified its capability as a tool to enhance the consistency of heart rate measuring applications, considering videos of a moving object. However, this SQI needs to be tested on videos with participants carrying out smoother movements, in time-varying illumination environments.
Motion artefacts are caused not only by camera or subject movement but also by cardiac-related motion, i.e., ballistocardiographic (BCG) artefacts. Using a CCD camera, Moco et al. [88] proposed motion-robust PPG imaging through colour channel mapping based on the chrominance (CHROM) and blood volume pulse (PBV) signature methods to overcome BCG artefacts. They extracted a reference PPG signal from the palm and obtained PPG images as the normalised inner product between this reference and the streams from the skin sensor array. In the proposed method, the videos are first pre-processed. After frame pre-processing, remote-PPG signals are acquired in each sensor element, mapped according to the CHROM or PBV algorithms and correlated with the reference PPG signal from the palm. PPG amplitude and phase images are then obtained using the channel mapping algorithms. The results showed that the proposed method reduced BCG artefacts to less than 10% of the reference PPG signal strength at the palm.
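To make the POS idea discussed above more concrete, the following is a minimal sketch of how normalized RGB channels can be combined into two projected channels and merged by weighting. It processes a single window only, whereas the published method works on overlapping sliding windows; the function names `pos_pulse` and `dominant_bpm`, the 0.7-4 Hz search band and the input layout are assumptions made for illustration, not details taken from [67].

```python
import numpy as np


def pos_pulse(rgb):
    """Single-window POS-style pulse extraction (illustrative sketch).

    rgb: array of shape (n_frames, 3) holding the spatially averaged
    R, G and B values of the skin region for each frame.
    """
    rgb = np.asarray(rgb, dtype=float)
    # Temporal normalisation: divide each channel by its mean over the window.
    normalised = rgb / rgb.mean(axis=0)
    r, g, b = normalised.T
    # Two new channels obtained by projecting onto the plane orthogonal to skin.
    s1 = g - b
    s2 = -2.0 * r + g + b
    # Merge the two channels by weighting with the ratio of their deviations.
    alpha = s1.std() / (s2.std() + 1e-12)
    pulse = s1 + alpha * s2
    return pulse - pulse.mean()


def dominant_bpm(signal, fs, lo_hz=0.7, hi_hz=4.0):
    """Return the strongest spectral component inside [lo_hz, hi_hz] in beats/min."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return 60.0 * freqs[band][np.argmax(power[band])]
```

Because a brightness change scales all three normalized channels equally, it cancels in both projected channels, which is why this family of methods tolerates intensity changes better than a single colour channel does.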

Illumination Variations
While most studies are concerned with motion artefacts, other studies have considered how to mitigate illumination variations, as listed in Table 3. For example, Chen et al. [89] proposed a robust camera-based method to measure the pulse signal, based on reflectance decomposition of the green channel and ensemble empirical mode decomposition (EEMD) to suppress the noise caused by illumination variation. First, videos covering the brow area were captured using a digital camera. Then, the green channel was selected for reflectance decomposition, as oxygenated blood absorbs green light more than red and blue. Reflectance decomposition was done using the alternating direction method of multipliers (ADMM). To remove noise caused by illumination variations, EEMD was used to decompose the original time signal from the face reflectance into a set of intrinsic mode functions (IMFs). They selected IMF4 as it was closest to the normal heart rate frequency. Finally, a peak detection algorithm was applied to count the peaks and HR was calculated. The experimental results showed that their method outperformed Poh's method [69], providing better measurement accuracy with a smaller variance. This method was further improved in [90], where HR was evaluated by means of a multiple linear regression (MLR) model followed by a Poisson distribution to reduce the effects of ambient light changes. Nevertheless, EEMD has the limitation of inadvertently treating periodic illuminant variations as physiological signals, especially if their frequency lies within the normal cardiac frequency range of 0.75 to 4 Hz. Moreover, neither method is suitable for real-time application. To measure HR, Lee et al. [91] used a different approach based on multi-order curve fitting (MOCF) to remove noise artefacts, considering subjects watching television in a dark room. In the proposed approach, they subtracted the estimated brightness signal from the raw PPG signal to cancel out the noise caused by illumination variations. Another study by Tarassenko et al. [92] presented a technique based on autoregressive (AR) modelling and pole cancellation to cancel out the aliased frequency components caused by artificial light flicker, which enhanced the robustness of measurement under strong fluorescent lights. Nevertheless, periodic illumination variations may affect the performance, as AR modelling is a spectral analysis technique. Moreover, there is a calibration issue when calculating oxygen saturation. To extract HR using a webcam, Cheng et al. [93] designed a method robust to illumination variation by combining joint blind source separation (JBSS) and EEMD and by considering background images as well. However, this method is not free from motion artefacts, as they only considered stationary subjects. Moreover, the illumination variation considered was mostly controlled, which restricts the real-time applicability of the proposed method. Another method that was robust to illumination variations was proposed by Xu et al. [94], using partial least squares (PLS) and multivariate empirical mode decomposition (MEMD) to efficiently extract HR using a webcam under varying illumination conditions. However, they only considered artificial illumination created by an LED lamp.
Most of the above-mentioned works are concerned with eliminating the effect of illumination variations while keeping the participants still, which means that they did not consider motion artefacts. Nevertheless, both illumination variations and motion artefacts are important in realistic applications. These approaches, which were initially designed to remove illumination variations, should therefore be further improved by also considering motion artefacts.
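The idea of estimating a slowly varying brightness signal and subtracting it from the raw PPG signal can be sketched with an ordinary polynomial fit. This is only a stand-in under stated assumptions: the details of the multi-order curve fitting in [91] are not given here, so a single polynomial of assumed order 3 is fitted, and the function name `remove_brightness_trend` is hypothetical.

```python
import numpy as np


def remove_brightness_trend(raw_ppg, fs, order=3):
    """Subtract a polynomial brightness estimate from a raw PPG trace.

    Assumption: the illumination change is slow compared with the pulse,
    so a low-order polynomial in time approximates it.
    Returns (cleaned_signal, estimated_brightness).
    """
    raw_ppg = np.asarray(raw_ppg, dtype=float)
    t = np.arange(len(raw_ppg)) / fs
    # Rescale time to [-1, 1] so the polynomial fit stays well conditioned.
    t_scaled = 2.0 * (t - t[0]) / (t[-1] - t[0]) - 1.0
    coeffs = np.polyfit(t_scaled, raw_ppg, order)
    brightness = np.polyval(coeffs, t_scaled)
    return raw_ppg - brightness, brightness
```

A fit of this kind cannot remove periodic illumination changes that fall inside the cardiac band, which is the same limitation noted above for spectral approaches.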

Both Illumination Variations and Motion Artefacts
Most studies are concerned with either motion artefacts or illumination variations; only a few have treated both as issues. Table 4 summarizes the studies concerned with both illumination and motion. Li et al. [95] introduced a normalized least mean square (NLMS) adaptive filtering scheme to suppress the effect of both the subject's motion and illumination variations in realistic situations, for example, watching movies or playing games, using the iSight camera of an iPad to measure cardiac pulse. Nevertheless, this method was seriously affected by large head movements. Using a monochrome camera, Kumar et al. [96] designed a robust method to extract HR and HRV based on a weighted average, considering different skin tones, illumination variation, and several motion scenarios. Al-Naji et al. [97] presented a noise elimination technique using both complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and canonical correlation analysis (CCA) to eliminate noise artefacts caused by movement of subjects and camera, variations of illumination and variation of skin tone. As shown in Figure 7, data was first collected using a UAV from a 3 m distance at various times of day with various illumination levels. Then, an improved video magnification technique [98] was used to magnify skin colour variation. After that, an efficient face detection method [99] was used to detect the facial ROI, as it is more effective with an inclined or angled face. The raw iPPG signal was extracted using spatial averaging within the facial ROI of the green channel. To suppress the noise caused by illumination variations, CEEMDAN was applied, as it showed better performance than EMD and EEMD by decreasing noise in the intrinsic mode functions (IMFs) and giving them more physical meaning. Using CEEMDAN, the iPPG signal was decomposed into eight IMFs, and the 5th, 6th and 7th IMFs were selected as their frequency bands fall within 0.2-4 Hz, corresponding to 12 to 240 beats/min. Then, to reduce motion artefacts, CCA was applied to the chosen IMFs, as CCA generates components derived from uncorrelated signals rather than the independent components used in ICA and gives better performance than ICA.

To explain how CCA works as BSS [100], let j and k be two multidimensional random signals with N mixtures. The linear combinations of these signals are known as the canonical variates and, in the standard CCA formulation, can be written as follows:

$$\tilde{j} = W_j^{T} j, \qquad \tilde{k} = W_k^{T} k \tag{8}$$

where $W_j$ and $W_k$ are the weighting vectors of j and k, respectively, which maximise the correlation $\rho$ between the canonical variates by resolving the following maximisation problem:

$$\max_{W_j, W_k} \rho = \frac{E[\tilde{j}\,\tilde{k}]}{\sqrt{E[\tilde{j}^{2}]\,E[\tilde{k}^{2}]}} = \frac{W_j^{T} C_{jk} W_k}{\sqrt{\left(W_j^{T} C_{jj} W_j\right)\left(W_k^{T} C_{kk} W_k\right)}} \tag{9}$$

where the non-singular within-set covariance matrices of j and k are $C_{jj}$ and $C_{kk}$, respectively; $C_{jk}$ is the between-sets covariance matrix; and E represents the expected value operator of the corresponding variables. The maximisation problem with respect to $W_j$ and $W_k$ can be resolved as follows:

$$C_{jj}^{-1} C_{jk} C_{kk}^{-1} C_{jk}^{T}\, \hat{W}_j = \rho^{2}\, \hat{W}_j, \qquad C_{kk}^{-1} C_{jk}^{T} C_{jj}^{-1} C_{jk}\, \hat{W}_k = \rho^{2}\, \hat{W}_k \tag{10}$$

After solving Equation (10), a complete explanation of the canonical correlations can be written as follows:

$$\rho_1 \geq \rho_2 \geq \dots \geq \rho_K, \qquad \rho_i = \sqrt{\lambda_i} \tag{11}$$

where $\lambda_i$ is the i-th largest eigenvalue in Equation (10) and $\hat{W}_{j,i}$ is the corresponding eigenvector. The K approximations of the source signals, $z_i(t)$, i = 1, 2, ..., K, can be obtained by:

$$z_i(t) = \hat{W}_{j,i}^{T}\, j(t) \tag{12}$$

It is noted from Equation (12) that the CCA technique yields the same outcome each time it is employed with a given dataset, which is not the case for the ICA technique.
Furthermore, spectral analysis and filtering were done using the FFT and two Butterworth bandpass filters, respectively. Finally, HR and RR were extracted using a peak detection algorithm. To obtain experimental results, the authors considered 15 subjects with different skin tones in four scenarios: without movement, with different facial expressions, while talking, and under different illumination levels. Figure 8 shows that the proposed method (with and without magnification) achieved better performance than ICA and PCA in all four scenarios. However, CCA (1.22 s) needs more computational time than ICA (0.86 s) and PCA (0.79 s). Moreover, the proposed method did not consider higher levels of movement such as walking or exercising. In addition, this method is constrained to a limited distance and single-subject detection.
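The use of CCA as a blind source separation step can be illustrated with a short sketch that solves the eigenvalue form of the CCA problem directly. One assumption is made that the text does not state: the second signal set k is taken to be a one-sample-delayed copy of the first set j, which is a common choice when CCA is used for BSS. The function name `cca_bss` is hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def cca_bss(x):
    """Separate sources from mixtures with CCA (illustrative sketch).

    x: array of shape (n_channels, n_samples), e.g. the selected IMFs.
    Assumption: k is the one-sample-delayed copy of j, so the recovered
    components are ordered by decreasing autocorrelation.
    Returns (sources, canonical_correlations).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)
    j = x[:, 1:]   # signal set j
    k = x[:, :-1]  # delayed signal set k
    n = j.shape[1]
    c_jj = j @ j.T / n  # within-set covariance of j
    c_kk = k @ k.T / n  # within-set covariance of k
    c_jk = j @ k.T / n  # between-sets covariance
    # Eigenvalue problem: C_jj^-1 C_jk C_kk^-1 C_jk^T W_j = rho^2 W_j
    m = np.linalg.solve(c_jj, c_jk) @ np.linalg.solve(c_kk, c_jk.T)
    eigvals, eigvecs = np.linalg.eig(m)
    order = np.argsort(-eigvals.real)
    w_j = eigvecs[:, order].real
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, 1.0))
    # Source estimates: project the mixtures onto each weight vector.
    sources = w_j.T @ x
    return sources, rho
```

Because the solution comes from a deterministic eigendecomposition, repeated runs on the same data return the same components, whereas ICA implementations that start from a random initialisation may not.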

Alternate Sensors
Rather than using digital cameras or webcams, some researchers have used other sensors to capture PPG data, as shown in Table 5.For example, Kwon et al. [101] extracted HR by exploiting the built-in camera of a smartphone and introduced FaceBEAT, which is an application in an iPhone for measuring HR remotely.Al-Naji et al. [102] presented a robust method to monitor cardiorespiratory signals remotely from the video taken by a hovering unmanned aerial vehicle (UAV), as shown in Figure 9.They used an improved video magnification technique, an intensity-based method and advanced signal processing methods such as signal decomposition based on complete EEMD and blind source separation based on ICA to eliminate noise artefacts.Nevertheless, only slow and small movements were considered, and they did not address low light situations either.To extract vital signs, McDuff et al. [103] used a DSLR camera with five bands (red, green, blue, cyan and orange), considering only stationary subjects both at rest and under cognitive stress and reported that CGO is better than RGB.However, this method is susceptible to noise artefacts and is not suitable for real time application.In [104], a comparison between a CMOS camera and a webcam was done to extract HR under cycling exercise, and reported HR values were independent of the measurement method.However, the proposed method is applicable to a very limited range of 0.2 to 0.35 m.Blanik et al. [36] combined both a CCD camera and a thermal camera to monitor a broad range of physiological signs.Using a Kinect device, Bernacchia et al. [105] measured HR and RR by means of spatial averaging and ICA.Smilkstein et al. [106] and Gambi et al. [107] used Microsoft Kinect to extract HR, exploiting the EVM technique based on RGB signals.

Different Subjects
Most of the work considered adult, healthy subjects and a limited number of babies to measure vital signs.However, there are some researchers who considered infants in neonatal intensive care units (NICUs), as described in Table 5. Scalise et al. [108] proposed a system to measure heart rates using a web camera in a NICU using an intensity-based method and ICA.However, this method is susceptible to noise artefacts such as motion and illumination variations and range is very short, at only 0.20 m.Another work [10] by Arts monitored the HR of newborn infants in NICUs using a digital camera with up to 1 m range, exploiting ambient light.Still, this method is not free from motion artefacts, illumination variation and the skin tone problems.To suppress illumination variations and moderate motion artefacts, Cobos-Torres et al. [109] designed a computationally efficient method based on numerical analysis techniques and filtering using a digital camera to measure the HR and RR of preterm infants in NICUs.Nevertheless, the proposed method is affected by strong motion, poor lighting and shadow.Gibson et al. [110] introduced a remote monitoring system to monitor the HR and RR of infants in NICUs using a digital camera based on video magnification and compared their results with ECG data.Their system was also able to detect a real apnoea event in clinical settings.

Different Vital Signs
Almost all studies aimed to extract heart and respiratory rates.Few researchers consider other vital signs such as heart rate variability and oxygen saturation, as shown in Table 5.To extract blood oxygen saturation (SpO 2 ) along with HR and RR, Tarassenko et al. [92] presented a new technique based on autoregressive (AR) modelling and pole cancellation in order to cancel out the aliased frequency components caused by artificial light flicker, which enhanced the robustness of the method under strong fluorescent lights.By means of a webcam, another robust method to calculate SpO 2 and average HR was introduced by Bal [111], using a noise removal algorithm based on dual tree complex wavelet transform (DTCWT) to rectify the artefacts caused by movement and artificial lighting.However, the range was very short, at only 50 cm.

Multiple ROIs
Almost all of the prior works consider limited ROI, particularly the face, for vital sign extraction.Some studies have used the face and cheeks [10,112], face and forehead [72], cheeks and forehead [92], nose, forehead and mouth [71], as ROIs to measure physiological signals.Another study by Datcu et al. [113] considered 10 different parts of the face as ROIs.To extract vital signs, Yu et al. [74] and Feng et al. [114] used both the face and the palm.Another study by Bernacchia et al. [105] considered the neck, thorax and abdominal area to calculate HR and RR.Zhao et al. [115] extracted physiological signals from face, arm and hand using a webcam, considering two subjects simultaneously under stationary conditions.However, this method is susceptible to motion artefacts and was only tested over short ranges.Al-Naji et al. [116] measured cardiopulmonary signals from various regions of the human body such as the face, palm, wrist, arm, neck, leg, forehead, head and chest under stationary scenarios using a digital camera.They proposed a noise elimination technique based on EEMD and ICA.They used various methods such as intensity, frame subtraction and feature tracking to extract cardiopulmonary signals by considering skin colour variation, chest motion and head motion.Nevertheless, the proposed methods suffer from various issues such as limited range, a single subject at a time and noise artefacts.

Long Distance and Multiple Persons Simultaneously
Most of the previous methods discussed above have some limitations. Firstly, most of the works considered very short ranges; the longest distance considered was only 3 m. Secondly, other than Poh et al. (three subjects) and Zhao et al. (two subjects), all other methods measured vital signs for one subject at a time. In [117], a robust contactless method was proposed to calculate HR and RR using both iPPG and head motion from videos taken by both a digital camera and a hovering UAV. The method used a video magnification technique and considered long ranges of up to 60 m and groups of up to six people simultaneously, under both stationary and non-stationary circumstances. To eliminate noise artefacts caused by movement of subjects and camera, variations of illumination and variation of skin tone, they used a noise elimination technique combining CEEMDAN and CCA. The FFT was used for spectral analysis and two Butterworth bandpass filters were used for filtering. Finally, HR and RR were extracted by counting the number of peaks using the MATLAB built-in function 'findpeaks'. Moreover, they introduced a graphical user interface (GUI) that lets a user load video data, select the magnification type, and execute the proposed system and configurations.
The experimental setup and data acquisition of the proposed system are presented in Figure 10, where they considered three groups of people consisting of 15, 20 and 10 subjects with different skin tones, respectively, under three assumptions: noise artefacts, multiple detection and long distance. Figures 11 and 12 demonstrate that the proposed method achieved better performance than ICA and PCA for both stationary and non-stationary scenarios with multiple subject detection and long distance, respectively. From these figures, it can also be noted that colour-based methods gave better performance than motion-based methods. However, the system used a limited ROI, observing only the face, and will be affected by an unclear ROI.
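The final stage described above (two Butterworth bandpass filters, FFT-based spectral analysis and peak counting) can be sketched in Python with SciPy, whose `find_peaks` plays a role similar to MATLAB's 'findpeaks'. The passbands (0.7-4 Hz for the cardiac signal, 0.1-0.5 Hz for respiration), the filter order, the prominence threshold and all function names are assumptions chosen for illustration; the values used in [117] are not given here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, find_peaks


def bandpass(signal, fs, low_hz, high_hz, order=3):
    """Zero-phase Butterworth bandpass filter."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


def rate_from_fft(signal, fs, low_hz, high_hz):
    """Dominant frequency inside the band, in cycles per minute."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= low_hz) & (freqs <= high_hz)
    return 60.0 * freqs[band][np.argmax(power[band])]


def rate_from_peaks(signal, fs, max_hz):
    """Cycles per minute obtained by counting peaks in the filtered signal."""
    # Peaks closer together than one period of the fastest rhythm are merged,
    # and low-prominence ripples are ignored.
    min_distance = max(1, int(fs / max_hz))
    peaks, _ = find_peaks(signal, distance=min_distance,
                          prominence=0.3 * np.std(signal))
    return len(peaks) / (len(signal) / fs / 60.0)


def estimate_hr_rr(ippg, fs):
    """Estimate HR and RR from one iPPG trace using assumed passbands."""
    cardiac = bandpass(ippg, fs, 0.7, 4.0)
    resp = bandpass(ippg, fs, 0.1, 0.5)
    return {
        "hr_fft": rate_from_fft(cardiac, fs, 0.7, 4.0),
        "hr_peaks": rate_from_peaks(cardiac, fs, 4.0),
        "rr_fft": rate_from_fft(resp, fs, 0.1, 0.5),
        "rr_peaks": rate_from_peaks(resp, fs, 0.5),
    }
```

Peak counting depends on the recording length and on edge effects of the filter, so comparing it with the FFT estimate is a simple consistency check.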

Others
Based on machine learning, Hsu et al. [118] presented a novel method to extract HR using a video camera where the PPG signal was recovered from either ICA or chrominance-based methods and may be enhanced by utilising the mid-level PPG based features exploiting support vector regression (SVR).However, they only considered the stationary case.Using a deep convolutional attention network (CAN), Chen et al. [119] introduced the first end-to-end system named DeepPhys for video-based measurement of HR and BR.They proposed a new motion representation based on a skin reflection model and a new attention mechanism using appearance information to guide motion estimation, both of which enabled robust measurement under varying lighting and significant head motions.Normalised frame difference was used as input motion representation.The network learnt spatial masks, that were shared between the models, and features important for recovering the BVP and respiration signals.The motion model and the appearance model were learnt jointly to find the best motion estimator and the best ROI detector simultaneously.They evaluated their method on three datasets of RGB videos and a dataset of IR videos.The proposed approach significantly outperformed prior state-of-the-art methods (CHROM, POS etc.) on both RGB and infrared video datasets and allowed spatio-temporal visualisation of physiological information in video.Moreover, the participant dependent vs. independent performance as well as the transfer learning results showed that the supervised method generalised to other people, skin types and illumination conditions.Yu et al. 
[120] proposed the first end-to-end spatiotemporal network (PhysNet) to recover iPPG signals precisely from raw facial videos, taking into account the temporal context, which was not considered in previous works. They conducted comprehensive experiments on the OBF and MAHNOB-HCI datasets. First, the face area was detected using the V-J face detector. Then, to extract the iPPG signal, a spatio-temporal network with 3D convolutional neural networks (3DCNN) and a recurrent neural network (RNN) was used. After that, filtering, normalisation, and peak detection were performed to obtain the inter-beat intervals. Finally, the average HR and HRV were calculated. Experimental results showed that the proposed PhysNet reconstructed iPPG signals with accurate time locations for each individual pulse peak and achieved better performance on both HR and HRV levels compared to the state-of-the-art methods (CHROM, POS etc.). This method has potential applications in remote atrial fibrillation detection and emotion recognition.
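The last step of such a pipeline, going from detected pulse peaks to inter-beat intervals and then to average HR and HRV, can be sketched as follows. SDNN and RMSSD are used here as two common time-domain HRV measures; which HRV measures were reported in [120] is not stated above, so this choice and the function name `hr_hrv_from_peaks` are assumptions.

```python
import numpy as np


def hr_hrv_from_peaks(peak_times_s):
    """Compute mean HR and two time-domain HRV measures from peak times.

    peak_times_s: times of the detected pulse peaks, in seconds.
    """
    peak_times_s = np.asarray(peak_times_s, dtype=float)
    # Inter-beat intervals in milliseconds.
    ibi_ms = np.diff(peak_times_s) * 1000.0
    if ibi_ms.size < 2:
        raise ValueError("at least three peaks are needed")
    return {
        "mean_hr_bpm": 60000.0 / ibi_ms.mean(),
        # SDNN: standard deviation of the inter-beat intervals.
        "sdnn_ms": ibi_ms.std(ddof=1),
        # RMSSD: root mean square of successive interval differences.
        "rmssd_ms": float(np.sqrt(np.mean(np.diff(ibi_ms) ** 2))),
    }
```

Both measures are sensitive to the timing accuracy of each individual peak, which is why accurate peak localisation matters more for HRV than for average HR.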
To integrate colour-based and motion-based approaches, Weidi et al. [121] introduced two fusion techniques, the mean method and the ratio of variations method, to extract HR using an industrial camera.They reported that a colour-based method is more reliable than a motion-based method.However, the accuracy of the fusion-based method was not satisfactory, and this method was confined to rehabilitation only.

Application
As iPPG overcomes various limitations of contact sensors especially related to sensitive skin damage or infection, such contactless vital sign monitoring methods have promise to become realistic and attractive for numerous potential applications in both clinical and nonclinical scenarios.

Clinical Application
Because of its contactless sensing mode, iPPG can offer a comfortable method to monitor elderly people, infants, and patients with chronic pain or burnt skin, as well as patients undergoing dialysis and patients during and after surgery.

Neonatal Monitoring
Neonates and infants have very delicate and sensitive skin, so non-contact methods would be the preferred approach to monitoring them. Several studies have used iPPG methods to monitor neonates. For example, Klaessens et al. [122] introduced a baby-friendly non-contact approach to monitor several vital signs such as HR, RR, skin temperature, and SpO 2 of infants and neonates in NICUs with gestational ages of 24 to 39 weeks. To measure HR and RR, RGB colour magnification and IR-thermography were used, respectively. Another study by Villarroel et al. [123] applied the iPPG technique to continuously monitor HR, RR, and SpO 2 of neonates and infants in an NICU. They also detected bradycardia while monitoring participants continuously for long periods. Other studies [10,108-110] have also demonstrated the promise of iPPG in neonatal monitoring.

Assessing Patients with Chronic Pain
Rubins et al. [124] used imaging photoplethysmography to assess patients with chronic pain, particularly neuropathic pain, using a monochrome camera.Another study by Zaproudina et al. [125] also employed the iPPG method to find new biomarkers of migraine patients using a monochrome camera.

Critical Patient Monitoring
Rasche et al. [126] used the iPPG technique in patients after heart surgery using a mobile camera under critical care conditions.In their proposed method, ROI was selected manually, and optical flow techniques were used to detect motion artefacts.Moreover, high pass filter and Fourier spectra were employed to calculate HR.Nevertheless, their method is affected by patient movements and illumination variations.

Arrhythmia Detection
An abnormal heart rhythm is called an arrhythmia. Amelard et al. [127] applied the iPPG technique to detect arrhythmia by using a monochrome camera based on a signal fusion technique. The signal fusion technique was used to extract the blood pulse signal using prior information. They formulated the problem as a Bayesian classification and built a novel probabilistic pulsatility model which combined spectral and spatial priors derived from the physiology of the blood pulse signal.

Anesthesia Monitoring
Rubins et al. [128] introduced the concept that a PPG imaging system might be employed to continuously monitor regional anesthesia by monitoring skin microcirculation. Using both an RGB camera and a near-infrared camera, Trumpp et al. [129] applied camera-based PPG to monitor patients with cardiovascular disease during surgery in an intraoperative environment. This can help anesthetists to take measures in response to any cardiovascular events and to administer appropriate medication.

It was reported in [131] that PPG imaging can be used to identify the correct depth of burn excision, as there was significantly less blood circulation in burnt tissue. Consequently, it can help surgeons by giving an idea of where and how much to resect. Thus, iPPG helps to improve burn care as well.

Non-Clinical Application
In non-clinical sectors such as home health care, fitness monitoring, sleep monitoring, polygraph testing, living skin detection, stress monitoring, driver monitoring, security, war zones, natural calamities, and animal research, iPPG may play an important future role by monitoring vital signs.

Home Health Care
There are limited places and resources for patients in hospitals.Moreover, staying in hospital for a long time is uncomfortable and expensive.Therefore, there is a need for patients, particularly elderly people and infants, to be monitored at home.As iPPG is a noncontact and low cost method with no consumable supplies, it can be used to monitor people's vital signs in home environments [3].Bernacchia et al. [105] also proposed a novel method using a Kinect device that could be applied to the monitoring of HR and RR of subjects at home, without the presence of experts or clinicians.

Fitness Monitoring
Exercise is good for health.However, over-exercise can have severe adverse health effects including death due to heart attack.Therefore, it can be important to monitor HR and RR to assess the health status of the exerciser and customise their training program according to their changing vital signs.Monitoring the physiological state using iPPG while doing exercise can be a convenient solution to overcome the adverse effects of over-exercising.In [74,78,79,82,85,93], iPPG was applied to monitor HR and RR during exercise, considering various fitness devices such as a stepping device, a treadmill, a bike, a hand bike, and a synchro-device.

Sleep Monitoring
Continuous monitoring of vital signs, even at night, can be carried out by combining visible RGB cameras and infrared cameras.During the night, for sleep monitoring, the iPPG exploiting infrared cameras will be particularly suitable [3].HR and RR of sleep apnea patients can also be monitored using the Microsoft Kinect sensor [106] and colour magnification.Using three monochrome cameras, Vogels et al. [132] proposed a fully automated method to remotely and continuously monitor HR and SpO 2 during sleep.In the proposed method, videos are first pre-processed using Gaussian smoothing using a 2D Gaussian kernel and rigid block segmentation.Then, the pulse signal, extracted using the PBV [79] method, is exploited as a feature to distinguish living and non-living tissue based on similarity mapping.After that, a hybrid method is introduced which combines the iPPG-based subject detection with a tracker to select an ROI.Finally, vital signs are extracted.They used five healthy subjects sleeping in different supine positions, to simulate realistic sleep scenarios.The results showed that the proposed method outperformed the state-of-the-art method (VPS, V-J etc.) for the estimation of oxygen saturation.

Polygraph
Telling a lie activates the autonomic nervous system (ANS).Therefore, there is a significant difference of mental stress which results in variations in physiological signs depending on whether someone is telling a lie or the truth.When someone is interrogated, the iPPG has been proposed as a polygraph for lie detection [2].

Living Skin Detection
Recently, researchers have been using the iPPG technique to detect living skin.For example, Wang et al. [133] proposed a novel unsupervised technique called "Voxel-Pulse-Spectral"(VPS) to detect living skin tissue using a CCD camera considering the pulse as a feature.However, the VPS method is time consuming because of the complexity of unsupervised learning for pulse extraction.A fast living skin detection method was introduced in another study [134] where the time variant iPPG signal is transformed into signal shape descriptors using a technique known as multiresolution iterative spectra.

Security
In security systems, the biometric information used for authentication is mostly face biometrics. However, such biometric information can be stolen or duplicated by attackers, which is called a biometric presentation attack (BPA). Lakshminarayana et al. [135] employed iPPG to identify authentic users with a deep convolutional neural network (CNN), evaluated on the CASIA and Replay-Attack datasets. Nowara et al. [136] and Seepers et al. [137] also reported that the iPPG technique can be exploited in biometric authentication.

Emotion/Stress Monitoring
Stress is one of the major issues for various diseases such as cardiovascular diseases, diabetes, and cerebrovascular diseases.To understand and control personal stress, it is necessary to monitor an individual's emotional state or stress level continuously.Physiological signals are more reliable than other factors such as facial expression, gesture or voice to measure emotional state.Several researchers have used iPPG to monitor stress.For instance, Maaoui et al. [138] detected and classified emotional states exploiting HR extracted from videos captured by webcam and using machine learning algorithms.Another study by Madan et al. [139] also used a webcam to detect emotional state by measuring changes in HR during sitting and standing conditions.By using a five band digital camera, McDuff et al. [140] measured cognitive stress by measuring HRV, where subjects were requested to do a mental arithmetic task.Burzo et al. [141], Monkaresi et al. [142] and Rouast et al. [143] have shown that iPPG can be effectively used to monitor emotional states of human beings.

Driver Monitoring
With the increase in the number of vehicles on the road, there is a potential increase in the number of accidents as well. The primary reasons for traffic accidents are physiological and psychological factors such as illness, mental illness, fatigue and the external ambient environment. Therefore, iPPG can be used to measure the physiological and psychological states of drivers. Kuo et al. [144] measured a driver's HR using video cameras, considering the in-vehicle environment. Another study by Zhang et al. [145] used a webcam to monitor HR based on ICA during various driving situations. Other studies [146,147] have also applied the iPPG method to monitor the physiological parameters of drivers.

War Zone or Natural Calamity
Al-Naji et al. [117] have demonstrated that a hovering UAV can be used to capture video of subjects to extract HR and RR, considering multiple persons together at significant distance. This method has opened a new area for the potential application of iPPG during war or natural calamities such as earthquakes.

Animal Research
In animal research, camera-based techniques have numerous potential applications such as monitoring vital signs and assessing motion activity and wound infection [148]. In [3], the HR and RR of mice, zebrafish and pigs were extracted using a CMOS camera, exploiting ICA based on the iPPG technique. Blanik et al. [149] used the iPPG technique to extract the HR and RR of anesthetized pigs with Acute Respiratory Distress Syndrome (ARDS) using a CCD camera. By means of iPPG, Unakafov et al. [150] accurately estimated the HR of rhesus monkeys from facial videos captured by a Microsoft Kinect and a monochrome near-infrared (NIR) camera.

Research Gaps and Challenges and Future Direction
During the past few years, many advances have been made using camera imaging-based methods and numerous studies have been published in this field.However, the existing studies have several limitations that need to be overcome in the future.

•
Most of the current studies focus on either motion artefacts or illumination variations.However, in real time scenarios both motion and illumination play an important role in degrading accuracy and usability.Only a few researchers have given solutions to both motion artefacts and illumination variations.Therefore, it can be a further topic of research for new researchers to develop a method which can deal with both motion and illumination artefacts.

• Researchers are still mainly interested in extracting HR and RR. Nevertheless, blood oxygen also plays an important role in assessing a person's health condition. Blood glucose [151–153] is another important physiological parameter, which helps to monitor and maintain the welfare of diabetes patients. Therefore, more research needs to be done on monitoring blood oxygen saturation and blood glucose.

• The number of ROIs and the way they are selected are important issues, as very few researchers have addressed these problems well enough for industrial or commercial use. Most of the studies have used manual processes, while some have used automatic methods to select from a very limited number of ROIs in limited scenarios. Advanced techniques are required to select multiple ROIs automatically.

• Researchers have mainly recruited healthy young participants for their experiments rather than patients, elderly people or premature babies. More research needs to be done on these different groups of subjects.

• Most of the existing works used privately owned databases. Only a few used publicly available databases such as MAHNOB-HCI (human computer interaction) or DEAP (Database for Emotion Analysis using Physiological Signals). The lack of publicly available datasets recorded under realistic conditions is another challenge to be dealt with in the future.

• To validate the proposed methods, researchers have mainly used a pulse oximeter as the ground truth. Very few researchers used an ECG as the ground truth. There are indications that no commercial instrumentation is truly accurate and that most are simply accepted as accurate [110].

• Future research could also include multi-camera fusion as well as new non-visible light sensors to tackle the visible light camera's shortcomings.

Conclusions
Computer vision-based methods for vital signs measurement have been attracting increasing attention over the past few years. This paper gives a review of recent works in camera imaging-based methods, especially colour-based methods. First, numerous works on motion-based and colour-based methods were discussed. Then, different aspects of colour-based methods, for example, motion artefacts, illumination variations, alternate sensors, different subjects, different vital signs, multiple ROIs, long distance and multiple persons, were reviewed. Additionally, potential applications of iPPG in both clinical and non-clinical sectors were described. Finally, we identified research gaps and challenges in the existing studies and gave some indications for future research. We believe that this paper will help new researchers to understand and explore the gaps and challenges found in recent studies.
shows the basic framework, which includes data acquisition, ROI detection, raw signal extraction, noise artefact removal and vital sign extraction.
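
As a rough illustration of the five stages named above (data acquisition, ROI detection, raw signal extraction, noise artefact removal and vital sign extraction), the following minimal sketch runs them on synthetic data using only the Python standard library. Everything in it is an assumption made for illustration: the frame rate, the 8x8 synthetic "frames", the fixed central patch standing in for a real face or skin detector, the moving-average detrending and the 1 bpm spectral search are not taken from any of the reviewed studies.

```python
import math
import random

FPS = 30.0          # assumed camera frame rate (frames per second)
DURATION_S = 20     # assumed recording length in seconds
TRUE_HR_BPM = 72.0  # heart rate baked into the synthetic video


def acquire_frames(seed=0):
    """Stage 1, data acquisition: yield tiny synthetic 8x8 green-channel frames."""
    rng = random.Random(seed)
    for i in range(int(FPS * DURATION_S)):
        t = i / FPS
        pulse = 0.5 * math.sin(2 * math.pi * (TRUE_HR_BPM / 60.0) * t)
        drift = 2.0 * t / DURATION_S  # slow illumination drift
        yield [[120.0 + pulse + drift + rng.gauss(0, 0.3) for _ in range(8)]
               for _ in range(8)]


def select_roi(frame):
    """Stage 2, ROI detection: a fixed central 4x4 patch (placeholder for a detector)."""
    return [row[2:6] for row in frame[2:6]]


def raw_signal(frames):
    """Stage 3, raw signal extraction: spatial mean of the ROI in every frame."""
    signal = []
    for frame in frames:
        values = [v for row in select_roi(frame) for v in row]
        signal.append(sum(values) / len(values))
    return signal


def remove_trend(signal, window=31):
    """Stage 4, noise removal: subtract a centred moving average to suppress drift."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out


def estimate_hr_bpm(signal, lo_bpm=42, hi_bpm=240):
    """Stage 5, vital sign extraction: strongest spectral peak in an assumed HR band."""
    best_bpm, best_power = None, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        w = 2 * math.pi * (bpm / 60.0) / FPS  # radians per sample
        re = sum(s * math.cos(w * i) for i, s in enumerate(signal))
        im = sum(s * math.sin(w * i) for i, s in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


hr = estimate_hr_bpm(remove_trend(raw_signal(acquire_frames())))
print(f"estimated HR: {hr} bpm (synthetic ground truth: {TRUE_HR_BPM} bpm)")
```

In a real system each placeholder would be replaced by the corresponding technique discussed in this review, for example a face tracker for ROI detection and ICA, CEEMDAN or CCA for noise artefact removal.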

Figure 2. Block diagram of contactless monitoring system.

Figure 4. System diagram of the remote respiratory monitoring system [62].

Figure 5. Block diagram of the contactless vital sign monitoring system using a Microsoft Kinect sensor [64].

Figure 7. System overview of the noise removal technique using a UAV [97].

Figure 8. RMSE performance comparison of complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and canonical correlation analysis (CCA) techniques for four scenarios [97]. (a) RMSE performance comparison for HR, (b) RMSE performance comparison for RR.

Figure 10. The experimental setup and data acquisition of the noise artefact removal, multiple person and long detection system [117].

Figure 11. RMSE performance comparison of CEEMDAN and CCA techniques under multiple detection [117]. (a) RMSE performance comparison for HR, (b) RMSE performance comparison for RR.

Figure 12. RMSE performance comparison of CEEMDAN and CCA techniques at long range [117]. (a) RMSE performance comparison for HR, (b) RMSE performance comparison for RR.

• All of the studies except Pho et al. (3 subjects), Zhao et al. (2 subjects) and Al-Naji et al. (6 subjects) considered only a single subject at a time when monitoring vital signs. Detecting multiple persons simultaneously is a major issue in current studies that needs to be overcome in the future.

• The distance between the camera and the subject is another challenge for current researchers. All the studies except Al-Naji et al. (60 m) considered very small distances while monitoring vital signs.

Table 2. iPPG-based methods addressing motion artefacts. Pearson correlation coefficient, RMSE = root mean square error, SNR = signal to noise ratio, MAE = mean absolute error, bpm = beats per minute.
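
For readers unfamiliar with the agreement metrics abbreviated above, the short sketch below computes RMSE, MAE and the Pearson correlation coefficient between a camera-based estimate and a contact reference. The function names and the four HR readings are hypothetical values chosen for illustration and are not results from any reviewed study. SNR is omitted because its definition varies between iPPG papers.

```python
import math


def rmse(estimated, reference):
    """Root mean square error between paired readings."""
    return math.sqrt(
        sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(reference)
    )


def mae(estimated, reference):
    """Mean absolute error between paired readings."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def pearson(estimated, reference):
    """Pearson correlation coefficient between paired readings."""
    n = len(reference)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_e * var_r)


# Hypothetical paired HR readings in bpm (illustrative only).
reference_bpm = [60.0, 72.0, 80.0, 95.0]  # e.g. from a contact sensor
estimated_bpm = [62.0, 71.0, 83.0, 95.0]  # e.g. from a camera-based method

print(f"RMSE = {rmse(estimated_bpm, reference_bpm):.2f} bpm")
print(f"MAE  = {mae(estimated_bpm, reference_bpm):.2f} bpm")
print(f"r    = {pearson(estimated_bpm, reference_bpm):.3f}")
```

RMSE penalises occasional large errors more heavily than MAE, while the correlation coefficient measures how well the estimate tracks changes in the reference and is insensitive to a constant offset, which is why studies often report more than one of these figures.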

Table 3. iPPG-based methods addressing illumination variations.

Table 4. iPPG-based methods addressing both illumination variations and motion artefacts.

Table 5. Summary of different iPPG-based methods.

4.1.6. Monitoring Dialysis Patients
Villarroel et al. [130] extracted HR, RR and SpO2 of patients going through haemodialysis treatment in the Renal Unit of the Churchill Hospital in Oxford, UK, based on iPPG using a high-quality 5 megapixel camera. Tarassenko et al. [92] also applied iPPG to monitor HR, RR and SpO2 of haemodialysis patients in the Oxford Kidney Unit.