Recognition of Human Face Regions under Adverse Conditions—Face Masks and Glasses—In Thermographic Sanitary Barriers through Learning Transfer from an Object Detector †

The COVID-19 pandemic has detrimentally affected people's lives and the economies of many countries, causing disruption in the health, education, transport, and other sectors. Several countries have implemented sanitary barriers at airports, bus and train stations, company gates, and other shared spaces to detect patients with viral symptoms in an effort to contain the spread of the disease. As fever is one of the most recurrent disease symptoms, the demand for devices that measure skin (body surface) temperature has increased. The thermal imaging camera, also known as a thermal imager, is one such device used to measure temperature. It employs a technology known as infrared thermography and is a noninvasive, fast, and objective tool. This study employed machine learning transfer using You Only Look Once (YOLO) to detect the hottest temperatures in the regions of interest (ROIs) of the human face in thermographic images, allowing the identification of a febrile state in humans. The algorithms detect areas of interest in the thermographic images, such as the eyes, forehead, and ears, before analyzing the temperatures in these regions. The developed software achieved excellent performance in detecting the established areas of interest, adequately indicating the maximum temperature within each region of interest, and correctly choosing the maximum temperature among them.


Introduction
Coronaviruses (CoVs) are viruses that cause respiratory infections in animals such as birds and mammals, including humans. There have been seven recorded CoVs that have caused serious harm to human health, with two of them responsible for the epidemics that emerged in Hong Kong in 2003 and Saudi Arabia in 2012 [1]. In December 2019, a new CoV called SARS-CoV-2 (the virus that causes the disease COVID-19) emerged in the city of Wuhan, China. In the first part of 2020, this virus spread to virtually every country in the world. The algorithm proposed in this work detects the highest temperatures in the frontal and lateral regions of the head. It can then analyze the temperature of the ROI and, subsequently, estimate body surface temperature more accurately and efficiently than manual measurement methods. Additionally, the algorithm can incorporate suitable diagnostic criteria for the different ROIs, with different febrility thresholds.
This article is an expanded version of a conference paper presented at the 14th IEEE/IAS International Conference on Industry Applications (Induscon) [16], whose theme was 'Innovation in the Time of COVID-19'. This version introduces more details on human infrared thermography, presents more tests with volunteers, and applies Optical Character Recognition (OCR) technology to identify the maximum and minimum temperatures in thermographs. These additions improved the work previously carried out on the automatic detection of febrile people at sanitary barriers, which is very relevant in this phase of the COVID-19 pandemic, in which new variants of the virus are emerging.

Materials and Methods
This section presents the main steps in developing an automatic system for measuring the human temperature at sanitary barriers by combining thermography and computer vision technologies.
The volunteers who participated in this research were informed about the objectives, the scope of their participation, the confidential treatment of their data, and the consolidated, statistically grouped method of disclosing the data. All participants provided written consent.
The inclusion criteria were being 18 years of age or older and signing the free-participation consent form, without any burden or bonus for the volunteer or the researchers and with the possibility of withdrawing from the study at any time.

Fever and Human Thermography
Fever occurs when there is an increase in the body's thermal threshold, usually maintained at around 37 °C, triggering metabolic responses of heat production and conservation, for example, shivering and peripheral vasoconstriction. These responses help to raise the body temperature to the new threshold. After the fever resolves or is treated, the threshold returns to baseline and heat-loss processes begin, e.g., peripheral vasodilation and sweating [17].
However, the surface temperature of the human body differs from the core temperature, which is the gold standard for diagnosing fever. The surface temperature presents different, typically lower, values than the core temperature across the regions of the face, the surface commonly inspected at sanitary barriers. The nonfebrile temperature of different face regions can vary from 32.3 °C up to 35.9 °C [24-26]. Therefore, properly identifying the region of interest (ROI) on the human face where the temperature is being measured, and applying an adequate threshold, leads to a more accurate diagnosis than measuring the maximum face temperature without considering which region is being examined.
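The region-aware criterion can be illustrated with a minimal Python sketch. The per-ROI threshold values below are hypothetical placeholders, not clinical values; actual region-specific thresholds would come from studies such as [24-26].

```python
# Illustrative only: per-ROI febrility thresholds are hypothetical placeholders.
# The cited studies report region-dependent normal ranges (32.3-35.9 C).
THRESHOLDS_C = {"eye": 36.5, "forehead": 35.5, "ear": 36.0}  # placeholder values

def febrile(roi, temperature_c, thresholds=THRESHOLDS_C):
    """Flag a measurement as febrile using the threshold for its own ROI,
    instead of one fixed threshold for the whole face."""
    return temperature_c > thresholds[roi]

# The same reading (36.0 C) is febrile for the forehead but not for the eye.
print(febrile("forehead", 36.0), febrile("eye", 36.0))  # True False
```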

Infrared Thermography
In physics, waves are periodic disturbances that maintain their shape as they propagate through space as a function of time. The literature describes visible light, ultraviolet radiation (UV), and infrared radiation (IR) specifically as types of electromagnetic (EM) waves. The spatial periodicity, or the interval between two wave peaks, is called the wavelength, λ, and is given in meters, nanometers, or micrometers. The temporal periodicity, or the time interval between two wave peaks, is denoted as the oscillation period, T, and is given in seconds or its submultiples. The frequency, ν, is the inverse of the period T, with hertz as its unit. Figure 1 presents an overview of the most common characteristics of EM waves. Visible light, defined by the range that the light receptors of human eyes can detect, covers a small range within the spectrum, with wavelengths ranging from 380 nm to 780 nm [11].
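As a quick numerical illustration of these quantities: the frequency is the inverse of the period, and for EM waves in vacuum it relates to the wavelength through the propagation speed, ν = c/λ.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def frequency_hz(wavelength_m):
    """Frequency of an EM wave in vacuum: v = c / lambda (inverse of the period)."""
    return C / wavelength_m

print(f"{frequency_hz(780e-9):.2e}")  # red edge of visible light, ~3.8e14 Hz
print(f"{frequency_hz(10e-6):.2e}")   # 10 um long-wave infrared, ~3.0e13 Hz
```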


Figure 1. Overview of the electromagnetic wave spectrum. Adapted from [11].
The spectral region with wavelengths in the range of 0.7-1000 µm is generally called the infrared region, which is the focus of this work [11]. Infrared radiation is invisible to the human eye and has a long wavelength and low energy [29]. Any body with a temperature above absolute zero (0 K, −273.15 °C) emits infrared radiation, perceived as heat. The amount of radiation emitted by a body depends on the temperature and the properties of the material [11].
The infrared radiation spectrum bands generally applied in technologies involving thermographic images are the mid-wave (MWIR) and long-wave (LWIR) bands [29].
Infrared thermal imaging, also called infrared thermography (IRT), is a rapidly evolving technology. Currently, researchers are applying IRT to intelligent solutions in different fields, including condition monitoring, predictive maintenance, and gas detection. Medicine is another area that has benefited from this technology, employing IRT in oncology (breast, skin, etc.), surgery, medication effectiveness monitoring, and, more recently, for acute respiratory syndrome testing applications [30].
Technologies based on IRT can detect the intensity of thermal radiation emitted by objects since the bodies transmit, radiate, and reflect infrared radiation. Radiation transmission, or transmissivity, is the ability of a material to allow infrared radiation to pass through it. Emissivity is the capacity of a material to emit infrared radiation. Finally, reflectivity is the capability of the material/object surface to reflect radiation, that is, temperature reflected from the object.

Machine Learning
With the high volume of data generated by devices, sensors, and users, machines capable of identifying patterns and assisting in making decisions have become essential, with supervised learning and unsupervised machine learning being the most widely adopted methods. Reinforcement and semisupervised learning are other methods that may be used [31].
Deep learning is a set of machine learning technologies that utilize algorithms to detect, recognize, and classify objects and text in images or other documents. One of the leading deep learning architectures is the convolutional neural network (CNN), which is used to solve most image analysis problems [32].

Convolutional Neural Networks
CNNs have been widely applied in image classifiers. They excel in analyzing images and learning abstract representations. A typical CNN has an input layer, an output layer, and several hidden layers. The hidden layers of a CNN generally consist of a series of convolutional layers. The first convolutional layer learns to identify simple features. The following layers learn to detect more significant and complex characteristics. Other operations include the rectified linear unit (ReLU), pooling, fully connected, and normalization layers. Finally, backpropagation is used for error distribution and weight adjustment [33,34].
Digital images can be represented by a matrix in which each pixel contains one or more values. First, a CNN trains and tests each input image with the pixel values going through a series of convolution operations with filters (kernels). Then, the results are grouped (pooling) to reduce the matrix dimensions and generate a new, simplified matrix. These operations complete the feature-extraction step. Then, a vector is created from the feature map, which is used to feed the input layer of a multilayer neural network (fully connected, FC) [35]. Figure 3 presents a simplified diagram of a CNN [35,36].
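The convolution, ReLU, and pooling operations described above can be sketched with a toy NumPy example. The 6×6 image and 2×2 kernel are illustrative only, not the filters learned by a real CNN:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as used in CNNs) of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling, reducing the matrix dimensions."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    fm = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return fm.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # toy 2x2 edge-like filter
features = max_pool(np.maximum(conv2d(image, kernel), 0.0))  # conv -> ReLU -> pool
print(features.shape)  # simplified feature map, flattened to feed the FC layers
```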

Region Based Convolutional Neural Networks
Region-based CNNs (R-CNNs) emerged as an improvement of CNNs. They are able to detect and locate specific objects in an image. The architecture of an R-CNN is similar to that of a CNN. However, an added step of extracting the region containing the object to be detected is included [37]. Figure 4 presents a simplified diagram of an R-CNN [36,37].

The R-CNN detector consists of four main steps: candidate box generation, feature extraction, classification, and regression. For candidate box generation, approximately 2000 boxes are determined in the image using the selective search method. For feature extraction, the CNN extracts the features of each candidate box. In the third step, a classifier determines whether the extracted features belong to a specific class. Finally, the regression step adjusts the position of the bounding box with reference to a particular feature [38,39].

You Only Look Once Network
According to [36], many improved algorithms have emerged from proposals of R-CNN models, all providing different degrees of improvement in the detection performance compared to the original R-CNN.
The You Only Look Once (YOLO) network, proposed by [40], is an object detector pretrained on the Common Objects in Context (COCO) image dataset, which contains RGB (red, green, and blue) images of various object classes. Its main contribution is real-time image detection. Additionally, unlike other object detection algorithms, the YOLO network takes an entire image as input. It performs object detection through a fixed-grid regression using 24 convolutional layers and two fully connected layers. The network can process images in real time at 45 frames per second (FPS). Furthermore, YOLO produces fewer false positives than other similar architectures [41]. In this study, YOLO was used to apply transfer learning in the training of a specific dataset.
Transfer learning is a technique that takes advantage of the structure of a CNN pretrained for a given application as a starting point for a new, previously unknown task. Thus, the structure of convolutional layers and filters in the feature-extraction stage is reused for the new application. Afterward, changes are made in the FC layer, where the classes of the pretrained network can be removed and/or new classes can be added to meet the new application. After these changes, only the FC layer needs to be retrained, drastically reducing the effort of training a complete CNN, which demands a high computational cost and a large amount of training data to achieve high performance. In this work, a structure pretrained with a dataset of 998 images was used to recognize volunteers' faces. The aim is for the new structure to be able to detect the ROIs in human faces under conditions not originally imposed: volunteers wearing semifacial masks and glasses.
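Schematically, the transfer-learning procedure can be sketched as follows. The class and function names are hypothetical illustrations of the idea (frozen backbone, replaced and retrained head); the actual work used the YOLO training pipeline.

```python
class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = True

class Detector:
    """Schematic detector: convolutional backbone plus a fully connected (FC) head."""
    def __init__(self, backbone_layers, classes):
        self.backbone = [Layer(n) for n in backbone_layers]
        self.head = Layer("fc_head")
        self.classes = classes

def transfer(pretrained, new_classes):
    """Reuse the pretrained feature extractor; replace and retrain only the FC head."""
    model = Detector([l.name for l in pretrained.backbone], new_classes)
    for layer in model.backbone:
        layer.trainable = False  # frozen: convolutional weights reused as-is
    model.head.trainable = True  # only the new head is retrained
    return model

# Pretrained on generic object classes; adapted to the face ROIs of this work.
coco = Detector(["conv%d" % i for i in range(1, 25)], ["person", "car", "dog"])
roi_model = transfer(coco, ["face", "forehead", "eye", "ear"])
print(sum(l.trainable for l in roi_model.backbone), roi_model.classes)
```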

Optical Character Recognition
Optical character recognition (OCR) is a technology that allows the recognition and extraction of characters in image files to generate analyzable, editable, and searchable data [42]. This technology uses image and natural language processing to solve different challenges [43].
Tesseract is a free open-source OCR software, originally developed at Hewlett-Packard Laboratories Bristol and Hewlett-Packard Co., Greeley, Colorado, between 1985 and 1994. From 2006 to 2018, Google improved the software, and it is currently available on GitHub [44]. It can recognize texts in over 100 languages.
In this study, Tesseract was used to identify the minimum and maximum temperature values in the temperature scale of the analyzed thermographs.

Dataset
The dataset used in this study is publicly available in [45]. The authors used a FLIR Vue Pro camera to capture the thermographic images in the dataset. During image capture, participants looked at a fixed point while the camera was moved to nine equidistant positions, forming a semicircle around the volunteer. Thus, the dataset contains nine thermographic images of the face of each participant. Figure 5 displays examples of photos that comprise the dataset. The complete dataset contains 998 images from 111 participants. However, to work with a dataset balanced with respect to the volunteers' gender, only 781 images were used. Of these, 658 were used for training and 123 for validation of the transfer learning by the YOLO network. To evaluate the performance of YOLO for object detection, it is necessary to label each image with the annotations of its respective bounding boxes.
The face ROIs are the ear, eye, forehead, and whole face. These areas have known temperature thresholds for febrility and can be seen directly; thus, they are suitable for screening febrile people using thermography [46]. However, not all regions are always visible owing to the use of glasses and face masks, hair over the forehead or ear, and other factors.
LabelImg software, a free graphical tool for image and video annotation [47], was used to label all images used in this study. Figure 6 shows the graphical interface of the LabelImg software.

Results
The training of the object detector was performed on Google Colaboratory, a cloud computational environment that requires no configuration and allows code to be written and executed directly in the browser.
For the R-CNN training assessment, it was necessary to quantify the prediction accuracy by comparing the prediction made by this model with the real object location in the image. Thus, the mean average precision (mAP), which is one of the most common metrics for determining the accuracy of object detectors, was employed [48].
Other methods to evaluate the performance of the trained network were precision (P), recall (R), and F1-score (F1). For these metrics, a higher value indicates a better result. Additionally, the values of true positives (TP), false positives (FP), and false negatives (FN) were employed as performance metrics.
The resultant metric values of the trained R-CNN, with a confidence limit of 25% (conf_threshold = 0.25), were as follows: TP = 452, FP = 46, FN = 9, P = 0.91, R = 0.98, and F1 = 0.94. The mAP with an intersection over union (IoU) greater than 50%, also known as mAP@0.50, was 0.97. From Table 1, it is possible to evaluate the performance of the model for each class.
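The reported precision, recall, and F1-score follow directly from the TP, FP, and FN counts, which can be checked quickly:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1-score from object-detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts reported for the trained detector at conf_threshold = 0.25.
p, r, f1 = detection_metrics(tp=452, fp=46, fn=9)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.91 0.98 0.94
```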
Tests were carried out with photos of six volunteers, different from those present in the training dataset, to evaluate the prediction accuracy on new images. A Testo-885 thermographic camera captured the new images. After transfer learning, the object detector algorithm analyzed these images and detected all ROIs, even for volunteers wearing masks, caps, or with long hair. Figure 7 displays some of these images. When identifying an object, the YOLO detector provides the coordinates, width, and height of the bounding boxes, allowing delimitation of the ROIs where the temperature is analyzed. From each ROI, the values of the pixels with the highest temperatures were extracted. Thus, the algorithm discards regions covered by hair, sweat, and fabric, which are generally at lower temperatures. The higher temperatures appear as the lightest colors in Figure 8. Figure 9 displays a boxplot of the pixel values in each ROI depicted in Figure 8.
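The extraction of the hottest pixel inside each detected bounding box can be sketched as follows, using a hypothetical (x, y, width, height) box format and toy data:

```python
import numpy as np

def roi_max_values(thermal_image, boxes):
    """For each named (x, y, w, h) bounding box, return the maximum pixel value.
    Taking the maximum ignores cooler pixels from hair, sweat, or fabric
    that fall inside the box."""
    maxima = {}
    for name, (x, y, w, h) in boxes.items():
        roi = thermal_image[y:y + h, x:x + w]
        maxima[name] = int(roi.max())
    return maxima

rng = np.random.default_rng(0)
frame = rng.integers(0, 200, size=(120, 160))  # toy thermal frame (raw pixel values)
frame[30, 45] = 250                            # hottest spot, inside the "eye" box
boxes = {"eye": (40, 25, 20, 15), "forehead": (50, 5, 40, 15)}  # hypothetical boxes
per_roi = roi_max_values(frame, boxes)
final = max(per_roi.values())  # highest value among ROIs: the reported temperature pixel
print(per_roi["eye"], final)
```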
The distribution of the pixels in the forehead region displays lower values than those in the eye regions, indicating that the eyes are at a higher temperature than the forehead.
According to [25], the maximum or mean temperatures of ROIs can be adopted to assess human body surfaces. However, the segmentation of ROIs performed in this paper may include background images, parts of the surfaces of glasses, masks, and hair, decreasing the mean temperature of the ROI. Therefore, to avoid this issue, the maximum temperature for each ROI was adopted.
The temperature scale on the right side of Figure 8 indicates that darker colors are close to a temperature of 24 °C and lighter colors approach 35 °C. In the thermal imager's standard operating mode, these values are automatically generated by the camera's operating software, where the highest value indicates the maximum temperature of the objects in the thermal imager's field of view and the lowest value indicates the minimum temperature of the objects. Thermogram radiometric output is not always available, depending on the imager manufacturer. Thus, a method for reading temperatures in the region of interest directly in the thermal image was developed, so that the method can be widely used.
Along the temperature scale, there are 267 pixels; the first one, pixel zero, has a value of 254, and the last pixel of the scale has a value of 4. Figure 10 depicts the relationship between the pixel values and their positions on the temperature scale, described by Equation (1), where v is the pixel value and i is the position on the temperature scale. This first-order polynomial was obtained through linear regression, with a coefficient of determination (R²) of 0.9941; the solid line (in red) on the graph shows the behavior of the equation of the straight line that describes this relationship.
As Equation (1) indicates a first-order linear proportionality between pixel position on the scale and temperature, higher temperatures will produce higher pixel values. Thus, the pixels positioned at the beginning of the scale represent the highest temperatures, and pixels at the end indicate the lowest. Figure 11 depicts the relationship between the pixel positions and the respective temperatures of the scale shown in Figure 8. Equation (2) presents the straight line that describes this relationship, where i is the pixel position, obtained using Equation (1), y1 is the highest value recorded on the temperature scale, y2 is the lowest value recorded on the temperature scale, and T is the temperature (in °C) of the analyzed pixel.
Through algebraic manipulation of Equations (1) and (2), Equation (3) yields the temperature T of the analyzed pixel directly from its value, v.
After obtaining the highest temperature of each image ROI, the highest value among these temperatures represents the final temperature of the volunteer. Figure 12 shows images of 24 volunteers, and Table 2 lists the highest temperature recorded in each ROI for each person.
Figure 12. Images produced using the Testo-885 camera and analyzed by YOLO, with temperatures estimated using Equation (3).
Table 2 shows that there are small variations in temperature (less than 1 °C) among most volunteers. However, considering that human beings are homeothermic, this shows that the surface temperature undergoes variations not experienced by the core body temperature. Furthermore, it is confirmed that the face surface temperature is predominantly lower than the body temperature, as shown by the mean temperature values of the nonfebrile volunteers.
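A sketch of the scale-based temperature estimation described above, assuming a strictly linear mapping derived only from the scale endpoints given in the text (the actual work fitted a regression with R² = 0.9941, whose coefficients are not reproduced here):

```python
# Pixel-value -> temperature mapping, assuming a strictly linear scale.
# Endpoints from the text: position 0 has value 254, position 266 has value 4.
SCALE_LEN = 267  # number of pixels along the temperature scale
V_TOP, V_BOTTOM = 254.0, 4.0

def position_on_scale(v):
    """Equation (1) analogue: map a pixel value v to a position i on the scale."""
    return (V_TOP - v) * (SCALE_LEN - 1) / (V_TOP - V_BOTTOM)

def temperature(i, y1, y2):
    """Equation (2) analogue: position i -> temperature.
    y1 is the highest temperature on the scale, y2 the lowest;
    position 0 corresponds to y1."""
    return y1 - i * (y1 - y2) / (SCALE_LEN - 1)

def pixel_to_temperature(v, y1, y2):
    """Equation (3) analogue: pixel value v -> temperature directly."""
    return temperature(position_on_scale(v), y1, y2)

# Example with the scale from Figure 8: 24 C (darkest) to 35 C (lightest).
print(round(pixel_to_temperature(254, 35.0, 24.0), 1))  # brightest pixel -> 35.0
print(round(pixel_to_temperature(4, 35.0, 24.0), 1))    # darkest pixel -> 24.0
```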
Note that for volunteers 6 and 22, only one ROI was visible in the image, permitting only one temperature to be obtained for the analysis. The identification of only one ROI for volunteers 6 and 22 exemplifies both a limitation of the intelligent system and the achievement of its objective: these volunteers were not in adequate direct sight of the thermal imager in a way that would allow more ROIs to be identified, yet the system still managed to identify at least one ROI, enabling the analysis of the person's temperature.
Unfortunately, only four febrile volunteers, previously diagnosed as feverish by a health team through body temperature checks, were obtained in the image production campaign. Their measurements are identified as volunteers 21, 22, 23, and 24 in Table 2. Despite the low sampling of febrile people, it is noted that the maximum temperature detected in the face region of volunteers 21 and 24 (37.1 °C and 37.3 °C, respectively) did not exceed the usual fever threshold for core temperature (37.5 °C or 38.0 °C), and that the temperatures of different facial regions of volunteer 23 showed a 0.5 °C discrepancy, which is significant for the diagnosis of fever. This supports the hypothesis that the febrile diagnostic criteria for core body temperature (37.5 °C or 38.0 °C) should not be applied to human facial temperature.

Conclusions
This study employed a transfer deep learning method to detect and recognize ROIs, including the face, forehead, eyes, and ears, in thermographic images using the YOLO object detector. After training a CNN from a dataset made available by other researchers, images of new volunteers obtained in the laboratory served as input to the CNN to evaluate the detection performance of the ROIs.
Tests verified that ROI detection was feasible even with the use of masks, caps, helmets, or with features hidden by hair.
As displayed in Figure 9, there were variations in temperature among ROIs. The criterion of adopting the highest temperature within each ROI proved to be efficient, as areas without a direct target, such as those covered by hair, are disregarded.
This study presents a simple system for obtaining temperature values directly from thermographic images without significant computational processing. These improvements in detecting the maximum and minimum temperatures of ROIs can provide better results for identifying febrile people.
As infrared thermography measures surface (skin) temperature and not the core temperature of the human body, future work will apply adequate criteria to analyze the febrility of individuals from the temperatures of the ROIs. This avoids the use of a single temperature threshold to indicate a feverish state for all regions of the human face. Screening people with fever through infrared thermography should apply a different, adequate threshold temperature for each face region, typically lower than the core threshold temperature (37.5 °C). Additionally, expanding the dataset will improve the detection of ROIs and allow more reliable screening of febrile people.
Finally, other deep learning algorithms will be applied, evaluated, and compared to the results presented in this work.