Low-Resolution Infrared Array Sensor for Counting and Localizing People Indoors: When Low End Technology Meets Cutting Edge Deep Learning Techniques
Abstract
1. Introduction
- We propose a deep learning-based method for counting and localizing people indoors that can run on low-end devices (a Raspberry Pi) in real time.
- We employed super resolution techniques on lower-end sensors as well (i.e., sensors with resolutions of 16 × 12 and 8 × 6) to perform the same tasks, with results comparable to those obtained with 32 × 24 images.
- We built a working device running the proposed approach for practical usage and for actual comparison of the proposed approach with existing ones.
2. Related Work
2.1. Sensors for Healthcare
2.2. Indoor Localization
- Triangulation: Approaches in this category, such as [33], are characterized by short coverage and limited accuracy. In general, they require a direct line of sight, and their accuracy deteriorates quickly in the presence of multipath propagation.
- Trilateration: Approaches in this category, such as [34], share the overall characteristics of triangulation techniques in terms of accuracy and coverage. They require some a priori knowledge to work efficiently.
- Fingerprinting: These techniques [35,36] learn the fingerprints of the different areas of the monitored scene offline and later use this knowledge to locate objects by comparing fingerprints. This is the least accurate and most environment-dependent approach.
- Proximity Detection: These techniques, such as [37], as their name suggests, simply detect whether two devices are close to each other. Used with multiple fixed indoor devices, they can tell the approximate location of an object. They suffer from very small coverage and low precision.
- Dead Reckoning: These techniques estimate the current location from the last known measurements. They suffer mostly from cumulative error: the more time has passed since the last real measurement (of speed and position), the less reliable the estimate becomes.
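The cumulative error of dead reckoning can be illustrated with a minimal sketch. The speeds, heading and time step below are hypothetical values chosen for illustration, not measurements from any cited work: a 5% bias in the estimated speed produces a position error that keeps growing until a new real measurement arrives.

```python
import math

def dead_reckon(start, speed, heading_rad, dt, steps):
    """Propagate a 2-D position from the last known fix using speed and heading."""
    x, y = start
    track = []
    for _ in range(steps):
        x += speed * math.cos(heading_rad) * dt
        y += speed * math.sin(heading_rad) * dt
        track.append((x, y))
    return track

# Hypothetical values: true speed 1.00 m/s, estimated speed 1.05 m/s (5% bias).
true_track = dead_reckon((0.0, 0.0), 1.00, 0.0, dt=1.0, steps=60)
est_track = dead_reckon((0.0, 0.0), 1.05, 0.0, dt=1.0, steps=60)

# Distance between the true and the estimated position at every step.
errors = [math.dist(t, e) for t, e in zip(true_track, est_track)]
print(f"error after 10 s: {errors[9]:.2f} m, after 60 s: {errors[59]:.2f} m")
```

With a constant bias the error grows linearly with time; noisy heading estimates would make it grow faster.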
2.3. Object Detection
- YoloV3 [55]: Yolo stands for “You Only Look Once”. YoloV3 is the newest and most optimized version of the YOLO architecture proposed in [56]. Most other works perform object classification on different regions of a single image, at different sizes and scales, and consider every region with a high classification probability score as a potential detection. Yolo’s novelty comes from applying a single network to the whole image: the network both divides the image into regions and predicts the objects.
- Single Shot MultiBox Detector (SSD) [57]: SSD follows the same philosophy as Yolo. It takes only one shot to detect multiple objects present in an image using multibox. SSD is composed of two sub-networks in cascade: a classification network used for feature extraction (the backbone) and a set of extra convolutional layers whose objective is to detect the bounding boxes and assign the confidence scores. VGG-16 [21] is used as the classification backbone for SSD, to which six extra convolutional layers are added.
- RetinaNet [52]: RetinaNet is a one-stage object detection model that uses focal loss to address a common problem in object detection, namely the object/background imbalance. RetinaNet identifies regions in the image that contain objects and classifies these objects. Afterward, a regression task is performed to shrink or extend the bounding boxes to fit the objects.
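To make the focal loss idea concrete, here is a minimal sketch of the binary focal loss as defined in [52], next to plain cross entropy. The default values alpha = 0.25 and gamma = 2 are the ones reported in [52]; the two example predictions are hypothetical.

```python
import math

def cross_entropy(p, y):
    """Binary cross entropy for one prediction (p = probability of 'object')."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: cross entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical predictions: (predicted object probability, ground-truth label).
easy_background = (0.05, 0)  # background anchor, confidently classified
hard_object = (0.30, 1)      # object anchor, poorly classified

for name, (p, y) in [("easy background", easy_background), ("hard object", hard_object)]:
    print(f"{name}: CE={cross_entropy(p, y):.4f}  FL={focal_loss(p, y):.6f}")
```

The easy background example is down-weighted far more strongly than the hard object example, which is how the loss keeps the many easy background anchors from dominating training.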
3. Motivations and Challenges
3.1. Motivations
3.2. Scope
1. Train a model to classify 32 × 24 pixel images to detect the number and location of people in a room.
2. Train a super resolution model to reconstruct high-resolution thermal images from lower-resolution ones. The input to this model is images of size 8 × 6 or 16 × 12, and the output is images of the same size as those used initially (i.e., 32 × 24 pixels).
3. Fine-tune the previously trained model to perform the classification task on the new data.
3.3. Challenges
4. System Description and Experiment Specifications
4.1. Equipment
- Panasonic Grid-EYE sensor (https://industrial.panasonic.com/jp/products/pt/grid-eye. Accessed 29 January 2022): This sensor is among the cheapest available on the market. It has, however, two main drawbacks: (1) it has a very narrow angle of view and (2) it offers a resolution of only 8 × 8 pixels. Its limited coverage means that, in practice, a dense deployment is required to cover a single room. Moreover, such a sensor does not offer a resolution high enough to train the super resolution network on which our approach relies. That said, this sensor could benefit from our proposed method after training: once the super resolution network is trained, it can be applied directly to data collected by this sensor to increase their resolution.
- Heimann sensors (https://www.heimannsensor.com/. Accessed 29 January 2022): These sensors come in a wide variety of resolutions, noise levels and Fields of View (FOV). Their resolution ranges from 8 × 8 to 120 × 84 pixels. The main drawback of these sensors is their much higher cost. Moreover, they require their own evaluation kits (which also come at a high price), making a solution based on them much more expensive.
- Melexis MLX90640 sensors (https://www.melexis.com/en/product/MLX90640/. Accessed 29 January 2022): While the same company provides other sensors (namely the MLX90614), the MLX90640 offers a relatively high resolution that still falls below what is considered “privacy invasive” (i.e., fewer than 1000 pixels [12]). They come in two main variants, BAA and BAB, which differ in their FOV.
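As a quick check of the privacy argument, the pixel counts of the resolutions mentioned in this section can be compared against the 1000-pixel threshold quoted from [12]. This is only an arithmetic sketch; the dictionary labels are illustrative names.

```python
PRIVACY_PIXEL_LIMIT = 1000  # threshold quoted in the text from [12]

# Resolutions named in the text (width, height).
resolutions = {
    "MLX90640 full (32 x 24)": (32, 24),
    "MLX90640 reduced (16 x 12)": (16, 12),
    "MLX90640 reduced (8 x 6)": (8, 6),
    "Grid-EYE (8 x 8)": (8, 8),
    "Heimann maximum (120 x 84)": (120, 84),
}

pixel_counts = {name: w * h for name, (w, h) in resolutions.items()}

for name, count in pixel_counts.items():
    status = "below" if count < PRIVACY_PIXEL_LIMIT else "at or above"
    print(f"{name}: {count} pixels ({status} the {PRIVACY_PIXEL_LIMIT}-pixel limit)")
```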
4.2. Environment
- Room 1: This room has a tatami covering the floor, has a large window in one of the walls and is not air conditioned. The temperature is the ambient room temperature.
- Room 2: This room also has a tatami, has a large window in one of the walls and is air conditioned (the air conditioner is set to 24 °C).
- Room 3: This room has a slightly reflective floor. It has no windows and is air conditioned (the air conditioner is set to heat the room to 26 °C). The room has a desk, 4 chairs and a bed.
- Room 4: This room has a slightly reflective floor and no windows. Instead of an air conditioner, it is heated by a heating device (a stove), and moving devices (cleaning robots) were included for more variety in the environmental conditions.
4.3. Overall System Description
5. Detailed System Description
5.1. Data Collection
1. Super resolution data: These are data used to train and validate the super resolution model. From several experiments, we collected over 35,000 frames. We used 25,000 frames for training and 10,000 for validation, and discarded a few tens of frames.
2. Classification data: These data are used to train and validate the classifier. We used different scenarios in different room environments as described in the previous section. For each resolution of frames, we used a data set composed of 25,318 frames for training and 7212 frames for testing.
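The training and test totals quoted above can be cross-checked against the per-class frame counts (0 to 3 people) listed in the data set table of this document. The sketch below only sums those counts and derives the class shares; the variable names are illustrative.

```python
# Per-class frame counts, keyed by the number of people in the frame.
train_counts = {0: 5129, 1: 6583, 2: 7348, 3: 6258}
test_counts = {0: 1298, 1: 1546, 2: 2810, 3: 1558}

def summarize(counts):
    """Return the total number of frames and the share of each class."""
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    return total, shares

train_total, train_shares = summarize(train_counts)
test_total, test_shares = summarize(test_counts)

print("training frames:", train_total)
print("test frames:", test_total)
for label in sorted(train_counts):
    print(f"{label} people: train {train_shares[label]:.1%}, test {test_shares[label]:.1%}")
```

The sums match the totals stated in the text, and the shares show that the two-person class is over-represented in the test set relative to the training set.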
5.2. Super Resolution and Frame Upscaling
- Feature extraction and dimensionality reduction;
- Non-linear mapping;
- Expansion;
- Deconvolution.
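The four stages above can be mapped onto the layer listing given in the layer table of this document (56 filters of 5 × 5, 16 of 1 × 1, four layers of 12 filters of 3 × 3, 56 of 1 × 1 and one 9 × 9 deconvolution filter). The sketch below counts the trainable weights per stage. It assumes a single-channel thermal input and one bias per filter, and it ignores any activation parameters (such as PReLU slopes); these are assumptions, not details confirmed by the text.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a (transposed) convolution with a square kernel."""
    return kernel * kernel * in_channels * out_channels + out_channels

# (stage, number of filters, kernel size), following the layer table.
layers = [
    ("feature extraction", 56, 5),
    ("dimensionality reduction", 16, 1),
    ("non-linear mapping 1", 12, 3),
    ("non-linear mapping 2", 12, 3),
    ("non-linear mapping 3", 12, 3),
    ("non-linear mapping 4", 12, 3),
    ("expansion", 56, 1),
    ("deconvolution", 1, 9),
]

in_channels = 1  # assumption: one temperature value per pixel
per_layer = []
for stage, filters, kernel in layers:
    per_layer.append((stage, conv_params(kernel, in_channels, filters)))
    in_channels = filters  # output channels feed the next layer

total_params = sum(n for _, n in per_layer)
for stage, n in per_layer:
    print(f"{stage:26s} {n:6d}")
print(f"{'total':26s} {total_params:6d}")
```

Under these assumptions the network stays in the order of ten thousand parameters, which is consistent with the goal of running on a low-end device.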
5.2.1. Feature Extraction and Dimensionality Reduction
5.2.2. Non-Linear Mapping
5.2.3. Expansion
5.2.4. Deconvolution
5.2.5. Activations and Parameters
5.3. Denoising and Enhancement
5.3.1. Averaging over N Consecutive Frames
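A minimal sketch of averaging over N consecutive frames is given below. It uses a sliding window of the last N frames; whether the original system uses a sliding or a non-overlapping window is not stated here, so this is an assumption, as are the class name and the toy temperature values.

```python
from collections import deque

class FrameAverager:
    """Average each incoming frame with up to N-1 previous frames."""

    def __init__(self, n):
        self.buffer = deque(maxlen=n)  # oldest frame is dropped automatically

    def push(self, frame):
        self.buffer.append(frame)
        rows, cols = len(frame), len(frame[0])
        count = len(self.buffer)
        return [
            [sum(f[r][c] for f in self.buffer) / count for c in range(cols)]
            for r in range(rows)
        ]

# Toy 2 x 2 frames (temperatures in degrees Celsius), N = 2.
averager = FrameAverager(n=2)
frame_a = [[20.0, 22.0], [24.0, 30.0]]
frame_b = [[22.0, 22.0], [26.0, 34.0]]

first = averager.push(frame_a)   # only one frame available so far
second = averager.push(frame_b)  # pixel-wise mean of frame_a and frame_b
print(first)
print(second)
```

Averaging reduces uncorrelated sensor noise at the cost of some temporal blur, which grows with N.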
5.3.2. Aggressive Denoising
5.3.3. Non-Local Means Denoising (NLMD)
5.4. Counting People
5.5. Identification of the Location of People
1. Noise Reduction: To facilitate detection, the first step, as its name implies, is to reduce the image noise. This is done by smoothing the frame with a Gaussian filter.
2. Find the intensity gradient: After reducing the noise, the intensity gradient of the image is derived. To this end, a Sobel kernel filter [67] is applied in the horizontal and vertical directions. This yields the respective derivatives $G_x$ and $G_y$, which, in turn, are used to obtain the gradient magnitude and orientation of each pixel:
   $$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\left(\frac{G_y}{G_x}\right)$$
3. Suppression of non-maximums: Edges are, by definition, local maximums. Hence, pixels that are not local maximums (in the direction of the gradients measured in the previous step) are discarded. In addition, during this step, fake maximums (i.e., pixels whose gradient is equal to 0 but which are not actual maximums) are identified and discarded.
4. Double thresholds and hysteresis thresholding: After the previous step, non-edge pixels are set to 0, but the remaining edges have different intensities. This step suppresses, where necessary, weak edges (i.e., edges that do not separate two objects or an object from its background). The definition of a weak edge involves a subjective decision, which is controlled by two parameters: an upper threshold and a lower one.
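The gradient and double-threshold steps can be sketched in a few lines. The code below applies the standard 3 × 3 Sobel kernels to get the horizontal and vertical derivatives (`gx`, `gy`), derives the gradient magnitude and orientation, and labels pixels against a lower and an upper threshold. Gaussian smoothing, non-maximum suppression and the hysteresis linking itself are omitted, and the toy frame and threshold values are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply_kernel(img, kernel, r, c):
    """Weighted sum of the 3 x 3 neighbourhood centred on pixel (r, c)."""
    return sum(
        kernel[i][j] * img[r + i - 1][c + j - 1]
        for i in range(3)
        for j in range(3)
    )

def sobel_gradient(img):
    """Gradient magnitude and orientation (radians); border pixels stay 0."""
    rows, cols = len(img), len(img[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    orientation = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_kernel(img, SOBEL_X, r, c)
            gy = apply_kernel(img, SOBEL_Y, r, c)
            magnitude[r][c] = math.hypot(gx, gy)
            orientation[r][c] = math.atan2(gy, gx)
    return magnitude, orientation

def double_threshold(magnitude, low, high):
    """Label each pixel 'strong', 'weak' or 'none' ahead of hysteresis."""
    return [
        ["strong" if m >= high else "weak" if m >= low else "none" for m in row]
        for row in magnitude
    ]

# Toy frame: cool background on the left, warm region on the right.
frame = [[20, 20, 30, 30] for _ in range(4)]
magnitude, orientation = sobel_gradient(frame)
labels = double_threshold(magnitude, low=10, high=30)
print(magnitude[1][1], orientation[1][1], labels[1][1])
```

In a full implementation, pixels labeled "weak" would be kept only if they are connected to a "strong" pixel, which is the hysteresis part of the last step.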
5.6. Activity Detection
6. Experimental Results
6.1. Data Sets
6.2. High-Resolution Classification Results
6.2.1. Training Set Cross-Validation
- The method where frames captured with size 32 × 24 are used with no denoising is referred to as ();
- The method where frames captured with size 32 × 24 are denoised by averaging over two consecutive frames is referred to as ();
- The method where frames captured with size 32 × 24 are denoised by the aggressive denoising method is referred to as ();
- The method where frames captured with size 32 × 24 are denoised by the NLMD method [61] is referred to as ().
6.2.2. Evaluation on the Test Set
6.3. Low-Resolution Classification Results
6.3.1. Super Resolution: How to Evaluate the Performance
- The method where frames captured with size 32 × 24 are used as they are is referred to as ();
- The method where frames captured with size 32 × 24 are denoised by the NLMD method [61] is referred to as ();
- The method where frames captured with size 16 × 12 are used as they are is referred to as ();
- The method where frames captured with size 16 × 12 are upscaled with the super resolution technique to 32 × 24 is referred to as ();
- The method where frames captured with size 16 × 12 are upscaled with the super resolution technique to 32 × 24 and denoised by averaging over two consecutive frames is referred to as ();
- The method where frames captured with size 16 × 12 are upscaled with the super resolution technique to 32 × 24 and denoised by the aggressive denoising method is referred to as ();
- The method where frames captured with size 16 × 12 are upscaled with the super resolution technique to 32 × 24 and denoised by the NLMD method [61] is referred to as ();
- The method where frames captured with size 8 × 6 are used as they are is referred to as ();
- The method where frames captured with size 8 × 6 are upscaled with the super resolution technique to 32 × 24 is referred to as ();
- The method where frames captured with size 8 × 6 are upscaled with the super resolution technique to 32 × 24 and denoised by averaging over two consecutive frames is referred to as ();
- The method where frames captured with size 8 × 6 are upscaled with the super resolution technique to 32 × 24 and denoised by the aggressive denoising method is referred to as ();
- The method where frames captured with size 8 × 6 are upscaled with the super resolution technique to 32 × 24 and denoised by the NLMD method [61] is referred to as ().
6.3.2. Classification Results
6.4. Discussion
1. The actual misclassification: As it stands, the current model does not give perfect detection accuracy, even when using the high-resolution frames (i.e., 32 × 24 pixels). As stated above, we believe that the use of LSTM would remedy the misclassification of individual frames by learning the number and locations of people over longer periods of time.
2. The presence of heat-emitting devices/objects: Sources of heat include electronic devices such as computers, heaters, or even large open windows that let sunlight enter the room. Such devices or objects could lead to a misclassification, as their heat might be confused with that emitted by a human body. This problem can also be addressed by exploiting the time component. Unlike the first issue, which only involves a few consecutive frames, learning here requires observation over much longer periods of time, up to hours, to capture the overall behavior of the non-human heat emitters in the room.
3. The residual heat in furniture (e.g., a bed or a sofa) after a person spends a long time on it: After the person leaves their bed/seat, the heat absorbed by the piece of furniture is emitted, leading to a wrong identification of the person. This heat, although it dissipates after a while, is not to be confused with the heat emitted by the person themself. This could be addressed by learning this particular behavior and taking it into account when making the classification decision.
4. The presence of obstacles: The presence of obstacles is an inherent problem of object detection systems that rely on a direct line of sight between the sensor and the object to be detected. This problem can partially be addressed through design choices regarding where to place the sensor, or by using multiple sensors that cover the entire monitored area.
7. Proposed Approach against State-of-the-Art Object Detection
- A flattening layer to transform the input image into a uni-dimensional vector.
- A total of 4 fully-connected dense layers having, respectively, 512, 256, 128, 64 neurons, whose activation is set to ReLU.
- A fully-connected layer with a Softmax activation responsible for determining the class.
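The layer stack listed above can be traced with a small forward-pass sketch in plain Python. The weights are random and untrained, so the output has no meaning beyond showing the shapes involved. The 32 × 24 single-channel input and the four output classes are assumptions made for illustration; the text above does not state them for this network.

```python
import math
import random

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases, activation):
    """Fully-connected layer: one weight row and one bias per output neuron."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return activation(outputs)

def build_network(sizes, rng):
    """Random (untrained) parameters for consecutive layer sizes."""
    network = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        network.append((weights, [0.0] * n_out))
    return network

NUM_CLASSES = 4  # assumption made for this sketch
sizes = [32 * 24, 512, 256, 128, 64, NUM_CLASSES]
rng = random.Random(0)
network = build_network(sizes, rng)

# Hypothetical thermal frame, then the flattening layer.
frame = [[rng.uniform(20.0, 35.0) for _ in range(32)] for _ in range(24)]
activations = [pixel for row in frame for pixel in row]

for index, (weights, biases) in enumerate(network):
    is_last = index == len(network) - 1
    activations = dense(activations, weights, biases, softmax if is_last else relu)

probabilities = activations
param_count = sum(len(w) * len(w[0]) + len(b) for w, b in network)
print(len(probabilities), round(sum(probabilities), 6), param_count)
```

Under these assumptions almost all parameters sit in the first dense layer, because flattening the frame produces 768 inputs that each connect to 512 neurons.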
7.1. RetinaNet
1. The backbone: The backbone computes feature maps over the input image at different scales. It is usually a typical convolutional network. In our work, we opted for the conventional ResNet34 architecture [20] as a backbone for RetinaNet. It has two parts:
   - The bottom-up pathway: here, the backbone network calculates the feature maps at different scales.
   - The top-down pathway: the top-down pathway upsamples the spatially coarse feature maps. Lateral connections merge the top-down and bottom-up layers of the same size.
2. The classification subnet: This subnet predicts the probability of an object of a given class being present in each anchor box.
3. The anchor regression subnet: Upon identifying objects, this subnet regresses the offsets of the bounding boxes from the anchor boxes for the objects.
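The role of the regression subnet can be illustrated by decoding one predicted offset vector into a bounding box. The sketch uses the common centre/size parameterization (shift the centre in units of the anchor size, scale width and height exponentially). Whether the implementation used in this work applies exactly this encoding, or additional scaling factors, is an assumption; the anchor and offsets are hypothetical.

```python
import math

def decode_box(anchor, offsets):
    """Apply regression offsets (dx, dy, dw, dh) to an anchor (cx, cy, w, h)."""
    cx_a, cy_a, w_a, h_a = anchor
    dx, dy, dw, dh = offsets
    cx = cx_a + dx * w_a      # shift the centre in units of the anchor width
    cy = cy_a + dy * h_a      # shift the centre in units of the anchor height
    w = w_a * math.exp(dw)    # scale the width
    h = h_a * math.exp(dh)    # scale the height
    return (cx, cy, w, h)

# Hypothetical anchor centred in a 32 x 24 frame, 8 x 8 pixels in size.
anchor = (16.0, 12.0, 8.0, 8.0)
box = decode_box(anchor, (0.25, -0.5, math.log(1.5), 0.0))
unchanged = decode_box(anchor, (0.0, 0.0, 0.0, 0.0))
print(box)
print(unchanged)
```

Zero offsets return the anchor itself, so the subnet only has to learn corrections relative to the anchor grid.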
7.2. Results Comparison
7.3. Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| CNN | Convolutional Neural Network |
| DL | Deep Learning |
| FPS | Frames Per Second |
| FSRCNN | Fast Super Resolution Convolutional Neural Network |
| GPS | Global Positioning System |
| IR | Infrared |
| KPI | Key Performance Indicator |
| NLMD | Non-Local Means Denoising |
| PReLU | Parametric Rectified Linear Unit |
| SSD | Single Shot MultiBox Detector |
| TP | True Positives |
References
1. Ketu, S.; Mishra, P.K. Internet of Healthcare Things: A contemporary survey. J. Netw. Comput. Appl. 2021, 192, 103179.
2. Perera, M.S.; Halgamuge, M.N.; Samarakody, R.; Mohammad, A. Internet of things in healthcare: A survey of telemedicine systems used for elderly people. In IoT in Healthcare and Ambient Assisted Living; Springer: Berlin/Heidelberg, Germany, 2021; pp. 69–88.
3. Yang, S.; Wang, D.; Li, W.; Wang, C.; Yang, X.; Lo, K. Decoupling of Elderly Healthcare Demand and Expenditure in China. Healthcare 2021, 9, 1346.
4. Hamiduzzaman, M.; De Bellis, A.; Abigail, W.; Kalaitzidis, E.; Harrington, A. The world is not mine–barriers to healthcare access for Bangladeshi rural elderly women. J. Cross-Cult. Gerontol. 2021, 36, 69–89.
5. Yotsuyanagi, H.; Kurosaki, M.; Yatsuhashi, H.; Lee, I.H.; Ng, A.; Brooks-Rooney, C.; Nguyen, M.H. Characteristics and healthcare costs in the aging hepatitis B population of Japan: A nationwide real-world analysis. Dig. Dis. 2022, 40, 68–77.
6. Qian, K.; Zhang, Z.; Yamamoto, Y.; Schuller, B.W. Artificial intelligence internet of things for the elderly: From assisted living to health-care monitoring. IEEE Signal Process. Mag. 2021, 38, 78–88.
7. World Health Organization. WHO Global Report on Falls Prevention in Older Age. Available online: https://www.who.int/ageing/publications/Falls_prevention7March.pdf (accessed on 29 January 2022).
8. Wang, J.; Zhai, S. Heart Rate Detection with Multi-Use Capacitive Touch Sensors. U.S. Patent 10,299,729, 28 May 2019.
9. Rosales, L.; Skubic, M.; Heise, D.; Devaney, M.J.; Schaumburg, M. Heartbeat detection from a hydraulic bed sensor using a clustering approach. In Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, USA, 28 August–1 September 2012; pp. 2383–2387.
10. Luo, F.; Poslad, S.; Bodanese, E. Temporal convolutional networks for multiperson activity recognition using a 2-d lidar. IEEE Internet Things J. 2020, 7, 7432–7442.
11. Ma, Z.; Bigham, J.; Poslad, S.; Wu, B.; Zhang, X.; Bodanese, E. Device-free, activity during daily life, recognition using a low-cost lidar. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, UAE, 9–13 December 2018; pp. 1–6.
12. Mashiyama, S.; Hong, J.; Ohtsuki, T. A fall detection system using low resolution infrared array sensor. In Proceedings of the 2014 IEEE 25th Annual International Symposium on Personal, Indoor, and Mobile Radio Communication (PIMRC), Washington DC, USA, 2–5 September 2014; pp. 2109–2113.
13. Mao, G.; Fidan, B.; Anderson, B.D. Wireless sensor network localization techniques. Comput. Netw. 2007, 51, 2529–2553.
14. Sen, S.; Radunovic, B.; Choudhury, R.R.; Minka, T. You Are Facing the Mona Lisa: Spot Localization Using PHY Layer Information. In Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, Lake District, UK, 25–29 June 2012; Association for Computing Machinery: New York, NY, USA, 2012; pp. 183–196.
15. Lim, H.; Kung, L.C.; Hou, J.C.; Luo, H. Zero-Configuration, Robust Indoor Localization: Theory and Experimentation. In Proceedings of the IEEE INFOCOM 2006. 25TH IEEE International Conference on Computer Communications, Barcelona, Spain, 23–29 April 2006; pp. 1–12.
16. Nandakumar, R.; Chintalapudi, K.K.; Padmanabhan, V.N. Centaur: Locating devices in an office environment. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, Istanbul, Turkey, 22–26 August 2012; pp. 281–292.
17. Mobark, M.; Chuprat, S.; Mantoro, T. Improving the accuracy of complex activities recognition using accelerometer-embedded mobile phone classifiers. In Proceedings of the 2017 Second International Conference on Informatics and Computing (ICIC), Jayapura, Indonesia, 1–3 November 2017; pp. 1–5.
18. Atallah, L.; Lo, B.; King, R.; Yang, G.Z. Sensor placement for activity detection using wearable accelerometers. In Proceedings of the 2010 International Conference on Body Sensor Networks, Biopolis, Singapore, 7–9 June 2010; pp. 24–29.
19. Zhang, D.; Xia, F.; Yang, Z.; Yao, L.; Zhao, W. Localization technologies for indoor human tracking. In Proceedings of the 2010 5th International Conference on Future Information Technology, Busan, Korea, 21–23 May 2010; pp. 1–6.
20. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
21. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
22. Mathie, M.; Coster, A.; Lovell, N.; Celler, B. Detection of daily physical activities using a triaxial accelerometer. Med. Biol. Eng. Comput. 2003, 41, 296–301.
23. Bao, L.; Intille, S.S. Activity recognition from user-annotated acceleration data. In International Conference on Pervasive Computing; Springer: Berlin/Heidelberg, Germany, 2004; pp. 1–17.
24. Lo, B.; Atallah, L.; Aziz, O.; El ElHew, M.; Darzi, A.; Yang, G.Z. Real-time pervasive monitoring for postoperative care. In Proceedings of the 4th International Workshop on Wearable and Implantable Body Sensor Networks (BSN 2007), Aachen, Germany, 26–28 March 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 122–127.
25. Cornacchia, M.; Ozcan, K.; Zheng, Y.; Velipasalar, S. A survey on activity detection and classification using wearable sensors. IEEE Sens. J. 2016, 17, 386–403.
26. Liu, T.; Guo, X.; Wang, G. Elderly-falling detection using distributed direction-sensitive pyroelectric infrared sensor arrays. Multidimens. Syst. Signal Process. 2012, 23, 451–467.
27. Want, R.; Hopper, A.; Falcao, V.; Gibbons, J. The active badge location system. ACM Trans. Inf. Syst. (TOIS) 1992, 10, 91–102.
28. LLC, M. Firefly Motion Tracking System User’s Guide. Available online: http://www.gesturecentral.com/firefly/FireflyUserGuide.pdf (accessed on 29 January 2021).
29. Hou, X.; Arslan, T. Monte Carlo localization algorithm for indoor positioning using Bluetooth low energy devices. In Proceedings of the 2017 International Conference on Localization and GNSS (ICL-GNSS), Nottingham, UK, 27–29 June 2017; pp. 1–6.
30. Radoi, I.E.; Cirimpei, D.; Radu, V. Localization systems repository: A platform for open-source localization systems and datasets. In Proceedings of the 2019 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Pisa, Italy, 30 September–3 October 2019; pp. 1–8.
31. Dinh-Van, N.; Nashashibi, F.; Thanh-Huong, N.; Castelli, E. Indoor Intelligent Vehicle localization using WiFi received signal strength indicator. In Proceedings of the 2017 IEEE MTT-S International Conference on Microwaves for Intelligent Mobility (ICMIM), Aichi, Japan, 19–21 March 2017; pp. 33–36.
32. Zhu, J.Y.; Xu, J.; Zheng, A.X.; He, J.; Wu, C.; Li, V.O. Wifi fingerprinting indoor localization system based on spatio-temporal (S-T) metrics. In Proceedings of the 2014 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Busan, Korea, 27–30 October 2014; pp. 611–614.
33. Kabir, A.L.; Saha, R.; Khan, M.A.; Sohul, M.M. Locating Mobile Station Using Joint TOA/AOA. In Proceedings of the 4th International Conference on Ubiquitous Information Technologies & Applications, Jeju, Korea, 15–17 December 2021; pp. 1–6.
34. Kul, G.; Özyer, T.; Tavli, B. IEEE 802.11 WLAN based real time indoor positioning: Literature survey and experimental investigations. Procedia Comput. Sci. 2014, 34, 157–164.
35. Yang, Z.; Wu, C.; Liu, Y. Locating in fingerprint space: Wireless indoor localization with little human intervention. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, Istanbul, Turkey, 22–26 August 2012; pp. 269–280.
36. Wang, X.; Gao, L.; Mao, S.; Pandey, S. CSI-based fingerprinting for indoor localization: A deep learning approach. IEEE Trans. Veh. Technol. 2016, 66, 763–776.
37. Brida, P.; Duha, J.; Krasnovsky, M. On the accuracy of weighted proximity based localization in wireless sensor networks. In Personal Wireless Communications; Springer: Berlin/Heidelberg, Germany, 2007; pp. 423–432.
38. Hassanhosseini, S.; Taban, M.R.; Abouei, J.; Mohammadi, A. Improving performance of indoor localization using compressive sensing and normal hedge algorithm. Turk. J. Electr. Eng. Comput. Sci. 2020, 28, 2143–2157.
39. Wang, J.; Dhanapal, R.K.; Ramakrishnan, P.; Balasingam, B.; Souza, T.; Maev, R. Active RFID Based Indoor Localization. In Proceedings of the 2019 22th International Conference on Information Fusion (FUSION), Ottawa, ON, Canada, 2–5 July 2019; pp. 1–7.
40. Salman, A.; El-Tawab, S.; Yorio, Z.; Hilal, A. Indoor Localization Using 802.11 WiFi and IoT Edge Nodes. In Proceedings of the 2018 IEEE Global Conference on Internet of Things (GCIoT), Alexandria, Egypt, 5–7 December 2018; pp. 1–5.
41. Nguyen, Q.H.; Johnson, P.; Nguyen, T.T.; Randles, M. A novel architecture using iBeacons for localization and tracking of people within healthcare environment. In Proceedings of the 2019 Global IoT Summit (GIoTS), Aarhus, Denmark, 17–21 June 2019; pp. 1–6.
42. Anastasiou, A.; Pitoglou, S.; Androutsou, T.; Kostalas, E.; Matsopoulos, G.; Koutsouris, D. MODELHealth: An Innovative Software Platform for Machine Learning in Healthcare Leveraging Indoor Localization Services. In Proceedings of the 2019 20th IEEE International Conference on Mobile Data Management (MDM), Hong Kong, China, 13 June 2019; pp. 443–446.
43. Pitoglou, S.; Anastasiou, A.; Androutsou, T.; Giannouli, D.; Kostalas, E.; Matsopoulos, G.; Koutsouris, D. MODELHealth: Facilitating Machine Learning on Big Health Data Networks. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 2174–2177.
44. Pedrollo, G.; Konzen, A.A.; de Morais, W.O.; Pignaton de Freitas, E. Using Smart Virtual-Sensor Nodes to Improve the Robustness of Indoor Localization Systems. Sensors 2021, 21, 3912.
45. Nakamura, T.; Bouazizi, M.; Yamamoto, K.; Ohtsuki, T. Wi-Fi-CSI-based Fall Detection by Spectrogram Analysis with CNN. In Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–6.
46. Keenan, R.M.; Tran, L.N. Fall Detection using Wi-Fi Signals and Threshold-Based Activity Segmentation. In Proceedings of the 2020 IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications, London, UK, 31 August–3 September 2020; pp. 1–6.
47. Wang, Y.; Yang, S.; Li, F.; Wu, Y.; Wang, Y. FallViewer: A Fine-Grained Indoor Fall Detection System With Ubiquitous Wi-Fi Devices. IEEE Internet Things J. 2021, 8, 12455–12466.
48. Bouazizi, M.; Ye, C.; Ohtsuki, T. 2D LIDAR-Based Approach for Activity Identification and Fall Detection. IEEE Internet Things J. 2021, 1.
49. Bouazizi, M.; Ohtsuki, T. An Infrared Array Sensor-Based Method for Localizing and Counting People for Health Care and Monitoring. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, Canada, 20–24 July 2020; pp. 4151–4155.
50. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
51. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 818–833.
52. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
53. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
54. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
55. Farhadi, A.; Redmon, J. Yolov3: An incremental improvement. In Computer Vision and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2018; Volume 1804.
56. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE Computer Society: Los Alamitos, CA, USA, 2016; pp. 779–788.
57. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
58. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
59. Muthukumar, K.; Bouazizi, M.; Ohtsuki, T. A Novel Hybrid Deep Learning Model for Activity Detection Using Wide-Angle Low-Resolution Infrared Array Sensor. IEEE Access 2021, 9, 82563–82576.
60. Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 391–407.
61. Buades, A.; Coll, B.; Morel, J.M. Non-Local Means Denoising. Image Process. Line 2011, 1, 208–212.
62. Jain, P.; Tyagi, V. A survey of edge-preserving image denoising methods. Inf. Syst. Front. 2016, 18, 159–170.
63. Diwakar, M.; Kumar, M. A review on CT image noise and its denoising. Biomed. Signal Process. Control 2018, 42, 73–88.
64. Fan, L.; Zhang, F.; Fan, H.; Zhang, C. Brief review of image denoising techniques. Vis. Comput. Ind. Biomed. Art 2019, 2, 1–12.
65. Ponnuru, R.; Pookalangara, A.K.; Nidamarty, R.K.; Jain, R.K. CIFAR-10 Classification Using Intel® Optimization for TensorFlow*. Available online: https://www.intel.com/content/www/us/en/developer/articles/technical/cifar-10-classification-using-optimization-for-tensorflow.html (accessed on 29 January 2022).
66. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 6, 679–698.
67. Sobel, I.; Feldman, G. An Isotropic 3 × 3 Image Gradient Operator. Presentation at Stanford AI Project. Available online: https://www.researchgate.net/publication/285159837_A_33_isotropic_gradient_operator_for_image_processing (accessed on 29 January 2022).
68. Keys, R. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust. Speech Signal Process. 1981, 29, 1153–1160.
69. Howard, J.; Gugger, S. Fastai: A layered API for deep learning. Information 2020, 11, 108.
70. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.

| IR Sensor Model | MLX90640 |
|---|---|
| Voltage | 3.3 V |
| Temperature range | C |
| Resolution | 32 × 24, 16 × 12, and 8 × 6 pixels |
| Recording rate | 1, 2, 4, 8, 16, 32 and 64 fps |
| Coverage | |

| Layer Type | Number of Filters | Filter Size |
|---|---|---|
| Conv 2D | 56 | 5 × 5 | 
| Conv 2D | 16 | 1 × 1 | 
| Conv 2D | 12 | 3 × 3 | 
| Conv 2D | 12 | 3 × 3 | 
| Conv 2D | 12 | 3 × 3 | 
| Conv 2D | 12 | 3 × 3 | 
| Conv 2D | 56 | 1 × 1 | 
| DeConv 2D | 1 | 9 × 9 | 
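
As a rough cross-check of how lightweight the super-resolution network in the table above is, the following sketch counts its trainable weights from the listed filter counts and filter sizes. It assumes a single-channel input frame and one bias per filter, and it ignores any activation parameters; none of these details are given in the table, so the total is indicative only. The names `LAYERS` and `count_parameters` are hypothetical.

```python
# Layer list transcribed from the table: (layer type, number of filters, kernel size).
LAYERS = [
    ("Conv 2D", 56, 5),
    ("Conv 2D", 16, 1),
    ("Conv 2D", 12, 3),
    ("Conv 2D", 12, 3),
    ("Conv 2D", 12, 3),
    ("Conv 2D", 12, 3),
    ("Conv 2D", 56, 1),
    ("DeConv 2D", 1, 9),
]


def count_parameters(layers, in_channels=1):
    """Sum weights and biases of a plain chain of (de)convolution layers.

    Assumption: each layer sees all output channels of the previous one,
    and the input is a single-channel thermal frame (in_channels=1).
    """
    total = 0
    channels = in_channels
    for _layer_type, filters, kernel in layers:
        weights = kernel * kernel * channels * filters
        biases = filters
        total += weights + biases
        channels = filters  # output channels feed the next layer
    return total


if __name__ == "__main__":
    print(f"Approximate trainable parameters: {count_parameters(LAYERS)}")
```

Under these assumptions the network stays in the low tens of thousands of parameters, which is consistent with the stated goal of running on a Raspberry Pi.
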

| Layer Type | Number of Filters | FC Neurons |
|---|---|---|
| Conv 2D | 32 | - | 
| Conv 2D | 32 | - | 
| Max pooling 2D | - | - | 
| Conv 2D | 64 | - | 
| Conv 2D | 64 | - | 
| Max pooling 2D | - | - | 
| Conv 2D | 128 | - | 
| Conv 2D | 128 | - | 
| Max pooling 2D | - | - | 
| Conv 2D | 64 | - | 
| Conv 2D | 64 | - | 
| Max pooling 2D | - | - | 
| Conv 2D | 32 | - | 
| Flatten | - | - | 
| Dense | - | 64 | 
| Dense | - | 4 | 

| Number of People | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Training set | 5129 | 6583 | 7348 | 6258 | 
| Test set | 1298 | 1546 | 2810 | 1558 | 

| | TP Rate | Precision | Recall | F-Measure |
|---|---|---|---|---|
| () | 97.48% | 97.46% | 97.48% | 97.47% | 
| () | 97.51% | 97.50% | 97.51% | 97.51% | 
| () | 97.82% | 97.84% | 97.82% | 97.83% | 
| () | 97.84% | 97.88% | 97.84% | 97.86% | 

| | TP Rate | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Fold 1 | 97.85% | 97.87% | 97.85% | 97.86% | 
| Fold 2 | 98.01% | 98.05% | 98.01% | 98.03% | 
| Fold 3 | 98.14% | 98.14% | 98.14% | 98.14% | 
| Fold 4 | 98.08% | 98.09% | 98.08% | 98.08% | 
| Fold 5 | 97.11% | 97.25% | 97.11% | 97.18% | 
| Average | 97.84% | 97.88% | 97.84% | 97.86% | 
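
The "Average" row of the cross-validation table can be checked with a few lines of Python. This is a minimal sketch that assumes the row is the unweighted arithmetic mean of the five folds, rounded to two decimals; the names `FOLDS` and `average_folds` are hypothetical.

```python
from statistics import mean

# Per-fold values transcribed from the table, in percent:
# (TP Rate, Precision, Recall, F-Measure)
FOLDS = {
    "Fold 1": (97.85, 97.87, 97.85, 97.86),
    "Fold 2": (98.01, 98.05, 98.01, 98.03),
    "Fold 3": (98.14, 98.14, 98.14, 98.14),
    "Fold 4": (98.08, 98.09, 98.08, 98.08),
    "Fold 5": (97.11, 97.25, 97.11, 97.18),
}
METRICS = ("TP Rate", "Precision", "Recall", "F-Measure")


def average_folds(folds):
    """Return the unweighted mean of each metric column, rounded to 2 decimals."""
    columns = zip(*folds.values())  # regroup the values metric by metric
    return tuple(round(mean(column), 2) for column in columns)


if __name__ == "__main__":
    for name, value in zip(METRICS, average_folds(FOLDS)):
        print(f"{name}: {value:.2f}%")
```
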

| | TP Rate | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Class 0 | 100% | 100% | 100% | 100% | 
| Class 1 | 99.29% | 98.27% | 99.29% | 98.78% | 
| Class 2 | 98.33% | 95.80% | 98.33% | 97.05% | 
| Class 3 | 92.94% | 98.64% | 92.94% | 95.70% | 
| Overall | 97.67% | 97.70% | 97.67% | 97.66% | 

| Class | Classified as 0 | Classified as 1 | Classified as 2 | Classified as 3 |
|---|---|---|---|---|
| Class 0 | 1298 | 0 | 0 | 0 | 
| Class 1 | 0 | 1535 | 11 | 0 | 
| Class 2 | 0 | 27 | 2763 | 20 | 
| Class 3 | 0 | 0 | 110 | 1448 | 
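
The per-class figures reported for this configuration follow directly from the confusion matrix above. The sketch below derives recall (TP rate), precision, and F-measure for each class, then averages them weighted by the number of test samples per class. The support-weighted averaging is an assumption: it appears to match the reported "Overall" row, but the text does not state which averaging was used. The helper names are hypothetical.

```python
# Confusion matrix transcribed from the table: rows are actual classes,
# columns are predicted classes (0 to 3 people).
CONFUSION = [
    [1298, 0, 0, 0],
    [0, 1535, 11, 0],
    [0, 27, 2763, 20],
    [0, 0, 110, 1448],
]


def per_class_metrics(cm):
    """Compute support, precision, recall and F-measure for every class."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        actual = sum(cm[c])                          # samples truly in class c
        predicted = sum(cm[r][c] for r in range(n))  # samples labelled as class c
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        results.append({
            "support": actual,
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        })
    return results


def weighted_average(metrics, key):
    """Average one metric over classes, weighted by class support (assumed)."""
    total = sum(m["support"] for m in metrics)
    return sum(m[key] * m["support"] for m in metrics) / total


if __name__ == "__main__":
    metrics = per_class_metrics(CONFUSION)
    for label, m in enumerate(metrics):
        print(f"Class {label}: precision={m['precision']:.2%} "
              f"recall={m['recall']:.2%} F={m['f_measure']:.2%}")
    for key in ("precision", "recall", "f_measure"):
        print(f"Weighted {key}: {weighted_average(metrics, key):.2%}")
```

Note that the support-weighted recall is identical to overall accuracy, which is why the TP rate and recall columns coincide in the tables.
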

| | TP Rate | Precision | Recall | F-Measure |
|---|---|---|---|---|
| () | 97.48% | 97.46% | 97.48% | 97.47% | 
| () | 97.84% | 97.88% | 97.84% | 97.86% | 
| () | 89.12% | 89.89% | 89.12% | 89.50% | 
| () | 95.44% | 95.68% | 95.44% | 95.56% | 
| () | 96.01% | 96.14% | 96.01% | 96.07% | 
| () | 96.33% | 96.56% | 96.33% | 96.45% | 
| () | 96.78% | 96.94% | 96.78% | 96.86% | 
| () | 72.89% | 73.55% | 72.89% | 73.22% | 
| () | 86.99% | 86.78% | 86.99% | 86.88% | 
| () | 87.40% | 87.12% | 87.40% | 87.26% | 
| () | 87.76% | 87.71% | 87.76% | 87.73% | 
| () | 88.01% | 87.98% | 88.01% | 88.00% | 

| | TP Rate | Precision | Recall | F-Measure |
|---|---|---|---|---|
| () | 97.59% | 97.62% | 97.59% | 97.59% | 
| () | 97.68% | 97.73% | 97.68% | 97.70% | 
| () | 86.88% | 87.52% | 86.88% | 87.20% | 
| () | 94.05% | 92.18% | 94.05% | 93.11% | 
| () | 94.66% | 94.72% | 94.66% | 94.69% | 
| () | 94.86% | 94.94% | 94.86% | 94.90% | 
| () | 94.90% | 94.94% | 94.90% | 94.92% | 
| () | 70.80% | 73.45% | 70.80% | 72.10% | 
| () | 85.89% | 86.68% | 85.89% | 86.28% | 
| () | 86.47% | 86.57% | 86.47% | 86.52% | 
| () | 86.57% | 86.58% | 86.57% | 86.58% | 
| () | 86.79% | 86.87% | 86.79% | 86.83% | 

| Class | Classified as 0 | Classified as 1 | Classified as 2 | Classified as 3 |
|---|---|---|---|---|
| Class 0 | 1291 | 7 | 0 | 0 | 
| Class 1 | 4 | 1539 | 3 | 0 | 
| Class 2 | 0 | 110 | 2628 | 72 | 
| Class 3 | 0 | 11 | 161 | 1386 | 

| Class | Classified as 0 | Classified as 1 | Classified as 2 | Classified as 3 |
|---|---|---|---|---|
| Class 0 | 1259 | 39 | 0 | 0 | 
| Class 1 | 27 | 1412 | 99 | 8 | 
| Class 2 | 0 | 231 | 2351 | 228 | 
| Class 3 | 0 | 33 | 288 | 1237 | 

| | TP Rate | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Baseline (8 × 6) | 60.11% | 59.44% | 60.11% | 59.77% | 
| Baseline (32 × 24) | 82.14% | 82.83% | 82.14% | 82.48% | 
| RetinaNet [52] (8 × 6) | 78.14% | 78.04% | 78.14% | 78.09% | 
| RetinaNet [52] (32 × 24) | 98.56% | 98.44% | 98.56% | 98.50% | 
| | 86.79% | 86.87% | 86.79% | 86.83% |
| | 97.68% | 97.73% | 97.68% | 97.70% |

| | TP Rate | Precision | Recall | F-Measure |
|---|---|---|---|---|
| RetinaNet [52] (cam) | 99.32% | 99.40% | 99.32% | 99.36% | 
| RetinaNet [52] (HF) | 98.56% | 98.44% | 98.56% | 98.50% | 

| Model | Execution Time |
|---|---|
| Baseline | 10 ms | 
| RetinaNet [52] | 121 ms | 
| Proposed | 15 ms | 
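
To relate these execution times to real-time operation, the sketch below finds the highest sensor recording rate at which each model would still finish one inference before the next frame arrives. It assumes the reported times are per frame and ignores sensor readout, data transfer, and preprocessing overhead, so the results are optimistic upper bounds. The names `EXECUTION_TIME_MS`, `SENSOR_RATES_FPS`, and `max_sustainable_rate` are hypothetical.

```python
# Execution times transcribed from the table, in milliseconds (assumed per frame).
EXECUTION_TIME_MS = {"Baseline": 10, "RetinaNet": 121, "Proposed": 15}

# Recording rates the sensor supports, in frames per second.
SENSOR_RATES_FPS = (1, 2, 4, 8, 16, 32, 64)


def max_sustainable_rate(execution_ms, rates=SENSOR_RATES_FPS):
    """Return the highest rate whose frame period covers one inference.

    A rate is considered sustainable when execution_ms <= 1000 / rate.
    Returns None if even the slowest rate is too fast for the model.
    """
    feasible = [rate for rate in rates if execution_ms <= 1000.0 / rate]
    return max(feasible) if feasible else None


if __name__ == "__main__":
    for model, time_ms in EXECUTION_TIME_MS.items():
        rate = max_sustainable_rate(time_ms)
        print(f"{model}: {time_ms} ms per frame, up to {rate} fps")
```
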

| Approach | Year | Results | Remarks |
|---|---|---|---|
| [38] | 2020 | RMSE = 0.6241 m | Works on large spaces; requires carrying a device; computationally expensive |
| [39] | 2019 | RMSE = 0.5–2.0 m | Requires carrying an active RFID tag; requires a large number of RFID readers |
| [40] | 2018 | Error = 5–40% | Uses WiFi signals; mass deployment is expensive; does not run locally; privacy issues |
| [41] | 2019 | RMSE = 0.7 m | Uses BLE devices; low cost; some assumptions are not realistic; requires carrying devices |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bouazizi, M.; Ye, C.; Ohtsuki, T. Low-Resolution Infrared Array Sensor for Counting and Localizing People Indoors: When Low End Technology Meets Cutting Edge Deep Learning Techniques. Information 2022, 13, 132. https://doi.org/10.3390/info13030132