Article

Movement Direction Classification Using Low-Resolution ToF Sensor and LSTM-Based Neural Network

1 Department of Robot AI Convergence, Yeungnam University, Gyeongsan 38541, Republic of Korea
2 Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
3 Department of Artificial Intelligence and Information Technology, Sejong University, Seoul 05006, Republic of Korea
* Authors to whom correspondence should be addressed.
J. Sens. Actuator Netw. 2025, 14(3), 61; https://doi.org/10.3390/jsan14030061
Submission received: 28 April 2025 / Revised: 26 May 2025 / Accepted: 10 June 2025 / Published: 11 June 2025

Abstract

This study proposes an effective method for identifying human movement direction in indoor environments by leveraging a low-resolution time-of-flight (ToF) sensor and a long short-term memory (LSTM) neural network model. While previous studies have employed camera-based or high-resolution ToF sensors, we utilize an 8 × 8 array ToF sensor, which is inexpensive and free of privacy concerns. Furthermore, in contrast to conventional rule-based algorithms, the proposed method employs the LSTM model to effectively handle sequential time-series data. Experimental evaluations, including both basic single-person scenarios and complex multi-user challenge scenarios, confirm that the proposed LSTM-based approach achieves an outstanding accuracy of 98% in identifying human entry and exit movements.

1. Introduction

The advancement of sensor-based automation technologies and the development of the internet of things (IoT) have not only enhanced the convenience of daily life but also contributed significantly to improving energy efficiency in building management systems [1,2,3,4]. In particular, public institutions and commercial buildings increasingly require automated systems capable of accurately detecting indoor occupancy to optimize energy consumption related to heating, ventilation and air conditioning (HVAC) and lighting systems [5,6]. As a result, extensive research has been conducted on developing energy optimization systems based on human detection and movement analysis technologies [7,8,9,10,11].
Traditional approaches have mainly relied on camera-based sensors or high-resolution time-of-flight (ToF) sensors, each with inherent limitations. Camera-based systems raise concerns about personal privacy, as they expose facial features and movement trajectories, making them unsuitable for privacy-sensitive environments, such as restrooms [12,13]. On the other hand, although high-resolution ToF sensors do not raise privacy issues, they are expensive, which limits large-scale deployment and maintenance [14,15,16]. For this reason, several studies have been performed on occupancy detection and movement estimation using low-resolution ToF sensors. However, existing methods often rely on rule-based algorithms, such as blob detection and random forest regression models, which may demonstrate acceptable performance in controlled settings but lack adaptability in complex environments involving multiple users and dynamic conditions [17].
To overcome these issues, this study proposes a novel approach that combines a low-resolution ToF sensor with a long short-term memory (LSTM) neural network capable of learning from sequential time-series data. The proposed method utilizes an 8 × 8 low-resolution ToF sensor, effectively resolving the privacy issues associated with camera-based systems while offering greater cost efficiency compared to high-resolution ToF sensors. Although low-resolution ToF sensors are inherently limited in distinguishing fine-grained human actions or resolving interactions in densely populated environments, the objective of this study is confined to recognizing human presence and movement direction in low-traffic, structured indoor spaces, such as doorways and corridors. In such contexts, detailed spatial resolution is not essential, whereas privacy preservation and cost efficiency are of primary concern. Accordingly, the adoption of an 8 × 8 ToF sensor array is considered both sufficient and appropriate for the intended application domain. Furthermore, to overcome the inflexibility and performance limitations often observed in conventional rule-based algorithms, the proposed method employs an LSTM neural network, which enhances adaptability to dynamic indoor environments, including those involving multiple users. By directly learning human movement patterns from the sequence data acquired by the sensor, the proposed LSTM model can classify the direction of entry and exit with high accuracy even in complex indoor environments. The experimental results demonstrate that the proposed method achieves a high average accuracy of 98% across various scenarios, including single- and multiple-user cases, and maintains robust performance under unseen user conditions, such as different walking patterns and speeds.

2. Background

2.1. Research on People Counting

Occupancy counting plays a critical role in building energy management and automation systems, particularly in reducing energy consumption through the optimization of lighting and HVAC operations. The core objective of this technology is to accurately detect the number and density of people entering and occupying indoor spaces, and a variety of sensor-based approaches have been studied to achieve this.
One widely adopted commercial method utilizes passive infrared (PIR) sensors [18,19,20]. PIR sensors detect changes in infrared radiation caused by human motion using a pyroelectric sensor and a Fresnel lens. They are typically mounted on ceilings or walls and offer a wide field of view (FoV) for motion detection. However, PIR sensors cannot determine movement direction, and they are susceptible to false positives from environmental temperature changes. Considering these challenges, active sensing technologies such as low-resolution ToF sensors present a meaningful alternative to conventional passive systems, especially in scenarios where determining the direction of human movement is required. Meanwhile, high-resolution camera-based systems have also been extensively researched for occupancy detection [21,22,23]. These systems can achieve high accuracy by integrating object detection algorithms, such as YOLO and Faster R-CNN. Nonetheless, camera-based approaches raise significant privacy concerns, especially in sensitive environments, such as hospitals and restrooms. Although some studies have attempted to address privacy issues through anonymization techniques (e.g., face blurring or low-resolution imaging), these solutions often increase computational overhead and decrease detection accuracy while also being sensitive to environmental changes [24,25,26].
Other studies have explored the use of radio frequency (RF) sensors, including Wi-Fi, Bluetooth and ultra-wideband (UWB), to estimate occupancy [27]. These techniques infer the number of users by measuring signal strength from mobile devices within a space. However, they require users to carry mobile devices, which limits their applicability in situations where users do not have smartphones or where signal quality is poor. Low-resolution thermal cameras have also been investigated for presence detection, but distinguishing between human heat sources and environmental thermal noise remains a major challenge [28,29].
Meanwhile, recent studies have also explored the use of low-resolution ToF sensors, which are inherently free from privacy concerns and offer significantly lower costs compared to both camera systems and high-resolution ToF sensors. These characteristics make them a promising alternative for smart building applications. For instance, one prior study combined a random forest regression model with clustering techniques and the Hungarian algorithm to estimate human movement direction using low-resolution ToF data [17]. However, this approach exhibited significant performance degradation in densely populated environments due to the structural limitations of clustering algorithms. Specifically, when two individuals moved in close proximity (less than 1 m apart), the error rate increased by up to 20.5%, limiting the system’s applicability in multi-user indoor environments. Moreover, due to constraints imposed by the COVID-19 pandemic, this study generated approximately 900 datasets in a simulated environment, which may have affected the reliability and generalizability of the results in real-world scenarios.
To overcome these challenges, the present study proposes a new approach that integrates a low-resolution ToF sensor with an LSTM neural network, which is highly effective in learning from sequential data. Unlike rule-based algorithms, the LSTM model can capture long-term temporal dependencies and maintain robust performance in dynamic and complex multi-user environments. Additionally, the proposed model can be easily adapted to new environments with minimal additional training, setting it apart from previous studies.

2.2. Overview of Recurrent Neural Network and LSTM

Time-series data are characterized by the dependency of the current state on its previous states. To effectively process such sequential data, the recurrent neural network (RNN) architecture was introduced in [30]. Unlike conventional feedforward neural networks, RNNs maintain and reuse information from previous time steps through hidden layers, enabling the learning of continuous patterns over time, as seen in Figure 1. This capability has led to their wide application in various time-series analysis tasks, including natural language processing, stock market prediction and human behavior analysis.
However, traditional RNNs suffer from the vanishing gradient problem when dealing with long sequences. During backpropagation, the gradients used to update weights can diminish exponentially over time steps, particularly when activation functions, such as sigmoid or hyperbolic tangent, are applied. This phenomenon hinders the network’s ability to retain information from earlier time steps, thereby limiting its capacity to model long-term temporal dependencies. This issue is especially problematic in movement direction classification, where the task requires integrating positional changes across multiple sequential frames. If the initial motion information is not preserved, the network may fail to capture the continuous trajectory of movement, such as a gradual transition from one direction to another. Consequently, the model may exhibit reduced sensitivity to direction changes that occur progressively over time, leading to inaccurate classification in real-world scenarios.
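The exponential decay described above can be made concrete with a short numerical sketch (ours, not from the paper): the sigmoid derivative never exceeds 0.25, so a backpropagated gradient passing through n sigmoid-dominated steps shrinks at least as fast as 0.25^n.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The sigmoid derivative s * (1 - s) peaks at 0.25 (at z = 0), so a chain
# of n such factors in backpropagation decays at least as fast as 0.25**n.
z = np.linspace(-10, 10, 10001)
s = sigmoid(z)
max_deriv = np.max(s * (1.0 - s))   # maximum sigmoid derivative, ~0.25
decay_50_steps = 0.25 ** 50         # upper bound on a 50-step gradient chain
```

After only 50 time steps the bound is already below 1e-30, which illustrates why a plain RNN cannot retain early-frame motion information over a ~100-frame sequence.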
To overcome this limitation, the LSTM model was proposed [31]. LSTM enhances the traditional RNN structure by introducing a cell state mechanism that allows important information to be preserved over long periods, while irrelevant data are selectively discarded. This is achieved through three key gating mechanisms represented by the following equations:
$f_t = \sigma(W_f [x_t, h_{t-1}] + b_f)$, (1)
$i_t = \sigma(W_i [x_t, h_{t-1}] + b_i)$, (2)
$o_t = \sigma(W_o [x_t, h_{t-1}] + b_o)$. (3)
Here, $\sigma$ denotes the sigmoid activation function, and $W$ and $b$ represent the weights and biases, respectively. Equation (1) defines the forget gate, which determines the parts of the previous cell state that should be discarded. Values closer to 1 retain the information, while values near 0 remove it. Equation (2) is the input gate, which selectively updates the cell state with new information. Equation (3) represents the output gate, which decides how much of the cell state should influence the current hidden state. The cell state $C_t$ and hidden state $h_t$ are updated as follows:
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, (4)
$\tilde{C}_t = \tanh(W_C [x_t, h_{t-1}] + b_C)$, (5)
$h_t = o_t \odot \tanh(C_t)$. (6)
In Equation (5), $\tilde{C}_t$ represents the candidate cell state, which is generated by combining the current input and the previous hidden state through the hyperbolic tangent activation. Equation (4) calculates the updated cell state by combining the retained information from the past and the newly selected information. Equation (6) computes the hidden state by applying the output gate to the updated cell state. Through this gating mechanism, LSTM networks can effectively retain and learn relevant features from time-series data while also mitigating the vanishing gradient issue, enabling stable training over long sequences.
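Equations (1)–(6) can be sketched as a single NumPy time step. This is a minimal illustration of the gating mechanism, not the paper's implementation; the dictionary layout, random initialization and the 64-input/32-hidden dimensions are our assumptions chosen to match one flattened 8 × 8 frame.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step implementing Equations (1)-(6).

    params holds weight matrices W_* of shape (hidden, input + hidden)
    applied to the concatenated vector [x_t, h_{t-1}], and biases b_*."""
    z = np.concatenate([x_t, h_prev])                      # [x_t, h_{t-1}]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate, Eq. (1)
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate,  Eq. (2)
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate, Eq. (3)
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate,   Eq. (5)
    c_t = f_t * c_prev + i_t * c_tilde                     # cell state,  Eq. (4)
    h_t = o_t * np.tanh(c_t)                               # hidden state, Eq. (6)
    return h_t, c_t

# Toy dimensions: 64-dimensional input (one flattened 8x8 frame), 32 hidden units.
rng = np.random.default_rng(0)
n_in, n_h = 64, 32
params = {k: rng.standard_normal((n_h, n_in + n_h)) * 0.1
          for k in ("W_f", "W_i", "W_o", "W_c")}
params.update({b: np.zeros(n_h) for b in ("b_f", "b_i", "b_o", "b_c")})
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.standard_normal(n_in), h, c, params)
```

Iterating `lstm_step` over a sequence of frames reproduces the recurrence that frameworks such as Keras execute internally.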
In this study, we process time-series data collected from the low-resolution ToF sensor using the LSTM model. The ToF sensor operates at 15 frames per second, capturing sequential movement patterns during human motion. Unlike rule-based approaches or random forest regression models, which rely on predefined heuristics and struggle to adapt to dynamic environments, LSTM networks are capable of learning continuous motion patterns and maintaining long-term temporal information via cell states. This allows the model to generalize well even in multi-user settings and under varying environmental conditions. By utilizing the input and forget gates, LSTM effectively filters out irrelevant information while retaining essential details, such as entry direction and movement patterns. Based on these advantages, the proposed approach enables accurate detection of human presence and movement direction in complex indoor environments using low-resolution ToF sensor data.

3. Materials and Methods

3.1. ToF Sensor

A ToF sensor operates by emitting light and measuring its round-trip time to calculate the distance to an object. Compared to high-resolution camera sensors, ToF sensors offer advantages such as lower cost, enhanced privacy protection and robustness to environmental changes.
In this study, we utilize the VL53L8CX ToF sensor (STMicroelectronics, Geneva, Switzerland), shown in Figure 2a,b, to detect human movement in indoor environments. The VL53L8CX is a cost-effective ToF module based on vertical-cavity surface-emitting laser technology. It incorporates an 8 × 8 array of single-photon avalanche diode receivers and can detect objects up to 4 m away. As illustrated in Figure 2c, the sensor features a maximum diagonal FoV of 65°, with each individual detection zone covering approximately 5.625 × 5.625°, forming a grid for distance measurements, as shown in Figure 2d. Compared to conventional high-resolution camera-based systems, ToF sensors ensure better privacy and reduced system costs. Specifically, the VL53L8CX sensor used in this study, with its 8 × 8 low-resolution array structure, enables the differentiation of movement directions while minimizing data processing overhead relative to high-resolution alternatives.
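The sensor geometry above translates into a concrete floor footprint. The following sketch (ours, assuming a flat floor and the 2.45 m ceiling mounting height used later in Section 3.3) estimates the ground coverage of one zone and of the full array.

```python
import math

def footprint(height_m, fov_deg):
    """Width of floor spanned by a ceiling-mounted sensor with the given
    field of view (degrees) at the given mounting height (metres)."""
    return 2.0 * height_m * math.tan(math.radians(fov_deg / 2.0))

# At 2.45 m: each ~5.625 deg zone covers roughly a 0.24 m square on the
# floor; the 8-zone side spans ~2.03 m; the 65 deg diagonal FoV ~3.12 m.
zone = footprint(2.45, 5.625)
side = footprint(2.45, 8 * 5.625)
diag = footprint(2.45, 65.0)
```

A ~2 m square footprint comfortably covers a doorway, which supports the claim that the 8 × 8 resolution is sufficient for the intended application.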

3.2. Data Preprocessing

The ToF sensor measures the distance to objects and provides output in the form of an 8 × 8 array, denoted as $D_t$, as shown in Figure 3. Since the sensor is mounted on the ceiling and performs downward measurements, it does not directly measure the height of the person from the floor (denoted as $H_t$ in Figure 3). Therefore, to effectively identify human presence and distinguish moving objects from static background noise, such as furniture near the floor, the raw data must be converted into relative height values with respect to the floor.
To achieve this, the system first measures the average distance to the floor during an initialization phase. This reference distance, $d_{floor}$, is calculated using the central 4 × 4 region of the $D_t$ array to efficiently estimate the vertical distance between the sensor and the floor. Subsequently, each measured frame is converted into a relative height map $H_t$ by subtracting the current $D_t$ array from the floor reference:
$H_t = d_{floor} - D_t$. (7)
This transformation yields relative height data $H_t$, effectively removing the floor from consideration and allowing the system to more clearly capture human movement patterns within the sensor’s field of view.
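The floor calibration and Equation (7) can be sketched as follows. This is our illustration, not the paper's code; the exact 4 × 4 index range and the clipping of negative heights to zero are assumptions.

```python
import numpy as np

def floor_reference(init_frames):
    """Estimate d_floor from initialization frames (each an 8x8 distance
    array, e.g. in mm) by averaging the central 4x4 region, as described
    in the text. The index range 2:6 is our assumption for "central"."""
    frames = np.asarray(init_frames, dtype=float)
    central = frames[:, 2:6, 2:6]          # central 4x4 region of each frame
    return central.mean()

def to_height_map(d_t, d_floor):
    """Convert a raw distance frame D_t to a relative height map H_t,
    Eq. (7): H_t = d_floor - D_t. Clipping at zero (our assumption) keeps
    sub-floor noise from appearing as negative height."""
    h_t = d_floor - np.asarray(d_t, dtype=float)
    return np.clip(h_t, 0.0, None)
```

With the sensor at 2.45 m, a head measured at 700 mm distance maps to a relative height of 1750 mm, while empty-floor zones map to approximately zero.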
The data acquisition process is illustrated in Figure 4. An embedded board equipped with the ToF sensor transmits distance measurements from each receiver module to the PC via I2C communication. Along with the distance values, the data stream may include “X” entries, which indicate diodes that failed to receive the reflected signals from an object. The transmitted data are used as input for training the neural network model.
However, temporary delays or transmission errors in the I2C communication process may introduce noise into the measured data. For example, some parts of the raw distance data may be lost, or the next frame’s data may be sent prematurely. In some cases, characters intended to follow an “X” value may incorrectly merge with the “X” entry, resulting in corrupted data formats like “X1000”. Such noisy data can degrade the performance of the neural network, both during training and in real-time inference.
To ensure data quality, a noise detection and removal process is implemented, as shown in Figure 5. The sensor board includes the header string “START” at the beginning of each data packet to indicate the start of a new frame. The PC uses this marker to verify whether the complete set of 64 distance values corresponding to the 8 × 8 array is present. Any value marked as “X” (indicating missing reflections) is replaced with zero. If the frame contains invalid or incomplete data like “X1000”, it is discarded to maintain the integrity of the dataset. This process helps minimize noise and ensures that the neural network is trained with reliable and high-quality data.
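The validation rules above can be sketched as a small parser. This is our illustration under an assumed token format (one string token per value after the "START" header); the actual wire format is not specified beyond what the text describes.

```python
def parse_frame(tokens):
    """Parse one data packet into an 8x8 frame, following the noise-removal
    rules in the text: tokens follow a "START" header; "X" (missing
    reflection) becomes 0; a frame containing a corrupted token such as
    "X1000" is discarded (returns None). Token layout is our assumption."""
    if len(tokens) < 65 or tokens[0] != "START":
        return None                      # incomplete or misaligned packet
    values = []
    for tok in tokens[1:65]:
        if tok == "X":
            values.append(0)             # missing reflection -> zero
        elif tok.isdigit():
            values.append(int(tok))
        else:
            return None                  # corrupted entry, e.g. "X1000"
    # Reshape the 64 values row by row into the 8x8 array
    return [values[i * 8:(i + 1) * 8] for i in range(8)]
```

Discarding rather than repairing corrupted frames is cheap here because the sensor delivers 15 frames per second, so a single dropped frame costs little temporal information.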

3.3. Dataset Collection

For data collection, the VL53L8CX ToF sensor is mounted on the ceiling at a height of 2.45 m, as shown in Figure 6. The sensor operates at a sampling rate of 15 frames per second (fps) and is configured to record continuous 8 × 8 array distance data as the person passes through its detection area within the FoV. Each data measurement takes about 3–4 s, and the sequence lengths range from about 90 to 120 frames, with an average of 105 frames. The experimental environment is designed to allow individuals to naturally walk across the sensor’s FoV, and data are collected for both leftward and rightward movement directions within the detection zone. To maintain consistency regardless of sensor orientation, movements from left to right are labeled as “rightward” and those from right to left as “leftward”. These direction-based labels are used to train the LSTM model for classifying movement direction.
To ensure diversity in the dataset, the participants included four females with heights ranging from 155 cm to 162 cm and four males with heights ranging from 170 cm to 182 cm. Each participant performed various movement patterns within the experimental space, and the collected data were used as input for training the neural network model. To reflect diverse movement scenarios, each participant was given autonomy over their walking behavior, including variations in walking speed, irregular speed patterns and upper limb movements during walking. The dataset constructed for this study consisted of two main scenario types: Basic Scenarios and Challenge Scenarios. Figure 7a illustrates representative examples of these basic movement patterns. The Basic Scenarios include standard movement patterns where a single person walks through the detection area. These are further categorized into three labels: “rightward”, “leftward” and “turn”, the latter referring to a directional change during movement.
Additionally, to account for various real-world variables in indoor environments, scenarios where users carry an object (e.g., a box) while walking are also included in the Basic Scenario dataset. As shown in Figure 7 and Figure 8, carrying an object alters the typical shape of the ToF data. In normal walking, reflections are typically received from the head, shoulders and torso. However, when a user carries an object, some of the reflected signals are added or altered by the object, resulting in localized dips and rises in the height map. In rule-based systems, such anomalies may cause the data to be misclassified—either as non-human or as two separate entities. To enable the neural network to learn such variations, these object-carrying movement patterns are also included in the training dataset.
The Challenge Scenarios involve complex situations where two individuals simultaneously walk through the sensor’s detection area. As illustrated in Figure 9, conventional rule-based algorithms using clustering techniques struggle to resolve such multi-object conditions. While sequential entries can still be separated and learned individually by neural networks, simultaneous entries present additional challenges that require further training. To improve the robustness of the model in these complex conditions, we introduce additional training data for the Challenge Scenarios, with two labeling categories—“rightward pair” and “leftward pair”—representing simultaneous two-person movement in each direction.
In total, the dataset comprises 450 samples from Basic Scenarios and 200 samples from Challenge Scenarios. These data are used for training the proposed LSTM-based neural network model.

3.4. Neural Network Training Parameters

To analyze the preprocessed time-series data $H_t$ obtained from the ToF sensor and infer human movement, we designed an LSTM-based neural network model using the TensorFlow Keras library. Each 8 × 8 frame is flattened into a 64-dimensional feature vector:
$H_t = [h_{t,1}, h_{t,2}, h_{t,3}, \ldots, h_{t,64}]$. (8)
Since the total number of time steps $T_{total}$ in the time-series data collected from the ToF sensor may vary across samples, it is necessary to unify the sequence lengths to construct a single input tensor for the LSTM model. This is accomplished by padding shorter sequences with zero values at the end. However, since these padded zeros are not part of the actual data, they must be excluded during the training process. Additionally, as the ToF sensor captures frames even before and after the person enters or leaves the FoV, it is crucial to filter out irrelevant frames that do not contain meaningful movement.
Although a dedicated detection module could be developed to identify when the person enters or exits the sensor’s range, this study aims to shift the entire perception and inference process to the neural network, avoiding rule-based modules. To achieve this, we implement a custom masking layer that enables the model to ignore time steps filled with zeros, whether due to padding or non-movement periods before or after the person appears. As a result, the LSTM network processes only meaningful time steps, effectively skipping those where all feature values are zero. The masked time-series data, excluding irrelevant time steps, are expressed as follows:
$\alpha = \min\{\, t \mid \sum_{d=1}^{64} h_{t,d} \neq 0 \,\}$, (9)
$X_t = H_{\alpha + t - 1}, \quad t \in [1, T]$. (10)
Here, $\alpha$ represents the time index at which meaningful movement information first appears in the sequence. The effective input to the network, $X_t$, excludes all masked time steps and contains only the meaningful portions of the original data. Since each sample may have a different number of meaningful time steps, the time length $T$ used in training also varies across the dataset. This masking technique enhances model performance by ensuring the network focuses on actual motion-related data and avoids learning from irrelevant information, thereby reducing potential training errors.
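The trimming of Equations (9) and (10) combined with zero padding can be sketched as a batching helper. This is our illustration; details beyond the equations (batch layout, truncation at a common length) are assumptions.

```python
import numpy as np

def trim_and_pad(sequences, max_len=None):
    """Trim leading all-zero frames per Eqs. (9)-(10) and zero-pad each
    sequence at the end to a common length for batching. Each sequence
    has shape (T_total, 64). A Keras Masking(mask_value=0.0) layer then
    skips the padded time steps during training."""
    trimmed = []
    for seq in sequences:
        seq = np.asarray(seq, dtype=float)
        nonzero = np.flatnonzero(np.abs(seq).sum(axis=1))
        # alpha: first time step with meaningful (non-zero) movement data
        seq = seq[nonzero[0]:] if nonzero.size else seq[:0]
        trimmed.append(seq)
    max_len = max_len or max(len(s) for s in trimmed)
    batch = np.zeros((len(trimmed), max_len, 64))   # trailing zeros = padding
    for i, seq in enumerate(trimmed):
        batch[i, :len(seq)] = seq[:max_len]
    return batch
```

Because the padding value and the mask value coincide (zero), the same masking layer transparently covers both the padded tail and any residual empty frames.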
The LSTM-based neural network architecture used in this study is illustrated in Figure 10. It consists of two stacked LSTM layers to capture deeper temporal patterns. The first LSTM layer has 64 output units and produces output at all time steps, allowing the second LSTM layer to learn comprehensive sequential features. The second LSTM layer has 32 output units and compresses the sequence into a single vector using the masked final time step. A fully connected layer with 16 units and the rectified linear unit (ReLU) activation function is then applied to introduce non-linearity. Finally, a softmax-activated dense layer with five output units is used to classify the input sequence into one of five movement patterns.
The model is trained using the Adam optimizer to optimize weights, and the categorical cross-entropy loss function, commonly used in multi-class classification problems, is employed:
$L(y, \hat{y}) = -\sum_{i=1}^{N} y_i \log \hat{y}_i$. (11)
Here, $N$ is the number of classes; $y_i$ is the one-hot encoded true label; and $\hat{y}_i$ is the predicted probability for each class. The model is trained with a batch size of 16 and for 20 epochs, using an 80:20 split between the training and validation datasets.
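The architecture of Figure 10 and the training setup described above can be sketched in Keras as follows. This is a plausible reconstruction, not the authors' code; the padded sequence length of 120 is an assumption based on the reported 90–120 frame range.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(max_len=120, n_features=64, n_classes=5):
    """Sketch of the Figure 10 architecture: masking, two stacked LSTM
    layers (64 and 32 units), a 16-unit ReLU dense layer and a 5-way
    softmax output. max_len is an assumed padded sequence length."""
    model = models.Sequential([
        layers.Input(shape=(max_len, n_features)),
        layers.Masking(mask_value=0.0),           # skip zero-filled time steps
        layers.LSTM(64, return_sequences=True),   # outputs at all time steps
        layers.LSTM(32),                          # compresses to final state
        layers.Dense(16, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    # Training setup from the text: Adam optimizer and categorical
    # cross-entropy loss (Eq. (11)).
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
# Training as described: batch size 16, 20 epochs, 80:20 train/validation split.
# model.fit(X_train, y_train, batch_size=16, epochs=20, validation_split=0.2)
```

The softmax output yields a probability over the five movement labels (rightward, leftward, turn, rightward pair, leftward pair), with the argmax taken as the prediction.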

4. Results

To evaluate the performance of the proposed LSTM-based model for detecting human entry and exit, we analyzed both training and testing outcomes. The training dataset consisted of 330 Basic Scenario samples and 140 Challenge Scenario samples. In addition, we used 180 test samples to evaluate the generalization ability of the model. As shown in Table 1, the training samples were categorized into datasets for training and datasets for validation.

4.1. Neural Network Training

The training accuracy and loss over epochs are shown in Figure 11. The model’s training accuracy converged to 1.0 by about the fifth epoch, while the loss steadily decreased, indicating that the model was trained stably and efficiently. This rapid convergence can be attributed to the characteristics of the low-resolution ToF sensor data, which enable the model to effectively learn patterns even with a relatively small number of epochs.
The proposed 8 × 8 array-based low-resolution ToF sensor is designed to convey motion-related information rather than detailed physical characteristics, such as height or body shape. In contrast, models based on high-resolution cameras or ToF sensors often exhibit variations in performance depending on the physical characteristics of individuals. However, the low-resolution sensor used in this study shows less sensitivity to such variations, allowing the model to maintain consistent learning performance across a variety of user conditions. This robustness highlights the model’s efficiency and suitability for deployment in diverse environments.

4.2. Test Results

To evaluate the test accuracy of the proposed LSTM-based neural network model, the test dataset consists of 40 samples for each label in the Basic Scenarios and 30 samples for each label in the Challenge Scenarios. The model is assessed based on its ability to correctly predict the labels of five movement categories: rightward, leftward, turn, rightward pair and leftward pair. A prediction is considered successful if the model’s output matches the ground truth label. This evaluation approach aligns with the characteristics of LSTM-based neural networks, which are designed to learn temporal dependencies across entire sequences. Rather than focusing on frame-level decisions, the model captures motion patterns as they unfold over time, allowing it to infer movement direction from the sequential flow of sensor data. This perspective emphasizes behavior recognition at the sequence level, which naturally reflects the operational principle of time-series learning models.
Table 2 presents the evaluation results using the test dataset. For each label, we recorded the number of successes on the test dataset and calculated the success rate for each trial and the average success rate for all trials. The model achieved an average accuracy of over 98% across five test runs. Notably, the model demonstrated excellent classification performance not only for single-user movements but also in more complex multi-user scenarios where two individuals moved simultaneously. These results validate the effectiveness of the proposed LSTM-based neural network in learning human movement patterns using low-resolution ToF sensor data. While traditional rule-based algorithms often suffer from reduced accuracy in multi-user environments due to the limitations of clustering techniques, our model successfully leverages temporal sequence learning to maintain robust and generalized performance even under variable and dynamic conditions.

5. Discussion

This study proposed a method for estimating human movement trajectories in indoor environments by combining the low-resolution ToF sensor with the LSTM-based neural network model. Traditional approaches using camera-based sensors often face limitations in indoor applications due to privacy concerns, while the use of high-resolution ToF sensors introduces significant cost-related challenges. To address these issues, our approach leverages an 8 × 8 low-resolution ToF sensor to collect data and employs an LSTM model to learn movement patterns, thereby providing a high-performance yet cost-effective solution.
The experimental results demonstrated that the proposed model achieved 98% accuracy on the test dataset. By using the neural network, the model effectively replaces traditional, complex rule-based algorithms and successfully learns and predicts movement trajectories. Moreover, the model was shown to maintain robust performance even in multi-user environments, highlighting its practical applicability in real-world scenarios. These findings suggest that the proposed system holds significant potential for a variety of applications, including smart building automation, indoor occupancy monitoring in public facilities and people-counting systems in privacy-sensitive areas. In particular, it offers a promising alternative to camera-based systems in environments such as hospitals, restrooms and meeting rooms.
Future work will investigate the model’s generalization capability in more diverse indoor environments, including spatial structures commonly found in offices, hallways and public facility entrances. Additional experiments will be conducted under conditions with partial occlusions or moving obstacles to assess robustness in more realistic scenarios. Furthermore, the system’s applicability will be examined when the sensor is installed above actual entryways, enabling a more direct evaluation of performance in real-world deployment settings. These efforts will aim to enhance the domain adaptability and practical usability of the proposed method.

Author Contributions

Conceptualization, S.O., K.M.L. and N.K.K.; Methodology, S.O. and K.M.L.; Software, S.O.; Validation, K.M.L., S.Y.L. and N.K.K.; Formal Analysis, S.O.; Investigation, K.M.L.; Resources, S.Y.L. and N.K.K.; Data Curation, S.O. and K.M.L.; Writing—Original Draft Preparation, S.O., K.M.L., S.Y.L. and N.K.K.; Writing—Review and Editing, S.O., S.Y.L. and N.K.K.; Visualization, S.O. and K.M.L.; Supervision, N.K.K.; Project Administration, S.Y.L. and N.K.K.; Funding Acquisition, N.K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the government of Korea (MSIT) (No. RS-2023-00219725) and in part by Yeungnam University through a research grant allocated in 2021.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to internal restrictions on data dissemination.

Acknowledgments

The authors express their sincere appreciation to all those who contributed to this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aazami, R.; Moradi, M.; Shirkhani, M.; Harrison, A.; Al-Gahtani, S.F.; Elbarbary, Z. Technical Analysis of Comfort and Energy Consumption in Smart Buildings with Three Levels of Automation: Scheduling, Smart Sensors, and IoT. IEEE Access 2025, 13, 8310. [Google Scholar] [CrossRef]
  2. Poyyamozhi, M.; Murugesan, B.; Rajamanickam, N.; Shorfuzzaman, M.; Aboelmagd, Y. IoT—A Promising Solution to Energy Management in Smart Buildings: A Systematic Review, Applications, Barriers, and Future Scope. Buildings 2024, 14, 3446. [Google Scholar] [CrossRef]
  3. Choi, W.; Kang, I.; Kim, C. A Study on Energy Saving and Safety Improvement through IoT Sensor Monitoring in Smart Factory. J. Soc. Disaster Inf. 2024, 20, 117–127. [Google Scholar]
  4. Singh, S.; Aggarwal, N.; Dabas, D. Empowering homes through energy efficiency: A comprehensive review of smart home systems and devices. Int. J. Energy Sect. Manag. 2024. ahead-of-print. [Google Scholar] [CrossRef]
  5. Mena-Martinez, A.; Alvarado-Uribe, J.; Molino-Minero-Re, E.; Ceballos, H.G. Indoor occupancy monitoring using environmental feature fusion and semi-supervised machine learning models. J. Build. Perform. Simul. 2024, 17, 695–717. [Google Scholar] [CrossRef]
  6. Tsang, T.-W.; Mui, K.-W.; Wong, L.-T.; Chan, A.C.-Y.; Chan, R.C.-W. Real-Time Indoor Environmental Quality (IEQ) Monitoring Using an IoT-Based Wireless Sensing Network. Sensors 2024, 24, 6850. [Google Scholar] [CrossRef]
  7. Zhong, C.; Sun, J.; Xie, J.; Grijalva, S.; Meliopoulos, A.S. Real-time human activity-based energy management system using model predictive control. In Proceedings of the 2018 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 12–14 January 2018; pp. 1–6. [Google Scholar]
  8. Shahbazian, R.; Trubitsyna, I. Human sensing by using radio frequency signals: A survey on occupancy and activity detection. IEEE Access 2023, 11, 40878–40904. [Google Scholar] [CrossRef]
  9. Allik, A.; Muiste, S.; Pihlap, H. Movement Based Energy Management Models for Smart Buildings. In Proceedings of the 2019 7th International Conference on Smart Grid (icSmartGrid), Newcastle, Australia, 9–11 December 2019; pp. 87–91. [Google Scholar]
  10. Saputro, A.H.; Imawan, C. Local and global human activity detection for room energy saving model. In Proceedings of the 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Malang, Indonesia, 15–16 October 2016; pp. 413–416. [Google Scholar]
  11. Li, T.; Liu, X.; Li, G.; Wang, X.; Ma, J.; Xu, C.; Mao, Q. A systematic review and comprehensive analysis of building occupancy prediction. Renew. Sustain. Energy Rev. 2024, 193, 114284. [Google Scholar] [CrossRef]
  12. Sekiguchi, T.; Kato, H. Privacy Assuring Video-Based Monitoring System Considering Browsing Purposes. In Proceedings of the 2005 Symposium on Applications and the Internet Workshops (SAINT 2005 Workshops), Trento, Italy, 31 January–4 February 2005; pp. 464–467. [Google Scholar]
  13. Hasan, M.R.; Guest, R.; Deravi, F. Presentation-level privacy protection techniques for automated face recognition—A survey. ACM Comput. Surv. 2023, 55, 286. [Google Scholar] [CrossRef]
  14. Klauser, D.; Bärwolff, G.; Schwandt, H. A TOF-based automatic passenger counting approach in public transportation systems. In AIP Conference Proceedings; American Institute of Physics: Washington, DC, USA, 2015; Volume 1648. [Google Scholar]
  15. Diraco, G.; Leone, A.; Siciliano, P. People occupancy detection and profiling with 3D depth sensors for building energy management. Energy Build. 2015, 92, 246–266. [Google Scholar] [CrossRef]
  16. Li, F.; Willomitzer, F.; Balaji, M.M.; Rangarajan, P.; Cossairt, O. Exploiting wavelength diversity for high resolution time-of-flight 3D imaging. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2193–2205. [Google Scholar] [CrossRef] [PubMed]
  17. Lu, H.; Tuzikas, A.; Radke, R.J. A zone-level occupancy counting system for commercial office spaces using low-resolution time-of-flight sensors. Energy Build. 2021, 252, 111390. [Google Scholar] [CrossRef]
  18. Zappi, P.; Farella, E.; Benini, L. Enhancing the spatial resolution of presence detection in a PIR based wireless surveillance network. In Proceedings of the 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, London, UK, 5–7 September 2007; pp. 295–300. [Google Scholar]
  19. Wahl, F.; Milenkovic, M.; Amft, O. A distributed PIR-based approach for estimating people count in office environments. In Proceedings of the 2012 IEEE 15th International Conference on Computational Science and Engineering, Paphos, Cyprus, 5–7 December 2012; pp. 640–647. [Google Scholar]
  20. Sun, K.; Zhao, Q.; Zou, J. A review of building occupancy measurement systems. Energy Build. 2020, 216, 109965. [Google Scholar] [CrossRef]
  21. Chandran, A.K.; Poh, L.A.; Vadakkepat, P. Real-time identification of pedestrian meeting and split events from surveillance videos using motion similarity and its applications. J. Real-Time Image Process. 2019, 16, 971–987. [Google Scholar] [CrossRef]
  22. Tien, P.W.; Wei, S.; Calautit, J.K.; Darkwa, J.; Wood, C. A vision-based deep learning approach for the detection and prediction of occupancy heat emissions for demand-driven control solutions. Energy Build. 2020, 226, 110386. [Google Scholar] [CrossRef]
  23. Zhang, W.; Calautit, J.; Tien, P.W.; Wu, Y.; Wei, S. Deep learning models for vision-based occupancy detection in high occupancy buildings. J. Build. Eng. 2024, 98, 111355. [Google Scholar] [CrossRef]
  24. Newton, E.M.; Sweeney, L.; Malin, B. Preserving privacy by de-identifying face images. IEEE Trans. Knowl. Data Eng. 2005, 17, 232–243. [Google Scholar] [CrossRef]
  25. Dai, J.; Wu, J.; Saghafi, B.; Konrad, J.; Ishwar, P. Towards privacy-preserving activity recognition using extremely low temporal and spatial resolution cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 68–76. [Google Scholar]
  26. Ahmad, S.; Morerio, P.; Del Bue, A. Person re-identification without identification via event anonymization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 11132–11141. [Google Scholar]
  27. Yang, Y.; Cao, J.; Liu, X.; Liu, X. Door-monitor: Counting in-and-out visitors with COTS WiFi devices. IEEE Internet Things J. 2019, 7, 1704–1717. [Google Scholar] [CrossRef]
  28. Cokbas, M.; Ishwar, P.; Konrad, J. Low-resolution overhead thermal tripwire for occupancy estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 88–89. [Google Scholar]
  29. He, Y.; Zhang, H.; Arens, E.; Merritt, A.; Huizenga, C.; Levinson, R.; Wang, A.; Ghahramani, A.; Alvarez-Suarez, A. Smart detection of indoor occupant thermal state via infrared thermography, computer vision, and machine learning. Build. Environ. 2023, 228, 109811. [Google Scholar] [CrossRef]
  30. Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar] [CrossRef]
  31. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
Figure 1. RNN architecture: (a) RNN cell internal structure; (b) RNN flow to process sequence data.
Figure 2. ToF sensor model and visualization of field of view (FoV) performance and acquisition of data: (a) ToF sensor model VL53L8CX; (b) ToF sensor evaluation board P-NUCLEO-53L8A1; (c) Illustration of FoV; (d) Data plot obtained from ToF sensor.
Figure 3. Description of the relationship between the data measured by the sensor and the data to be used in the algorithm.
Figure 4. Transmission process of distance measurement frames using the ToF sensor.
Figure 5. Flowchart of data preprocessing for selection of good frames.
Figure 6. Sensor and installation.
Figure 7. Basic Scenarios: (a) Walking alone; (b) Walking with a box.
Figure 8. ToF sensor data: (a) Leftward walking alone; (b) Leftward walking with a box.
Figure 9. ToF sensor data (two people walking in the rightward direction).
Figure 10. LSTM-based neural network model architecture.
Figure 11. Training accuracy and loss graph.
Table 1. Dataset label distribution.
Label           Train  Validation  Test
Rightward       100    10          40
Leftward        100    10          40
Turn            100    10          40
Rightward pair  65     5           30
Leftward pair   65     5           30
Table 2. Test results using the trained model.
Trial  Leftward  Rightward  Turn     Rightward Pair  Leftward Pair  Success Rate
1      40/40     40/40      39/40    30/30           30/30          99.44%
2      39/40     39/40      40/40    30/30           30/30          98.89%
3      39/40     40/40      36/40    29/30           30/30          96.67%
4      40/40     40/40      38/40    28/30           30/30          97.78%
5      39/40     39/40      38/40    30/30           30/30          97.78%
Total  197/200   198/200    191/200  147/150         150/150        98.11%