Acoustic-Based Position Estimation of an Object and a Person Using Active Localization and Sound Field Analysis

This paper proposes a new method to estimate the position of an object and a silent person with a home security system using a loudspeaker and an array of microphones. Conventional acoustic-based security systems have been developed to detect intruders and estimate the direction of intruders who generate noise. However, a method is needed to estimate the distance and angular position of a silent intruder for interoperation with conventional security sensors, thus overcoming the disadvantage of acoustic-based home security systems, which operate only when sound is generated. Therefore, an active localization method is proposed to estimate the direction and distance of a silent person by actively detecting the sound field variation measured by the microphone array after playing a sound source in the control zone. To implement the proposed method, two main aspects were studied: first, a signal processing method that estimates the position of a person from the reflected sound, and second, the environment in which the proposed method can operate, determined through a finite-difference time-domain (FDTD) simulation and the acoustic parameters of early decay time (EDT) and reverberation time (RT20). Consequently, we verified that the proposed method can estimate the position of a polyvinyl chloride (PVC) pipe and a person in a classroom by using their reflections.


Introduction
With the rapid development of smart homes and voice-assistant technologies, home environments have been established in which loudspeakers and microphones are deployed as sensors or are built-in and distributed through home appliances. The aim of this research was to develop an acoustic-based home security system in the aforementioned environment. An example is shown in Figure 1.
Smart home technology has been evolving to provide proactive services through the monitoring of residents. Therefore, accurately recognizing the scenario in a home environment through a combination of various sensors is important. In [1], studies on context awareness for indoor activity recognition using binary sensors, cameras, radio-frequency identification, and air pressure sensors were reviewed.
A study proposed to recognize each living activity of a user by combining the power meters of appliances with an ultrasonic sensor [2]. In [3], a study was conducted to recognize the complex activities of a kitchen using one module with various sensors.
In such a smart home environment, microphones are used for context awareness and health monitoring owing to their advantage of operating with low power [4]. Dahmen et al. explained that a microphone can be used to identify the scenario of a home environment based on unusual loud noise and the sound of a human falling [5]. In addition, a study explored the possibility of personal identification through footsteps [6].
Automated home security systems have been developed using smart home technologies. Recent home security systems protect residents and their properties from intruders, as conventional security systems do, and they enable the detection of risks to the residents in advance through context awareness of the home environment [5,7].
Microphones in a home security system are primarily used for two purposes: event detection and the classification of unusual sounds, and intrusion detection.
In [8], related studies were reviewed through a comprehensive survey of background surveillance, event classification, object tracking, and situation analysis, and the detection of events in a highly noisy environment was proposed [9]. In [10][11][12], a microphone array and security camera were combined to detect the sound from an intruder and tilt the security camera in the direction of the sound. Research has been conducted to predict the state of a control space by recognizing the type of sound, analyzing and classifying the sound, and estimating the angular position of the unusual sound using a microphone array [13,14]. A method to identify human behavior in a control space by applying a microphone array to a sound-steered camera was proposed [15].
Intrusion detection using microphones is as effective as the use of security cameras in terms of detecting moving objects [4], and the related studies are summarized below. Studies on intrusion detection have been conducted to determine an intrusion in a security zone based on the change in the room transfer function [16], the sound field variation according to the acoustical transmission path of distributed microphones [17], and the coherence responses in low-frequency environments [18].
However, the conventional methods for event detection have the disadvantage of operating only when a loud noise is generated because the position is determined in the direction of the generated sound, and current techniques for intrusion detection have the disadvantage of only detecting intrusion but not providing the location.
To overcome these disadvantages, we propose an acoustic-based active localization and analysis method to estimate a silent intruder. This study provides a link between localization and intrusion detection techniques using an acoustic-based security system. The reason is that if a person's position can be estimated and tracked using microphones and loudspeakers, the entry of an unauthorized person into the security space can be known. However, this study primarily addressed the estimation of the position of a silent intruder.
The process of a home security system can be divided into sensing, assessing, and responding. Sensing is very important because it functions as a trigger to operate the security system. Thus, the sensors must be interoperable with each other [5], with a combination of various individual sensors [5,7], or with the information measured by one sensor module [19].
Therefore, through this study, we expected to increase the utilization of microphones used in home security systems. This is because the data measured by a conventional linear microphone array provide only angular information. However, the proposed method also provides the distance, which increases the number of scenarios that can be combined with the information of other sensors.
We present two examples of complementary sensing. In the first one, passive infrared (PIR) sensors function as triggers to awake the security system and record the intrusion using a camera [20]. However, PIR sensors have the disadvantage of being unable to detect an intruder who does not move, moves slowly, or wears heat-insulating clothing. IR sensors have limitations that often cause errors because of their nonlinear sensitivity and the effects of nearby objects [21]. Therefore, if the acoustic-based intrusion detection in [16][17][18] is applied to the security system to compensate for the weakness in IR sensors, the two sensing systems can complement each other to increase the robustness of the intrusion detection. In the second example, when a microphone array detects the direction of an event, a pan-tilt-zoom (PTZ) camera is rotated and focused on the region of interest [8]. However, because the camera suffers misrecognitions owing to poor resolution, distant targets, changes in illumination, or occlusions [22], the PTZ camera can be operated robustly by providing the distance and angle of the intruder based on the proposed method.
Therefore, to overcome the shortcomings of conventional acoustic-based intrusion detection systems and achieve the complementary intrusion detection system proposed in [5], this paper describes our proposed active localization method that estimates the position (distance and angle) of a silent intruder using a generated reflection. The main concept is that a loudspeaker generates a signal in the security space. The microphone array extracts the changed signals owing to the intruder, and then the distance and direction are estimated using the changed signals (sound field variation).
Echolocation is a technology that detects a location through an echo which is emitted from a sound source and then returns, and it has been primarily implemented using an ultrasonic sensor. In [23], a biomimetic sonar system was mounted on a robot arm to recognize an object through the vector of the echo envelope. A biomimetic study was conducted to estimate the distance and angle [24]. The distance was estimated using the time delay between the maximum activity owing to the call and the activity owing to the echo, and the angle was predicted by comparing the directivity pattern of the sensor using the notch pattern in the frequency range. Ultrasonic sensors are acoustic sensors used in conventional home security systems. Ultrasonic sensors are active sensors that send signals in a straight line; therefore, the source and receiver can be placed face-to-face [21] or in the same direction to physically detect the intruder [25]. However, owing to the straightness of the signal, they have the disadvantages of utilizing several sensors to increase the detection rate [26] and being unable to detect a person that passes behind an obstacle.
The proposed active localization in the audible frequency utilizes the phenomenon of scattering rather than straightness. Through fundamental research, we verified that the scattering phenomenon in the audible frequency can be used to detect an object [27] or a person hiding behind an obstacle (the related results are described in Appendix B).
We expect that the combination of ultrasound with its straightness and audible sound with its strong scattering can detect a person better. Thus, to make a loudspeaker and a microphone array function as a sensor, we studied which room conditions allow the reflection generated by an intruder to be treated as a new sound source.
We introduce two main topics to implement the proposed idea. The first is signal processing to estimate the position using the reflection, and the second involves the simulation and analysis of the sound field for position estimation through the reflected sound in a reverberant space. To this end, analysis equations using acoustic parameters are proposed.
When estimating the position of a person using an active acoustic-based method, the analysis of the sound field to determine the position of the intruder has the following implications. In a reverberant environment, the proposed method is not aimed at estimating the position by increasing the number of microphones. In other words, this does not mean that many microphones are distributed in the control space or that the microphone arrays are arranged at each corner of the control space. By using limited hardware, one loudspeaker and one microphone array, the method of estimating a person's position using the reflected sound is possible through sound field analysis. Therefore, the active localization method proposed in this paper was verified by estimating the position of a polyvinyl chloride (PVC) pipe and a person in a classroom using signal processing and sound field analysis.
The remainder of this paper is organized as follows. In Section 2, the signal model for position estimation using the reflection sound is presented; subsequently, the algorithm is proposed. The feasibility results of the proposed method are presented through the testing of an anechoic chamber. In Section 3, the simulation results for a reverberant environment are described, and the operating conditions in the reverberant space are proposed based on acoustic parameters. In Section 4, the examination of the proposed method using a PVC pipe and a person in a classroom is described. Finally, the conclusions are presented in Section 5.

Signal Model and Definition of Sound Field Variation
The implementation of active localization to estimate the position of a silent intruder requires a reflected sound generated by a silent intruder. We define the sound field variation as the difference between the sound field before intrusion and the sound field after intrusion.
Therefore, the proposed active localization based on the sound field variation can be tested using two steps. The first step is to measure the sound field in a targeted security space using an active approach with a loudspeaker and a microphone array. The second step is to obtain the position of the silent intruder by acquiring the signals of the sound field variation based on a comparison between the signal of the sound field before intrusion (the reference sound field) and after intrusion (the event sound field). Figure 2 shows the scheme of sound field variation and, as an example, shows some of the reflections. Because the proposed active localization method uses the time signals from a direct sound to the early reflections and we assume that the silent intruder affects the specific reflection locally, we define the decomposition of room impulse responses as in Equations (1) and (2).

h_ref = h_s + h_r1 + · · · + h_rn + h_reverberation (1)

h_event = h_s + α_1 h_r1 + · · · + α_n h_rn + h_person + h_reverberation (2)

where h_ref is the RIR of a reference scenario, h_event is the RIR of an event scenario, h_s is the direct sound, h_rn represents the early reflections of each scenario, h_person is the new response generated by a person, h_reverberation is the late reverberation of the room impulse response, and α_n represents the attenuation coefficients. Methods to estimate the room shape or locate a sound source by analyzing the echo components of the RIR have been proposed [28–30]. However, because these methods assume that the RIR is known, they face the problem of having to measure the RIR every time an intruder moves, and they have the disadvantage of being slow systems.
Therefore, in this study, the signal model was expressed from the viewpoint of the echo decomposition of the RIR, but the signal generated by the loudspeaker was the Gaussian-modulated sinusoidal pulse in Equation (10), used to analyze the changed sound field before and after the intrusion, and the changed echo component was extracted using Equation (3).
If the silent intruder affects the reflection h_rn of the RIR only locally, the sum of early reflections in an event scenario approximates the sum of early reflections in a reference scenario, i.e., α_1 ≈ α_2 ≈ · · · ≈ α_n ≈ 1. Therefore, Equation (2) can be rewritten as Equation (3):

h_event ≈ h_s + h_r1 + · · · + h_rn + h_person + h_reverberation = h_ref + h_person (3)
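Equations (1)–(3) can be illustrated with toy impulse responses; this is only a sketch with made-up delays and amplitudes, not the authors' data. With α_n = 1, subtracting the reference RIR from the event RIR isolates the intruder's reflection h_person, which is the premise of Equation (3).

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48_000                      # sampling rate (Hz), as used later in the paper
n = fs // 10                     # 100 ms impulse response

def pulse_at(delay_s, amp, n, fs):
    """A single reflection modeled as a scaled unit impulse at a given delay."""
    h = np.zeros(n)
    h[int(delay_s * fs)] = amp
    return h

# Components of Equations (1) and (2): direct sound, early reflections,
# the intruder's reflection, and late reverberation (a decaying noise tail).
h_s = pulse_at(0.002, 1.0, n, fs)                     # direct sound
h_r = [pulse_at(0.004 + 0.003 * i, 0.5 / (i + 1), n, fs) for i in range(4)]
h_person = pulse_at(0.012, 0.3, n, fs)                # new response from the intruder
tail = rng.standard_normal(n) * np.exp(-np.arange(n) / (0.02 * fs)) * 1e-3
h_reverb = np.concatenate([np.zeros(n // 2), tail[: n - n // 2]])

h_ref = h_s + sum(h_r) + h_reverb                     # Equation (1)
alpha = np.ones(len(h_r))                             # local effect: alpha_n ~ 1
h_event = h_s + sum(a * h for a, h in zip(alpha, h_r)) + h_person + h_reverb  # Eq. (2)

# With alpha_n ~ 1 the difference isolates the intruder's reflection (Equation (3)).
diff = h_event - h_ref
peak_sample = int(np.argmax(np.abs(diff)))
print(peak_sample / fs)          # time of the intruder-induced reflection
```

The peak of the difference response lands at the 12 ms delay assigned to the intruder, confirming that the local-change assumption reduces the problem to locating a single new echo.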
The sound field variation can be calculated using Equation (4):

R_effect,m = G_m − Y_m = H_event,m X − H_ref,m X (4)

where H_ref,m is the transfer function of the control area under the reference scenario shown in Figure 2a, H_event,m is the transfer function under the event scenario shown in Figure 2b, X is the input signal, G_m represents the signals measured by the microphone array after an intrusion, Y_m represents the reference signals before an intrusion, R_effect,m represents the changed spatial effects, and m is a microphone index.
The spatial effects R_effect,m are assumed to include the sound signals emitted as reflections by the silent intruder. In other words, R_effect,m can be regarded as a new sound source. This is because the intruder changes the sound field formed by the loudspeaker's source, and the intruder's position is then estimated from R_effect,m measured at the microphone array. The same concept applies when the incident, reflected, and transmitted pressure distribution on the flat surface of a discontinuity is considered as the sum of the blocked pressure and the radiation pressure in [31]. If the blocked pressure is the signal of the reference scenario in the control space and the radiation pressure is the signal of the event scenario, the remainder can be considered a new sound source because only the radiation signal remains when the reference-scenario signal is removed from the measured signal. From this concept, the loudspeaker is the sound source that generates the sound field in a control area, whereas in the proposed approach, the sound wave formed by the intruder is a new source, and the location of the silent intruder can be detected.
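The subtraction in Equation (4) can be sketched for a single microphone channel. The impulse responses and delays below are hypothetical; the point is that, for the same playback signal X, subtracting the reference signal Y_m from the event signal G_m leaves only the intruder's contribution.

```python
import numpy as np

fs = 48_000
t = np.arange(int(0.005 * fs)) / fs
x = np.sin(2 * np.pi * 1000 * t) * np.hanning(t.size)   # short 1 kHz probe burst

def ir(delays_amps, n):
    """Hypothetical impulse response: unit impulses at the given path delays."""
    h = np.zeros(n)
    for d, a in delays_amps:
        h[int(d * fs)] += a
    return h

n = int(0.05 * fs)
h_ref = ir([(0.002, 1.0), (0.007, 0.4)], n)             # direct sound + a wall reflection
h_person = ir([(0.015, 0.25)], n)                        # reflection added by the intruder
h_event = h_ref + h_person                               # Equation (3): local change only

y = np.convolve(x, h_ref)        # reference signal Y_m (before intrusion)
g = np.convolve(x, h_event)      # event signal G_m (after intrusion)

# Equation (4): subtracting the reference leaves only the intruder's
# "new source" contribution R_effect,m = (H_event,m - H_ref,m) X.
r_effect = g - y
expected = np.convolve(x, h_person)
print(np.allclose(r_effect, expected))
```

Because convolution is linear, the residual equals the probe convolved with h_person alone, which is exactly the "new sound source" interpretation used above.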

Proposed Algorithm Based on Steered Response Power with Moving Average
In this section, the approach of an algorithm using the steered response power (SRP) is addressed. SRP is a sound source localization technique, and it is known as a robust localization technique in reverberant environments [32,33].
Appl. Sci. 2020, 10, 9090

P_k(θ) = Σ_{t=kT}^{(k+1)T−1} [ Σ_{m=1}^{M} W_m s_m(t − τ̂_m(θ)) ]² (5)

θ̂_s = argmax_θ P_k(θ) (6)

where P_k(θ) is the power value of the classical SRP, θ is the steered angle, θ̂_s is the look direction, s_m is the microphone signal, τ̂_m is the delay of each microphone, W_m is the weight, M is the number of microphones, m is the microphone index, k is the block index, and T is the length of the finite-length block signals.
Equations (5) and (6) are the classical SRP using a microphone array, Equation (5) indicates the integrated output of the steered beamformer, and Equation (6) indicates the direction of the sound source.
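A minimal delay-and-sum SRP in the sense of Equations (5) and (6) can be sketched as follows. The 8-microphone linear geometry, element spacing, and unit weights W_m = 1 are assumptions for illustration, not the authors' hardware.

```python
import numpy as np

c = 343.0          # speed of sound (m/s)
fs = 48_000
M = 8              # microphones (hypothetical array)
d_mic = 0.04       # element spacing (m), hypothetical
mic_x = (np.arange(M) - (M - 1) / 2) * d_mic

# Far-field source at 30 degrees: per-mic delays tau_m = x_m * sin(theta) / c.
theta_true = np.deg2rad(30.0)
t = np.arange(int(0.01 * fs)) / fs
src = np.sin(2 * np.pi * 1000 * t) * np.hanning(t.size)
base = 400                                  # common arrival sample offset
n = base + t.size + 100
s = np.zeros((M, n))
for m in range(M):
    shift = base + int(round(mic_x[m] * np.sin(theta_true) / c * fs))
    s[m, shift : shift + t.size] = src

def srp(s, angles_deg):
    """Classical SRP (Equations (5)-(6)): steer, sum, and integrate power."""
    powers = []
    for ang in np.deg2rad(angles_deg):
        out = np.zeros(s.shape[1])
        for m in range(M):
            shift = int(round(mic_x[m] * np.sin(ang) / c * fs))
            out += np.roll(s[m], -shift)    # delay-and-sum with weight W_m = 1
        powers.append(np.sum(out ** 2))     # P_k(theta) over the whole block
    return np.asarray(powers)

angles = np.arange(-90, 91, 1)
theta_hat = angles[int(np.argmax(srp(s, angles)))]
print(theta_hat)                            # close to the true 30 degrees
```

The beamformer output power peaks where the steering delays align the microphone signals, which is the argmax in Equation (6).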
The proposed active localization estimates the position of a silent person as an angle and a distance in the horizontal plane of a linear microphone array (Figure 3). In other words, the proposed algorithm should represent a two-dimensional plane. In [34,35], the generalized cross-correlation phase transform (GCC-PHAT) was used to represent the spatial energy map. However, because the PHAT method determines the sound source well only under low noise [36], its localization performance in two dimensions is not robust. The proposed algorithm uses the reflection to estimate the position; thus, the signal-to-noise ratio (SNR) is not high. Therefore, the energy map is expressed by applying the delay-and-sum beamformer to the classical SRP and a moving average to the power of the steered block signal. Accordingly, Equations (5) and (6) are modified as Equations (7)-(9) to represent the energy map on the horizontal plane of the linear microphone array.
where P(t, θ_d) is the energy map of the SRP, θ_d is the set of desired angles, N_L is the length of the moving average, (t̂_s, θ̂_s) denotes the position result, t̂_s is the index of the reflected time sample, r_s is the estimated distance between the maximum point and the origin, t_ref is the index of the peak of the generated signal (the origin), θ̂_s is the estimated angle, c is the speed of sound, and f_s is the sampling frequency.
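In the spirit of Equations (8) and (9), the peak of the energy map gives both the angle and the reflection time, and the elapsed time gives the distance. The map below is a synthetic placeholder, and the factor 1/2 assumes the loudspeaker is colocated with the array so the reflection travels out and back; that round-trip assumption is ours, not stated in the equations above.

```python
import numpy as np

c = 343.0     # speed of sound (m/s)
fs = 48_000   # sampling frequency (Hz)

# Hypothetical energy map P(t, theta_d): rows = time samples, cols = steered angles.
angles = np.arange(-90, 91, 1)
n_t = 2000
P = np.zeros((n_t, angles.size))
t_ref = 100                      # index of the peak of the generated signal (origin)
P[1500, 120] = 1.0               # a single dominant reflection for illustration

# Peak picking: the maximum of the map gives (t_s, theta_s), and the
# elapsed time since t_ref gives the distance r_s.
t_s, a_idx = np.unravel_index(np.argmax(P), P.shape)
theta_s = angles[a_idx]
r_s = c * (t_s - t_ref) / (2 * fs)   # assumed round-trip factor 1/2
print(theta_s, round(r_s, 2))        # angle (deg) and distance (m) of the peak
```

With the placeholder peak, the reflection arriving 1400 samples after the probe at the 30° steering bin maps to a distance of about 5 m.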
Figure 4 shows the measured signals at position A in the experimental configuration when the boundary absorption coefficient of the room is equal to 0.625. In Equation (7), the length of the block signals (T) is set according to the maximum distance that the signal can reciprocate in the target room. The estimated distance of an intruder is calculated using Equation (9) from the time information corresponding to the peak of the sound field variation.

In this study, the input value of the SRP was the changed signal between the reference signal and the measured signal. In other words, the impulse response in Equation (3) was not directly predicted; rather, the sound field variation under the same reproduction signal was estimated by subtracting the reference signal from the measured microphone signal. We used a triangular moving average of 36 samples at the 48 kHz sampling rate, and the estimated distance was calculated as the product of time and sound speed. This averaging method empirically reduced the error variance of the estimated angle and distance in the proposed active localization. Figure 5 shows the block diagram used to implement the proposed method using the sound field variation and the SRP with a moving average. Figure 5a shows the steps to synchronize the measured signals. Figure 5b shows that the measured signals are stored as the reference signals if no event is detected, as depicted in Figure 5c, and Figure 5d indicates the proposed SRP to estimate the position of a silent person. Figure 5. (a) Step for signal synchronization; (b) Step for reference signals defined as measured signals if no event is detected; (c) Step for event detection; (d) Step for SRP using Equations (7)-(9).
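The 36-sample triangular moving average described above can be sketched as follows. Using np.bartlett as the triangular kernel is our assumption about the window shape; the power trace is synthetic.

```python
import numpy as np

# Triangular moving average of N_L = 36 samples at 48 kHz, used to smooth
# the steered block power before peak picking.
N_L = 36
w = np.bartlett(N_L)
w /= w.sum()                      # unit-gain smoothing kernel

rng = np.random.default_rng(1)
power = np.zeros(1000)
power[500] = 1.0                  # an idealized reflection peak
power += 0.02 * rng.standard_normal(1000) ** 2   # noisy power floor

smoothed = np.convolve(power, w, mode="same")
peak_idx = int(np.argmax(smoothed))
print(peak_idx)                   # stays near sample 500 after smoothing
```

Smoothing spreads the peak over the kernel support but suppresses isolated noise spikes, which is consistent with the reported reduction in the error variance of the estimated angle and distance.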
In the signal synchronization step in Figure 5a, we set up the block diagram to minimize the time delay between the reference signal and the event signal for each microphone. Thus, two steps were involved. The first was to reduce the quantization error by setting the clocks of the loudspeaker and microphone board identically in hardware. The second step, after measurement, was to verify and compensate for the time delay between t_ref of Equation (7) and the peak of the generated signal based on correlation. The event detection in Figure 5c was used to determine intrusion by selecting the threshold of sound field variation as in [17]. In this study, we focused on the analysis of the SRP results in Figure 5d. In other words, we aimed to analyze the relationship between the variables (reverberation time and early decay time) in the control space and the signal processing results.
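The correlation-based delay check in the second synchronization step can be sketched with np.correlate. The 7-sample residual delay below is a hypothetical value for illustration.

```python
import numpy as np

fs = 48_000
t = np.arange(int(0.005 * fs)) / fs
ref = np.sin(2 * np.pi * 1000 * t) * np.hanning(t.size)   # reference burst

lag_true = 7                      # hypothetical residual delay in samples
event = np.concatenate([np.zeros(lag_true), ref])[: ref.size]

# Full cross-correlation; the argmax gives the lag of event relative to ref.
xc = np.correlate(event, ref, mode="full")
lag = int(np.argmax(xc)) - (ref.size - 1)
print(lag)                        # recovers the residual delay

# Compensate by shifting the event signal back into alignment.
aligned = np.roll(event, -lag)
```

Once the residual lag is estimated, shifting the event signal by that amount removes the clock offset so that subtracting the reference signal yields a clean sound field variation.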
The signal generated by the loudspeaker formed a sound field with a specific frequency band in a security area using the Gaussian-modulated sinusoidal pulse of Equation (10), and then the change to the sound field was measured using the microphone array.
where A is the magnitude of the signal, κ = 5π²b²f_center²/(q·ln(10)) is the envelope constant, b is the normalized bandwidth, q is the attenuation of the signal, f_center is the center frequency, and d is the time delay.
In this study, the center frequency was fixed at 1 kHz, and the attenuation and normalized bandwidth of the sound source were set to 6 and 0.25, respectively.
The center frequency was 1 kHz because the directivity pattern of the loudspeaker used in the experiment was cardioid at 1 kHz.
When analyzing a short-period pure-tone signal as a frequency component, a discrete-time Fourier transform was used, and at least five periods were required to estimate the frequency components. Therefore, the attenuation and normalized bandwidth were selected to form five periods in the pulse sound ( Figure 6).
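Equation (10) with the stated parameters can be sketched as follows (the cosine carrier phase is our assumption, as the carrier term is not shown); the check at the end verifies that q = 6 and b = 0.25 yield a −6 dB fractional bandwidth of b·f_center = 250 Hz around the 1 kHz carrier:

```python
import numpy as np

def gauss_pulse(t, A=1.0, f_center=1000.0, b=0.25, q=6.0, d=0.0):
    # Envelope constant of Eq. (10): kappa = 5*pi^2*b^2*f_center^2 / (q*ln 10)
    kappa = 5 * np.pi**2 * b**2 * f_center**2 / (q * np.log(10))
    # Cosine carrier is an assumption; the carrier phase is not given.
    return A * np.exp(-kappa * (t - d) ** 2) * np.cos(2 * np.pi * f_center * (t - d))

fs = 48_000
t = np.arange(-0.005, 0.005, 1 / fs)   # 10 ms window centred on the pulse
s = gauss_pulse(t)

# Sanity check: the -q dB (here -6 dB) spectral width of the Gaussian
# envelope should equal the fractional bandwidth b times f_center.
S = np.abs(np.fft.rfft(s, 2**16))
f = np.fft.rfftfreq(2**16, 1 / fs)
band = f[S >= S.max() * 10 ** (-6 / 20)]
print(f"-6 dB bandwidth: {band[-1] - band[0]:.0f} Hz")
```

This parameterization matches the common definition of a Gaussian-modulated sinusoidal pulse in which q is the reference attenuation (in dB) at which the fractional bandwidth b is measured.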

Configuration for the Simulations and Experiments
This section describes the configuration of the simulations and experiments. The configuration shown in Figure 7 was applied to the conceptual verification in an anechoic chamber described in Section 2.4, the analysis of operating conditions described in Section 3, and the experimental verification of the proposed method in a classroom described in Section 4. In Figure 7, A, B, C, and D denote the positions of a silent intruder. Two types of intruders were used in the experiments in an anechoic chamber and a classroom. The first was a PVC pipe 0.3 m in diameter. The second was a person.
The reasons for using two types of intruders were as follows. The PVC pipe was used to identify trends in the localization performance of the proposed active localization method. In other words, using the circular PVC pipe, the reflection sound was uniformly generated even when the sound source was incident at any angle. Therefore, the PVC pipe was used to minimize the change in the absorption ratio of the intruder. The analysis using a PVC pipe was compared with the experimental results of human intrusion and was the background used to simulate the person as a circular boundary.
Each superscript on the characters A, B, C, and D of the intruder shows the distance between the active localization system and the intruder position, and each subscript shows the counterclockwise angle between the microphone array and the intruder. The active localization system consisted of a loudspeaker, microphone array, and controller. The positions of the silent intruder were represented by the distance and angle, and the positions of the silent intruder were determined to be the event scenarios close to the wall (positions A and D) or the center of the active localization system (positions B and C).

The size of the control area in the security zone was 2 m × 3 m. The microphone array used in the experiment consisted of seven microphones. The excitation signal in the simulations and experiments was a Gaussian-modulated sinusoidal pulse with a 1 kHz center frequency (Equation (10)), and the spacing between the microphones was configured to be the same as the Nyquist spacing (λ/2) corresponding to the 1 kHz center frequency. This was because, when designing a beamformer for a single frequency, the Nyquist spacing yields the maximum array gain and directivity [37].
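For the stated design frequency, the half-wavelength spacing works out as follows (assuming c = 342 m/s, the sound speed quoted elsewhere in the paper):

```python
c = 342.0          # speed of sound (m/s), as quoted in the paper
f_center = 1000.0  # beamformer design frequency (Hz)

wavelength = c / f_center
spacing = wavelength / 2          # Nyquist (half-wavelength) spacing
aperture = spacing * (7 - 1)      # total aperture of a 7-microphone uniform line array

print(f"spacing = {spacing * 100:.1f} cm, aperture = {aperture:.3f} m")
```

With seven elements this gives roughly a 1 m aperture, which fits comfortably along one side of the 2 m × 3 m control area.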

Preliminary Experiments in Ideal Conditions
This section presents the experimental results in an anechoic chamber. If the proposed method is directly applied in an actual space, exactly matching the analysis with the experimental results becomes difficult because of the various spatial effects (R_m^effect). Therefore, the experimental procedure was performed in an anechoic space to quantitatively verify the accuracy of the proposed approach. In other words, we excluded the environmental elements of the control space and confirmed that the proposed concept exhibited no problem under ideal conditions. Figure 8 depicts the proposed SRP results obtained from the experiment when a PVC pipe or a person is a silent intruder. Each image shows the intruder position using relative power values (dB).
In Case 1 (Figure 8a-d), when examining the position estimation of the intruder (i.e., a PVC pipe), although the angle had no error, the error of the distance was observed to reach up to 0.04 m (for position A).
In Case 2 (Figure 8e-h), the error for the angle was confirmed to reach 5° (for position C) and the error for the distance ranged up to 0.13 m (position D) if a person was in each intrusion position. According to these results, when reviewing the energy maps again in terms of the maximum error, Case 1 indicated that the intruder position was estimated with a relatively small error. This was because the PVC pipe had a specific boundary condition at a fixed location without moving. As a result, a consistent reflection wave was measured by the active localization system. However, Case 2 indicated that the reflected signals measured by the microphone array were not constant when a person was in the intruder position. The reason was that a slight movement occurred although the person remained in the same position. From this difference, the position estimations of the intruder in the two cases had different results in terms of the maximum error. Nonetheless, we confirmed the feasibility of position estimation through reflections.
Two important conclusions can be drawn. Firstly, the position of a person can be detected using the proposed active localization. Secondly, the energy maps of a person are similar to those of a PVC pipe, which is a circular object. The result indicates that the active localization method can detect the position of an object or a person, and it was the basis for modeling a person as a circular object in the subsequent simulation.

Simulation Test for the Reverberant Environment
The active localization method uses reflected sounds; thus, the proposed method is affected by the boundary condition (the property of the wall surface) of the control space. Consequently, the error in Equation (3) increases as the reflection on the wall increases, and the detection performance may be degraded depending on the characteristics of the boundary.
We simulated the environmental operating conditions of the proposed method using the following steps.
STEP 1: The error of localization performance was analyzed by changing the absorption coefficient at the boundary of the target control space (2 m × 3 m).
STEP 2: To examine the correlation between the absorption coefficient of the boundary and the spatial effects, we analyzed the acoustic parameters of the reverberation time (RT20) and early decay time (EDT).
STEP 3: The operating conditions of the active localization were presented using RT20 and EDT.
The experimental approach makes determining sufficient conditions for the proposed method difficult. The results of step 1 based on the finite-difference time-domain (FDTD) simulation are presented in Section 3.1.2, and the results of steps 2 and 3 are described in Section 3.2.

Simulation Setup
The FDTD method is the numerical solution of the differential equation of a wave. The FDTD method is commonly used for nonstaggered compact schemes expressing only pressure [38] and Yee's staggered schemes expressing particle velocity and pressure [39].
In this study, the simulation was modeled as Yee's scheme to use a circular rigid body [40] and a perfectly matched layer (PML) boundary [41]. The circular rigid body boundary was used to model the silent intruder because the characteristics of a person and a PVC pipe were observed to be similar. The PML condition was used to describe the anechoic environment.
The reverberation of the control space was controlled by adjusting the sound absorption coefficient at the boundary. Hence, the momentum equation with the impedance boundary condition was used, and it is expressed as follows: where p is the sound pressure; v_x and v_y are the particle velocities of the x and y axes, respectively; ρ_0 is the air density; c is the speed of sound; λ_c is the Courant number; ζ is the specific acoustic impedance; α is the absorption coefficient; n is the time index; and u and w are indices of the spatial point. In this study, this impedance boundary condition was derived by combining the asymmetric finite-difference approximation used in [39] and the locally reacting boundary used in a room simulation in [38]. The derivation is described in Appendix A. Therefore, we enabled the simulation of the reverberation environment in the Yee scheme using the change in α.
The FDTD simulation utilized a 2 m × 3 m control space (Figure 7) and a spatial resolution of 0.01 m. The sampling frequency (f_s,FDTD) was 49 kHz. As the selection criterion of the parameters, a sampling rate that satisfied the Courant condition was selected while the spatial resolution was fixed. The position of the silent intruder was set at the representative positions (A, B, C, and D) mentioned in Section 2.3.
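As a rough illustration of this setup (not the paper's code: the impedance boundary derivation is omitted and the edges are simply left rigid), a minimal 2-D staggered-grid update with the stated 0.01 m resolution and 49 kHz sampling rate might look like the sketch below; note that these parameters give a Courant number of 0.7, just below the 2-D stability limit of 1/√2:

```python
import numpy as np

# Minimal 2-D Yee-style FDTD sketch (pressure + staggered velocities).
# Grid and rate follow the text: 0.01 m resolution, fs = 49 kHz.
c, rho0 = 343.0, 1.21            # assumed air constants
ds, fs = 0.01, 49_000
dt = 1 / fs
courant = c * dt / ds            # must be <= 1/sqrt(2) in 2-D
assert courant <= 1 / np.sqrt(2)

nx, ny, steps = 200, 300, 300    # 2 m x 3 m control space
p = np.zeros((nx, ny))
vx = np.zeros((nx + 1, ny))      # staggered x-velocity nodes
vy = np.zeros((nx, ny + 1))      # staggered y-velocity nodes

for n in range(steps):
    # Momentum update (interior nodes only; edge velocities stay zero,
    # i.e. rigid walls -- the paper's impedance boundary would modify these).
    vx[1:-1, :] -= dt / (rho0 * ds) * (p[1:, :] - p[:-1, :])
    vy[:, 1:-1] -= dt / (rho0 * ds) * (p[:, 1:] - p[:, :-1])
    # Continuity update.
    p -= rho0 * c**2 * dt / ds * ((vx[1:, :] - vx[:-1, :]) + (vy[:, 1:] - vy[:, :-1]))
    # Hard source: Gaussian pulse injected at an assumed loudspeaker node.
    p[100, 20] += np.exp(-((n - 60) / 15) ** 2)

print(f"Courant number = {courant:.3f}, peak |p| = {np.abs(p).max():.3f}")
```

The circular rigid intruder and the PML would be added on top of these core updates; the sketch only shows the leapfrog structure the paper's solver is built on.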
The source model in the FDTD simulation is a physically constrained source (PCS) [42], and the formula is as follows:

H_m(e^jω) = (b_0 + b_2·e^−j2ω) / (1 + a_1·e^−jω + a_2·e^−j2ω)  (19)

where p^[n](u, w) is the pressure node of the source, δ_s is the spatial resolution, A_s = 4πa_0² is the surface area of the sphere in volume velocity, q^[n](u, w) is the velocity source, and the mass (M_m), damping, elasticity (s_0), and quality-factor (Q) constants characterize the mechanical system of the source. ω_0 is the normalized low resonance frequency of the mechanical system, M_p = 4N_p − 1 is the FIR filter order, and ω_c is the normalized cutoff frequency of the FIR filter.
In this study, M_p was 16 samples, the normalized cutoff frequency was 0.05, the low resonance frequency was 300 Hz, M_m was 0.025 kg, and Q was set to 0.6. Figure 9 shows the result images of the active localization method obtained by changing the absorption coefficient of the boundary at position B. The images on the left in Figure 9 show the captured images in the FDTD simulation obtained by reproducing the PCS model. The images on the right indicate the energy maps expressed by the convolution of the signal of Equation (10) with the impulse response obtained by the FDTD simulation.
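Equation (19) is a standard second-order IIR section; a direct-form sketch is given below (the coefficient values are illustrative placeholders, not the PCS values, which are determined by the mechanical constants above):

```python
import numpy as np

def biquad_b0_b2(x, b0, b2, a1, a2):
    # Second-order IIR of Eq. (19):
    #   H(z) = (b0 + b2*z^-2) / (1 + a1*z^-1 + a2*z^-2)
    # implemented as the difference equation
    #   y[n] = b0*x[n] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        y[n] = b0 * x[n] \
             - a1 * (y[n - 1] if n >= 1 else 0.0) \
             - a2 * (y[n - 2] if n >= 2 else 0.0)
        if n >= 2:
            y[n] += b2 * x[n - 2]
    return y

# Impulse response of a lightly damped resonator (placeholder coefficients;
# poles at |z| = sqrt(0.9), i.e. stable).
x = np.zeros(64)
x[0] = 1.0
h = biquad_b0_b2(x, b0=1.0, b2=-1.0, a1=-1.8, a2=0.9)
print("h[0:4] =", h[:4])
```

A filter of this form lets the source's mechanical resonance and damping be imposed on the injected velocity signal before it drives the FDTD grid.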

Simulation Results and Analysis
In Figure 9, the reflections propagating from the intruder to the microphone array according to each α are similar. The image results show that the magnitude of the wavefront formed by the edge boundary increases as the absorption coefficient of the edge boundary decreases. As a result, the overlap of the reflection formed behind the intruder also increases. In other words, as the reflected sound formed at the boundary becomes significantly louder than the reflected sound produced by the intruder, the spatial effect increases such that the overlapped signal is larger than the intruder's signal. Therefore, the simulation indicated that the error of position estimation increases with the boundary characteristics of the control space. The simulation results are summarized in Table 1, in which the errors in parentheses represent the angular and distance errors. As Table 1 shows, the distance error was affected more by the reflectance of the boundary than the angular error was. There was a 5° error only at the angle at which the sound absorption was below 40% (α × 100%) at the D position. From a distance error point of view, some scenarios failed to detect an intruder.
In other words, when the diameter (0.3 m) of the circle considered as the intruder and the predicted distance were combined, the estimated distance exceeded the control space of 2 m × 3 m. The results for α less than 0.6 at position A and less than 0.5 at position D corresponded to detection failure. In addition, when the distance error was viewed in terms of error magnitude, a large error of 0.5 m or more was observed at α < 0.5 at position B.
Therefore, we confirmed through the simulation that the approach proposed in this paper operates at α ≥ 0.7, for which no angular error exists and the distance error is less than 19%.
In the next section, we describe the relational equation that predicts the environment in which the active localization method operates through the RT20 and EDT of the acoustic parameters. This is because verifying the operation of the proposed method based on the boundary reflectance in a general reverberant environment is very difficult.

Relationship Analysis of Acoustic Parameters and Absorption Coefficients to Propose Operating Conditions
In this section, the conditions under which the active localization method operates in a reverberant space are explained using the relationship between the acoustic parameters and the absorption coefficient discussed in the previous section.
The proposed approach predicts the position of a silent intruder based on the sound reflected from the intruder, and this phenomenon occurs within a short time; therefore, the pattern of early reflection is very important. If the maximum distance of the active localization system is estimated to be 3 m, the sound source generated by a loudspeaker travels for approximately 17.54 ms when the round-trip distance of the sound source is 6 m and the speed of sound is 342 m/s. In other words, the phenomenon occurring within 18 ms should be analyzed. Therefore, the EDT and RT20 of the acoustic parameters were used to analyze the control space. EDT includes the direct sound and early reflections, and RT20 has the smallest energy decay time considered for the reverberation time indices. EDT and RT20 are expressed by the same energy decay curve, obtained by backward integration of the squared impulse response h [43]:

EDC(t) = 10·log10( ∫_t^∞ h²(τ)dτ / ∫_0^∞ h²(τ)dτ )  (20)

Equation (20) normalizes the signal power, and through the time variable t we can calculate the times at which the power decreases from 0 to −10 dB and from −5 to −25 dB. The time difference of the former defines EDT, and that of the latter defines RT20.
When considering the two indices from the early reflection perspective in the RIR, the EDT can physically determine whether a large amount of early reflection occurs at the measured location after the direct sound is played. This is because the EDT is the time from the measurement of the direct sound until the signal, including the early reflections, has decreased by 10 dB. RT20 strictly refers to the time over which the reverberation energy decreases gradually, excluding the direct sound and strong early reflections.
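A common way to obtain both indices from a measured impulse response is Schroeder backward integration. The sketch below (our own implementation, applied to a synthetic exponentially decaying response rather than the paper's data) extracts EDT from the 0 to −10 dB span and RT20 from the −5 to −25 dB span, both extrapolated to a 60 dB decay as is conventional:

```python
import numpy as np

def schroeder_edc(h):
    # Energy decay curve (dB): backward integration of h^2, normalized.
    e = np.cumsum(h[::-1] ** 2)[::-1]
    return 10 * np.log10(e / e[0])

def decay_time(edc, fs, lo_db, hi_db, target_db=60.0):
    # Time for the EDC to fall from lo_db to hi_db, extrapolated
    # linearly to a full 60 dB decay.
    t = np.arange(len(edc)) / fs
    i0 = int(np.argmax(edc <= lo_db))
    i1 = int(np.argmax(edc <= hi_db))
    slope = (edc[i1] - edc[i0]) / (t[i1] - t[i0])   # dB per second (negative)
    return -target_db / slope

def edt_rt20(h, fs):
    edc = schroeder_edc(h)
    edt = decay_time(edc, fs, 0.0, -10.0)    # 0 to -10 dB span
    rt20 = decay_time(edc, fs, -5.0, -25.0)  # -5 to -25 dB span
    return edt, rt20

# Synthetic "impulse response": noise with an ideal exponential decay
# whose 60 dB decay time is about 0.3 s.
fs = 48_000
t = np.arange(int(0.5 * fs)) / fs
rng = np.random.default_rng(0)
h = rng.standard_normal(t.size) * np.exp(-6.91 * t / 0.3)
edt, rt20 = edt_rt20(h, fs)
print(f"EDT ~ {edt:.2f} s, RT20 ~ {rt20:.2f} s")
```

For an ideal diffuse exponential decay, both indices recover the same 60 dB decay time; in the paper's small, non-diffuse space the two indices diverge, which is exactly what the analysis exploits.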
To analyze the relationship between the absorption coefficient and EDT/RT20, the microphone was placed at the representative intrusion position shown in Figure 7, and the microphone array signals were compared with the signals of the microphones distributed in the space. Figure 10a shows the arrangement of the microphones to confirm the operation of the active localization method proposed here. The first to seventh microphones were the array of microphones used in the proposed system, and the eighth to eleventh microphones were placed in the representative positions A, B, C, and D, respectively.
Figure 10b shows the energy decay curve for the impulse response of the ninth microphone.
The energy decay curve of the ninth microphone did not decrease linearly but in a staircase form. This was because the space represented in this simulation was not diffuse. In other words, no diffuse-field reverberation occurred owing to the small space of the simulation and the proximity of the loudspeaker and microphone. As a result, the energy decay curve had an approximate exponential shape of a decay curve, but not the diffuse decay curve (Figure 10b). However, the equation was considered to be suitable for analyzing the space from a physical perspective to confirm the operating conditions of the proposed method. This was because the proposed method was analyzed based on a short time, and the changes in the early reflections were presented by the variation of the EDT and RT20 parameters. Figure 11 shows the results of EDT and RT20 for each microphone as the sound absorption coefficient decreased. The main point is whether the numerical values measured at the boundaries of the control spaces from microphones 1 to 7 and those measured in the control space from microphones 8 to 11 exhibited a specific trend.  Figure 11b indicates that the result of the fourth microphone, which was located at the same position as the loudspeaker, was very small compared with the results of other microphones. This was because the loudspeaker and microphone arrangements were very similar such that the characteristics of the room were not sufficiently reflected. Therefore, when analyzing the results of RT20, a criterion for the minimum value to be used for the analysis was necessary.
This criterion was selected as the maximum time for the sound from the loudspeaker to reach the person and back to the microphone again. This is because we can determine that the direct sound and strong early reflection are dominant in a microphone signal if the measured time of RT20 is shorter than the propagation time of the sound source generated by the loudspeaker.
The farthest distance in the configuration of this study was 2.62 m, which was the distance from microphone 4 to the upper corner (2.92 m) minus the distance of 0.3 m at which a person can stand. The criterion time can be selected as follows:

t_c = 2·d_max/c  (21)

where t_c is the criterion time, d_max is the maximum distance of a sound source in the control domain, and c is the speed of sound. Therefore, when analyzing RT20, values less than t_c were excluded from the analysis. Figure 12 is a graph showing the minimum, maximum, and median values of EDT and RT20 in the microphone array and control space according to α. In this scenario, the microphone signals that did not satisfy t_c were excluded from the RT20 analysis. The results of the microphones in the array and control space are represented by the red dashed and blue solid lines, respectively. The marker on each graph is the median value, the top line of the deviation is the maximum value, and the bottom line is the minimum value.
The EDT results shown in Figure 12a indicate that the median value of the microphones in the control space was higher than that in the array. However, the deviation confirmed that the EDT results in the array were large depending on the absorption coefficient.
The RT20 results depicted in Figure 12b indicate that until α = 0.7, the median value of the control space was larger than that of the array, but from 0.6, the opposite result was observed. The deviation tended to increase and decrease as α decreased. Analyzing the values in Figure 12 according to the conclusion in Section 3.1.2 that the proposed approach operated in an environment with α > 0.7, the following features were obtained. From the EDT results in Figure 12a, we observed that the maximum values of the array became smaller than the maximum values of the control space when α was greater than 0.7. When the results of RT20 in Figure 12b were analyzed as a median value, the median values of the array were smaller than those of the control space when α was greater than 0.7. The results are summarized in Table 2. As the results in Tables 1 and 2 show, the active localization method proposed in this paper can detect the position of a person and an object under the following conditions:

max(EDT_array) < max(EDT_space)  (22)

median(RT20_array) < median(RT20_space)  (23)

RT20_m > t_c  (24)

where m is the microphone index and t_c is the criterion time. Equation (22) indicates a condition in which the maximum EDT value of the array is smaller than that of the control space. Equation (23) indicates that the median value of RT20 in the array is less than its median value in the control space, where only RT20 values above t_c (Equation (24)) are used.
Therefore, we observed that if microphones are installed in the array and the control space, and the measured acoustic parameters EDT and RT20 satisfy the conditions of Equations (22) and (23), the active localization method can be implemented.
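Under the assumption that EDT and RT20 have already been measured at each microphone, the decision rule of Equations (22) and (23) can be sketched as follows (the example values are hypothetical placeholders, not measurements from Tables 2 or 3):

```python
import numpy as np

def operating_condition(edt, rt20, mic_is_array, t_c):
    # Eq. (22): max EDT over the array < max EDT over the control space.
    # Eq. (23): median RT20 over the array < median over the control space,
    # keeping only RT20 values above the criterion time t_c (Eq. (24)).
    edt = np.asarray(edt)
    rt20 = np.asarray(rt20)
    arr = np.asarray(mic_is_array, dtype=bool)
    cond_edt = edt[arr].max() < edt[~arr].max()
    r_arr = rt20[arr][rt20[arr] > t_c]
    r_spc = rt20[~arr][rt20[~arr] > t_c]
    cond_rt20 = np.median(r_arr) < np.median(r_spc)
    return bool(cond_edt and cond_rt20)

# Hypothetical example (NOT the paper's measurements):
# mics 1-7 form the array, mics 8-11 sit at positions A-D.
t_c = 2 * 2.62 / 342   # criterion time of Eq. (21) for d_max = 2.62 m
edt  = [0.05, 0.06, 0.05, 0.04, 0.06, 0.05, 0.06, 0.08, 0.09, 0.08, 0.07]
rt20 = [0.02, 0.05, 0.06, 0.01, 0.05, 0.06, 0.05, 0.07, 0.08, 0.07, 0.09]
is_array = [True] * 7 + [False] * 4
print("expected to operate:", operating_condition(edt, rt20, is_array, t_c))
```

Filtering with t_c mirrors the exclusion of microphone 4's RT20, whose value is dominated by the direct sound from the co-located loudspeaker.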

Experimental Results of Active Localization in a Reverberant Environment
This section presents the experimental results to verify the proposed method. In Section 2.4, we confirmed the feasibility of the proposed approach in an anechoic chamber, that is, the concept of detecting the position of a person or an object through a reflected sound. The results in the anechoic chamber indicated that there was no error in Equation (3). However, in an actual space in which reverberation exists, an error occurs in Equation (3). Therefore, the conditions under which the active localization method can operate in a reverberant space were identified in Section 3.
We used Equations (22) and (23) to predict whether the active localization method would function in a classroom, and we describe the experimental results of using the proposed method to estimate the position of a PVC pipe and a person. Figure 13 shows the experimental environment of an empty classroom. The experiments were performed with the silent intruder at the same position as in Figure 7. The room acoustic parameters were measured using the configuration shown in Figure 10a, and the results are presented in Table 3.

Experimental Configuration and Operating Conditions Test


Figure 13. Experimental configuration to estimate the position of a PVC pipe using an active localization system in a classroom. This experiment was performed in an empty classroom to minimize the influence of furniture or other interior materials in the room.

Table 3 shows the EDT and RT20 measured at the seven microphones in the array, and the EDT and RT20 measured at the eighth to eleventh microphones at the representative intrusion positions shown in Figures 7 and 10a.

Table 3. Results of the room acoustic parameters measured in the control space of Figure 13 at the positions shown in Figure 10a. (Columns: Position, EDT (ms), RT20 (ms).)

Firstly, when ascertaining the operating condition using the EDT in Equation (22), the maximum value measured in the microphone array was 9.0 ms and the maximum value measured in the control space was 22.8 ms. Therefore, we confirmed that Equation (22) was satisfied.
Secondly, when the operating condition using the median value of RT20 in Equation (23) was applied to the data in Table 3, the median value of the array was 20.1 ms. By excluding the RT20 that did not satisfy Equation (24), the median value of the distributed microphones in the control space was 23.5 ms. Therefore, we confirmed that Equation (23) was also satisfied.
The results indicate that the proposed active localization method operates even if reverberation exists in the control space set as the security space. The localization results using SRP energy maps are discussed in the following section. Figure 14 depicts the energy maps obtained from the experimental results. Case 1 shows the test results when the PVC pipe was considered the silent intruder, and Case 2 shows the results when a person was the silent intruder. Each image shows the intruder position using relative power values (dB); the square marker is the actual position and the cross marker is the estimated position.
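In spirit, the SRP energy maps can be produced with a delay-and-sum steered response power scan over a grid of candidate positions. The following is a simplified sketch with our own naming; it does not reproduce the paper's exact beamformer, windowing, or frequency-band choices:

```python
import numpy as np

def srp_energy_map(mic_sigs, mic_pos, fs, grid_x, grid_y, c=343.0):
    """Delay-and-sum SRP energy map over a 2-D grid of candidate points.

    mic_sigs: (M, N) array of microphone signals sampled at fs (Hz);
    mic_pos: (M, 2) microphone positions (m); c: speed of sound (m/s).
    Returns an energy map of shape (len(grid_y), len(grid_x)).
    """
    M, _ = mic_sigs.shape
    emap = np.zeros((len(grid_y), len(grid_x)))
    for iy, y in enumerate(grid_y):
        for ix, x in enumerate(grid_x):
            # propagation delay (in samples) from the candidate point to each mic
            d = np.hypot(mic_pos[:, 0] - x, mic_pos[:, 1] - y)
            delays = np.round(d / c * fs).astype(int)
            # align the signals by removing each mic's delay, then sum coherently
            aligned = [np.roll(mic_sigs[m], -delays[m]) for m in range(M)]
            emap[iy, ix] = np.sum(np.sum(aligned, axis=0) ** 2)
    return emap
```

The candidate point whose aligned sum has the largest energy is taken as the estimated position; the grid spacing sets the resolution of the resulting map.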

Localization Performance in a Reverberant Environment
To analyze the experimental results in Figure 14, we compared the estimated positions with those in Table 1, which lists the simulation results for the reverberant environment. In Table 1, for α greater than 0.7, the range in which the active localization method operates, no angle error was observed and the distance error was up to 19% (0.38 m).
The experimental results in the reverberant environment in Figure 14 indicate that the angle had no error and the distance error was within 6.5% (0.13 m). Therefore, the proposed active localization method can be implemented if the operating conditions of Equations (22) and (23) are satisfied, as discussed in Section 3.2. However, the position detection results of Case 2 in Figure 14 indicate an increased error compared with the results for the PVC pipe. To analyze this, the results in both the anechoic chamber and the classroom are summarized quantitatively in Table 4 as the errors between the actual and estimated values for each experimental configuration. These position errors comprise angle and distance errors, and the localization performances are compared in terms of the type of silent intruder (a PVC pipe or a person). The data of the anechoic chamber indicate the initial error of the proposed method when no reflection or reverberation affects the control space, and the data of the classroom indicate the performance of the proposed method under reflection and reverberation. In the anechoic environment, which represents the initial error, the position error increased in the scenario of a person compared with that of a PVC pipe. This was caused by the slight movement of the person, and the results in Table 4 indicate that the position error can increase further when this movement is combined with a reverberant environment.
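The angle and distance errors of the kind reported in Table 4 can be computed from actual and estimated (x, y) positions as follows; this is an illustrative helper with the microphone array taken as the origin, not code from the paper:

```python
import math

def position_errors(actual_xy, est_xy):
    """Return (angle error in degrees, absolute distance error,
    relative distance error) between actual and estimated positions,
    both given as (x, y) relative to the array origin."""
    ang_actual = math.degrees(math.atan2(actual_xy[1], actual_xy[0]))
    ang_est = math.degrees(math.atan2(est_xy[1], est_xy[0]))
    dist_actual = math.hypot(actual_xy[0], actual_xy[1])
    dist_est = math.hypot(est_xy[0], est_xy[1])
    dist_err = abs(dist_est - dist_actual)
    return abs(ang_est - ang_actual), dist_err, dist_err / dist_actual
```

For instance, with hypothetical positions at the same angle and distances of 2.0 m (actual) versus 1.87 m (estimated), the helper returns a 0.13 m (6.5%) distance error, the same magnitude as the classroom result above.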

From the experimental results in the classroom, we confirmed that the PVC pipe results had a small error, within 6.5%, because the pipe does not move, whereas the human-intruder cases of Case 2 showed a relatively large error, within 5° for the estimated angle and 34% for the estimated distance. These localization results overcome a limitation of existing acoustic-based security systems, for which an intruder must generate sound. Moreover, the proposed method estimates the x and y positions using a linear microphone array in a two-dimensional security space.

Conclusions and Discussion
In this paper, a new active localization method is proposed to estimate the position of a silent intruder.
For feasibility testing and analysis of the proposed method, we performed the following four steps. Firstly, feasibility tests were performed in an anechoic chamber. Secondly, an FDTD simulation was conducted to verify that the proposed method operates according to the reflection at the boundary of the control space. Thirdly, EDT and RT20 were used to represent the conditions under which active localization can operate in a reverberant environment through the FDTD simulation data. Finally, the operation of the active localization method in a classroom was confirmed under conditions based on the EDT and RT20, and the localization results of a PVC pipe and a person were analyzed through energy maps. Therefore, the proposed method was verified for the position estimation of a silent intruder. The active localization method is expected to be applied in home security systems in conjunction with conventional security sensors to improve intrusion detection, because the proposed system can estimate the position of a silent intruder and can be implemented using loudspeakers and microphones built into home appliances.
In a further study, we intend to expand the frequency band to conduct more precise analyses of the security space, represent the SRP energy maps using wideband data, and design digital filters to determine the robustness of the proposed method.
Funding: This research was funded by the "GIST Research Institute (GRI)" grant funded by the GIST in 2020.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The simulation was modeled using the Yee scheme of the FDTD method (Figure A1). The wave equation is expressed in a two-dimensional linear acoustic domain [39], where p is the sound pressure, v_x and v_y are the particle velocities along the x and y axes, δs is the spatial discretization step, δt is the time discretization step, u and w are indices of the spatial point, ρ_0 is the air density, and c is the speed of sound. The boundary condition causing reflection is expressed using the asymmetric finite-difference approximation in [39]. In Equation (A5), p^[n](u + 0.5, w) represents the pressure at the velocity point in the x direction, a spatial point that does not exist in Figure A1. The point (u + 0.5, w) is an impedance boundary of the FDTD domain, which is suitable for expressing a locally reacting boundary affected only by the normal velocity because of the lattice structure of the Yee scheme. Equation (A6) represents the acoustic impedance of the locally reacting boundary,
where P is the acoustic pressure and v_n is the normal velocity. The term p^[n](u + 0.5, w) in Equation (A5) is replaced with Z v^[n](u + 0.5, w) based on Equation (A6), and v^[n](u + 0.5, w) is obtained by linear interpolation between the adjacent half-step velocity values.
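For reference, the interior update of the two-dimensional Yee-scheme FDTD described here can be sketched as below. This is a minimal illustration with our own function name; the locally reacting impedance boundary of Equations (A5) and (A6) is omitted:

```python
import numpy as np

def fdtd_step(p, vx, vy, rho0, c, dt, ds):
    """One leapfrog update of a 2-D Yee-scheme FDTD grid (interior only).

    p: (Nx, Ny) pressure grid; vx: (Nx-1, Ny) and vy: (Nx, Ny-1) are the
    staggered particle velocities; rho0 is the air density, c the speed
    of sound, and dt, ds the time and spatial steps (δt, δs).
    """
    # update particle velocities from the pressure gradient
    vx -= dt / (rho0 * ds) * (p[1:, :] - p[:-1, :])
    vy -= dt / (rho0 * ds) * (p[:, 1:] - p[:, :-1])
    # update pressure from the divergence of the updated velocities
    p[1:-1, :] -= rho0 * c**2 * dt / ds * (vx[1:, :] - vx[:-1, :])
    p[:, 1:-1] -= rho0 * c**2 * dt / ds * (vy[:, 1:] - vy[:, :-1])
    return p, vx, vy
```

A stable run requires the Courant number λ_c = c·δt/δs to satisfy λ_c ≤ 1/√2 in two dimensions.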
The following concept is introduced to assign the frequency-independent absorption coefficient to Equation (A7).
In [44], the wall impedance is divided by the characteristic impedance of air. The resulting quantity, expressed as Equation (A8), is called the specific acoustic impedance:

ζ = Z / (ρ_0 c), (A8)

where Z is the acoustic impedance, ρ_0 is the density of air, and c is the speed of sound. The specific acoustic impedance is also related to the reflection coefficient R:

R = (ζ − 1) / (ζ + 1). (A9)

The intensity of a plane wave is proportional to the square of the pressure amplitude. Therefore, the intensity of the reflected wave is smaller by a factor |R|^2 than that of the incident wave, and the quantity

α = 1 − |R|^2 (A10)

is called the "absorption coefficient" of the wall. In Equation (A7), λ_c is the Courant number, ζ is the specific acoustic impedance, and α enters through the absorption-coefficient relations of Equations (A9) and (A10). Figure A2 shows feasibility experiments in which a hidden intruder can be detected through the sound field variation proposed in this paper.
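The mapping from a prescribed absorption coefficient to the boundary reflection coefficient and specific acoustic impedance can be sketched as follows. This assumes normal incidence and the positive square root for R; a perfectly rigid wall (α = 0, infinite impedance) is excluded:

```python
def boundary_params(alpha):
    """Convert a frequency-independent absorption coefficient alpha into
    the reflection coefficient R and specific acoustic impedance zeta
    used at the boundary (normal incidence; positive root assumed)."""
    R = (1.0 - alpha) ** 0.5        # from alpha = 1 - |R|^2
    zeta = (1.0 + R) / (1.0 - R)    # from R = (zeta - 1) / (zeta + 1)
    return R, zeta
```

For example, α = 0.75 gives R = 0.5 and ζ = 3, while α = 1 gives R = 0 and ζ = 1, i.e., a boundary matched to the characteristic impedance of air.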

Appendix B

From the perspective of position estimation, a large distance error of 0.5 m or more is generated, but from the perspective of detection, the result is meaningful: a person or an object hidden behind an obstacle can still be detected.
The echolocation method using the audible frequency band proposed in this paper therefore has the functional advantage that even a hidden person can be detected through the variation in the amount of scattering.