A Case Study of Whistle Detection and Localization for Humpback Dolphins in Taiwan

Abstract: In recent years, Taiwan's government has focused on policies regarding offshore wind farming near the Indo-Pacific humpback dolphin habitat, where marine mammal observation is a critical consideration. The present research developed an algorithm called National Taiwan University Passive Acoustic Monitoring (NTU_PAM) to assist marine mammal observers (MMOs). The algorithm performs whistle detection and whistle localization. Whistle detection is based on image processing and whistle feature extraction; whistle localization is based on the time difference of arrival (TDOA) method. To test whistle detection performance, we ran NTU_PAM and the widely used software PAMGuard on the same data. To test whistle localization, we designed a field experiment in which a sound source projected simulated whistles that were recorded by several hydrophone stations. The data were analyzed to track the moving path of the source. The results show that localization accuracy was higher when the sound source was inside the detection region enclosed by the hydrophone stations. This paper provides a method for MMOs to conveniently observe the migration path and population dynamics of cetaceans without ecological disturbance.


Introduction
Currently, most of Taiwan's raw materials for energy production, including coking coal, fuel coal, crude oil, and liquefied natural gas [1], are imported and have a large and immediate impact on the environment. Therefore, the government has actively developed green energy, including offshore wind farms [2], but most sites overlap with Indo-Pacific humpback dolphin reservation zones. The noise from pile driving during construction may impact marine mammals and cause auditory injury, ranging from temporary threshold shift (TTS) to permanent threshold shift (PTS) in hearing [3]. To minimize the noise-induced impact on cetaceans caused by the construction and operation of wind turbines, establishing a marine mammal detection mechanism is a priority. The traditional method to detect cetaceans is visual: marine mammal observers (MMOs) work from vehicles and search for cetaceans with the naked eye. This operation is expensive, offers only a low probability of success, and is limited to daylight hours. Underwater acoustics provides an alternative technique for detecting marine mammals, with the cetacean call serving as the characteristic to detect. We used passive acoustic monitoring (PAM) to develop an algorithm, NTU_PAM, that detects cetacean calls and then tracks their source. In addition to overcoming the weaknesses of the visual method, NTU_PAM can show the correlation between the results of the visual method and PAM.
the dolphin group at a close distance to receive the dolphin call, and they used the difference in arrival time of a sound at each hydrophone pair to localize the targets. Wiggins et al. [22] deployed a tracking high-frequency acoustic recording package (HARP) [23] consisting of four hydrophones at 3 m above the seafloor offshore of Southern California to track beaked whales and dolphins. Wiggins et al. [24] also deployed four HARPs offshore of Southern California to track whistling dolphins. Both studies used the TDOA method. Building on the demonstrated effectiveness of TDOA for tracking and localization, we used four hydrophone stations to form a kilometer-scale array for tracking the source based on TDOA.
We designed a field experiment that simulated different whistle types and built four PAM stations to track the artificial source. The four stations were deployed near Taichung Harbor to record the simulated calls. After running the detection algorithm, finding the whistle times, and tracking the source, we compared the algorithm's results with the moving path of the boat carrying the sound source. The algorithm developed in this study detects whistles automatically without requiring a trained model; it is based on the time length and frequency band of the whistle. The automatic detection algorithm and the localization method were combined as NTU_PAM. NTU_PAM can work as an auxiliary tool for MMOs during the daytime, and it can function as the main monitoring tool at night.

Whistle Detector Algorithm
Passive acoustic monitoring has been used widely in marine monitoring to amass longitudinal data, and it requires high-efficiency algorithms to help researchers find the relevant file segments. We developed a whistle detector algorithm as an improvement on Li's prototype algorithm [25]. The algorithm can detect any whistle-producing animal; the frequency range to be searched depends on the species. There are six main processes in the algorithm:
1. Transfer time-series data to a spectrogram by short-time Fourier transform (STFT);
2. Remove the noise on the time axis of the spectrogram;
3. Remove the salt and pepper noise in the spectrogram;
4. Find the data points that satisfy the conditions on power spectral density (PSD) and signal-to-noise ratio (SNR);
5. Extract data points using the features of whistles;
6. Cluster data points into different whistles.
A flow chart of the algorithm is shown in Figure 1. To present whistles clearly on the spectrogram, some processes are based on image processing. Each process is described in detail below, and Figure 2 shows the result of each step.

Spectrogram
We used the STFT [26], which applies a window function to obtain frequency-domain information as it changes over time. A frame slides along the time-domain signal; the signal inside the frame is multiplied by the window function and then Fourier transformed. The resulting spectra are used to produce the spectrogram. The window function is the Hamming window [27], the frame length is 0.01 s, and the overlap is 90%. The STFT formula is shown in Equation (1), where w(t) is the window function and x(t) is the raw data.
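As a minimal sketch of this step, the following code computes a power spectrogram with a Hamming window, a 0.01 s frame, and 90% overlap. It assumes the standard discrete STFT (window multiplication followed by a DFT) and uses a direct DFT from the standard library for readability; the function names and the low demonstration sampling rate are illustrative and not part of NTU_PAM.

```python
import cmath
import math


def frame_params(fs, frame_s=0.01, overlap=0.9):
    """Return (samples per frame, hop in samples) for the given settings."""
    n = int(round(frame_s * fs))
    hop = max(1, int(round(n * (1.0 - overlap))))
    return n, hop


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_power(x, fs, frame_s=0.01, overlap=0.9):
    """Return power[frame][bin] = |DFT|^2 of each Hamming-windowed frame."""
    n, hop = frame_params(fs, frame_s, overlap)
    w = hamming(n)
    bins = n // 2 + 1  # non-negative frequencies only
    # Precompute the DFT twiddle factors once for all frames.
    tw = [[cmath.exp(-2j * math.pi * k * i / n) for i in range(n)] for k in range(bins)]
    power = []
    for start in range(0, len(x) - n + 1, hop):
        frame = [x[start + i] * w[i] for i in range(n)]  # multiply by the window
        power.append([abs(sum(a * b for a, b in zip(frame, row))) ** 2 for row in tw])
    return power


fs = 8000  # hypothetical low rate so the demo runs quickly
tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(int(0.05 * fs))]
spec = stft_power(tone, fs)
n, hop = frame_params(fs)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / n  # bin spacing is fs / n = 1 / frame length
```

With the paper's settings at a 96 kHz sampling frequency, `frame_params(96000)` gives a 960-sample frame advanced by 96 samples, i.e., a frequency resolution of 100 Hz and a time step of 1 ms.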

Denoising on the Time Axis of the Spectrogram
A whistle is long compared to impulse noise; therefore, we use the moving average method to remove impulse noise from the spectrogram. Every 20 points on the time axis of each single frequency band are averaged to build a new spectrogram; the formula is shown in Equation (2), where $S_{t,f}$ is the original spectrogram and $\bar{S}_{t,f}$ is the new spectrogram after denoising.
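The averaging can be sketched as follows. The text does not say whether the 20-point window is centered or trailing, so this sketch assumes a trailing window that is shortened at the start of the record; `moving_average_time` is a hypothetical name.

```python
def moving_average_time(spec, width=20):
    """Smooth each frequency band along time with a trailing window.

    spec[t][f] is the spectrogram; the output has the same shape. Near the
    start, only the frames that exist are averaged (an assumption here).
    """
    n_t, n_f = len(spec), len(spec[0])
    out = [[0.0] * n_f for _ in range(n_t)]
    for f in range(n_f):
        running = 0.0
        for t in range(n_t):
            running += spec[t][f]
            if t >= width:
                running -= spec[t - width][f]  # drop the frame leaving the window
            out[t][f] = running / min(t + 1, width)
    return out


# One band holding a steady tone (value 1) and a single-frame impulse (value 100).
band = [[1.0] for _ in range(60)]
band[30][0] = 100.0
smoothed = moving_average_time(band)
```

In this toy input the impulse is reduced from 100 to about 6, while the steady component keeps its level, which is the behavior the denoising step relies on.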

Removing Salt and Pepper Noise
A median filter, a nonlinear signal processing technique often used in image processing, was used to remove salt and pepper noise [28]. Each point is replaced by the median of the 3-by-3 matrix of spectrogram points around it. The formula is shown in Equation (3), where $\bar{S}_{t,f}$ is the spectrogram after the denoising and $\tilde{S}_{t,f}$ is the spectrogram after using the median filter.
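A minimal sketch of a 3-by-3 median filter on a spectrogram stored as a list of rows is given below. The edge handling (using only the neighbors that exist) is an assumption of this sketch, since the text does not specify it.

```python
from statistics import median


def median_filter_3x3(spec):
    """Replace each point by the median of its 3-by-3 neighborhood.

    Neighbors outside the spectrogram are skipped, so edge points use a
    smaller neighborhood (an assumption of this sketch).
    """
    n_t, n_f = len(spec), len(spec[0])
    out = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            window = [
                spec[i][j]
                for i in range(max(0, t - 1), min(n_t, t + 2))
                for j in range(max(0, f - 1), min(n_f, f + 2))
            ]
            out[t][f] = median(window)
    return out


# Flat background with one isolated speck and a block two bins wide.
image = [[0.0] * 5 for _ in range(7)]
image[1][1] = 9.0                  # isolated "salt" pixel
for t in range(3, 6):
    image[t][2] = image[t][3] = 5.0  # extended structure
filtered = median_filter_3x3(image)
```

The isolated pixel is removed because it is outvoted by its eight neighbors, whereas the interior of the extended block is preserved.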

Satisfying PSD and SNR Conditions
Because a whistle is a narrowband signal, the PSD where a whistle occurs is much larger than the PSD at points of nearby frequency. The definition of SNR in this study is shown in Equation (4). If, at a data point, the PSD is larger than the PSD threshold and the SNR is simultaneously larger than the SNR threshold, the data point is set to one; otherwise, it is set to zero. The formula is shown in Equation (5). The new spectrogram $B_{t,f}$ is a binary image. The default values of the SNR threshold and the PSD threshold are 6 dB and 40 dB (re 1 µPa²/Hz), respectively.
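The thresholding can be sketched as below. Because the SNR definition of Equation (4) is not reproduced here, the sketch ASSUMES a local definition: the PSD at a point minus the mean PSD (in dB) of a few bins slightly above and below it in the same frame. The `guard` and `span` parameters are hypothetical and only illustrate the idea of comparing a point with nearby frequencies.

```python
def binarize(psd_db, psd_thresh=40.0, snr_thresh=6.0, guard=1, span=3):
    """Return B[t][f] = 1 where both PSD and SNR exceed their thresholds.

    ASSUMPTION: SNR at (t, f) is the PSD in dB minus the mean PSD in dB of
    the bins guard+1 .. guard+span above and below f in the same frame.
    """
    n_t, n_f = len(psd_db), len(psd_db[0])
    out = [[0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        row = psd_db[t]
        for f in range(n_f):
            neighbors = [
                row[f + s * d]
                for d in range(guard + 1, guard + span + 1)
                for s in (-1, 1)
                if 0 <= f + s * d < n_f
            ]
            if not neighbors:
                continue
            snr = row[f] - sum(neighbors) / len(neighbors)
            if row[f] > psd_thresh and snr > snr_thresh:
                out[t][f] = 1
    return out


# Frame 0: a 60 dB narrowband peak over a 45 dB floor.
# Frame 1: loud but broadband, so the PSD is high while the local SNR is 0 dB.
frame0 = [45.0] * 11
frame0[5] = 60.0
frame1 = [70.0] * 11
binary = binarize([frame0, frame1])
```

Only the narrowband peak is marked; the broadband frame passes the PSD threshold but fails the SNR threshold, which is why both conditions are required together.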

Extracting the Whistle
As mentioned in Section 2.4, the whistle is a narrowband, continuous signal. In this method, neighboring data points whose value is one are connected and labeled as a segment. Next, two conditions are set: the frequency bandwidth threshold and the time length threshold. Lastly, only the segments whose frequency bandwidth is smaller than the frequency bandwidth threshold and whose time length is longer than the time length threshold are retained. The binary image $B_{t,f}$ is thereby updated to a new image $B'_{t,f}$. The default values of the frequency bandwidth threshold and the time length threshold are 300 Hz and 0.06 s, respectively.
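The labeling and filtering can be sketched with a breadth-first flood fill. Two points are assumptions of this sketch rather than statements from the text: pixels are treated as connected if they touch horizontally, vertically, or diagonally, and "frequency bandwidth" is taken as the widest frequency extent within any single time frame, so that a frequency-modulated whistle is not rejected for its overall sweep.

```python
from collections import deque


def extract_segments(binary, dt, df, max_bw_hz=300.0, min_len_s=0.06):
    """Label 8-connected groups of ones and keep the whistle-like ones.

    binary[t][f] is the 0/1 image; dt (s) and df (Hz) are the size of one
    pixel. Returns the retained segments, each a list of (t, f) indices.
    """
    n_t, n_f = len(binary), len(binary[0])
    seen = [[False] * n_f for _ in range(n_t)]
    kept = []
    for t0 in range(n_t):
        for f0 in range(n_f):
            if not binary[t0][f0] or seen[t0][f0]:
                continue
            seen[t0][f0] = True
            queue, seg = deque([(t0, f0)]), []
            while queue:  # flood fill from the seed pixel
                t, f = queue.popleft()
                seg.append((t, f))
                for u in range(max(0, t - 1), min(n_t, t + 2)):
                    for v in range(max(0, f - 1), min(n_f, f + 2)):
                        if binary[u][v] and not seen[u][v]:
                            seen[u][v] = True
                            queue.append((u, v))
            extent = {}  # per-frame (lowest bin, highest bin)
            for t, f in seg:
                lo, hi = extent.get(t, (f, f))
                extent[t] = (min(lo, f), max(hi, f))
            duration = len(extent) * dt
            bandwidth = max(hi - lo + 1 for lo, hi in extent.values()) * df
            if bandwidth < max_bw_hz and duration > min_len_s:
                kept.append(seg)
    return kept


image = [[0] * 20 for _ in range(100)]
for t in range(10, 90):
    image[t][t // 8] = 1   # slowly rising narrowband contour, 80 frames long
for f in range(10):
    image[5][f] = 1        # broadband click lasting one frame
for t in range(95, 98):
    image[t][15] = 1       # narrowband blip that is too short
whistles = extract_segments(image, dt=0.001, df=100.0)
```

With a 1 ms time step and 100 Hz bins, only the 80 ms contour passes both default thresholds; the click fails on bandwidth and duration, and the blip fails on duration.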

Clustering
The k-means method [29] is used to cluster the data points in $B'_{t,f}$. First, some of the whistle segments from Section 2.5 are merged according to their differences in frequency and time: if the time interval between two segments is smaller than 0.3 s and, simultaneously, the difference in frequency between them is smaller than 1 kHz, the two segments are considered one whistle segment. After merging, k (the number of clusters) is set to the new number of segments. Each data point is then assigned to one of the k whistles by calculating the Euclidean distance of its frequency and time indices in $B'_{t,f}$. The start time, end time, start frequency, and end frequency of each whistle are recorded after k-means.
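The merging rule that fixes k can be sketched as follows; the k-means assignment itself is not shown. The sketch ASSUMES that the time interval and the frequency difference are both measured from the end of the earlier segment to the start of the later one, which the text does not state explicitly. The dictionary keys are hypothetical.

```python
def merge_segments(segments, max_gap_s=0.3, max_df_hz=1000.0):
    """Group segments that are close in both time and frequency.

    Each segment is a dict with t_start, t_end (s) and f_start, f_end (Hz).
    ASSUMPTION: closeness is measured from the end of the earlier segment
    to the start of the later one. Returns the merged whistles.
    """
    whistles = []
    for seg in sorted(segments, key=lambda s: s["t_start"]):
        for w in whistles:
            gap = seg["t_start"] - w["t_end"]
            if gap < max_gap_s and abs(seg["f_start"] - w["f_end"]) < max_df_hz:
                if seg["t_end"] > w["t_end"]:  # extend the existing whistle
                    w["t_end"], w["f_end"] = seg["t_end"], seg["f_end"]
                break
        else:
            whistles.append(dict(seg))  # no match: start a new whistle
    return whistles


segments = [
    {"t_start": 0.00, "t_end": 0.40, "f_start": 5000.0, "f_end": 6600.0},
    {"t_start": 0.45, "t_end": 1.00, "f_start": 6800.0, "f_end": 9000.0},
    {"t_start": 1.10, "t_end": 1.50, "f_start": 4000.0, "f_end": 4200.0},
    {"t_start": 2.50, "t_end": 3.00, "f_start": 4300.0, "f_end": 4500.0},
]
merged = merge_segments(segments)
k = len(merged)  # number of clusters passed on to k-means
```

Here the first two segments are joined (50 ms gap, 200 Hz apart), the third is close in time but 4.8 kHz away, and the fourth starts a full second later, so k is 3.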

Localization Method
TDOA was used to track the whistle. We devised an experiment to track the moving path of an artificial source using the whistle detector algorithm and TDOA.

Time Difference of Arrival (TDOA)
TDOA is often used in signal source positioning [30]. It requires only the times at which the signal is received and the speed at which the signal travels. Once the signal is received at two receiving stations, the difference in arrival time defines a hyperbola of possible source locations, given by Equations (6) and (7). With three receiving stations, at least two hyperbolas are produced, as shown in Figure 3, and their intersection is the signal source location. For this to hold, the receiving stations must be time-synchronized.
In Equations (6) and (7), $t_1$, $t_2$, and $t_3$ are the times when the same signal arrives at different hydrophones; (x, y) is the position of the unknown signal source; and c is the sound speed from the local sound speed profile.
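As an illustration of the principle, the sketch below finds the point whose predicted range differences best match the measured ones, c(t_i - t_1). It is not the authors' solver: instead of intersecting hyperbolas explicitly, it ASSUMES a flat two-dimensional geometry and uses a coarse-to-fine grid search, which is simple but not efficient. The station layout, sound speed, and search-window size are hypothetical.

```python
import math


def tdoa_locate(stations, arrivals, c=1500.0, half_width=5000.0, levels=12):
    """Estimate (x, y) from arrival times at three or more synchronized stations.

    stations: list of (x, y) in meters; arrivals: arrival times in seconds.
    Minimizes the squared mismatch between measured and predicted range
    differences relative to station 0 by a coarse-to-fine grid search.
    """
    measured = [c * (t - arrivals[0]) for t in arrivals[1:]]

    def cost(x, y):
        d0 = math.hypot(x - stations[0][0], y - stations[0][1])
        return sum(
            (math.hypot(x - sx, y - sy) - d0 - m) ** 2
            for (sx, sy), m in zip(stations[1:], measured)
        )

    # Start from the centroid of the array.
    cx = sum(s[0] for s in stations) / len(stations)
    cy = sum(s[1] for s in stations) / len(stations)
    for _ in range(levels):
        step = half_width / 10
        candidates = [
            (cx + i * step, cy + j * step)
            for i in range(-10, 11)
            for j in range(-10, 11)
        ]
        cx, cy = min(candidates, key=lambda p: cost(*p))
        half_width = step * 3  # zoom in around the best grid point
    return cx, cy


# Hypothetical 1 km square array and a source inside it.
stations = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
true_source = (400.0, 650.0)
t_emit = 12.0  # unknown to the solver; it cancels in the time differences
arrivals = [
    t_emit + math.hypot(true_source[0] - sx, true_source[1] - sy) / 1500.0
    for sx, sy in stations
]
estimate = tdoa_locate(stations, arrivals)
```

Only time differences enter the cost, so the emission time never needs to be known; this is also why any clock offset between stations translates directly into a position error.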

Taichung Harbor TDOA Experimental Configuration
We deployed four hydrophone stations near Taichung Harbor, an area where Indo-Pacific humpback dolphins are extremely active [31,32]. The locations of the hydrophones are shown in Figure 4, and the exact latitude and longitude are given in Table 1. The sea state was below Beaufort 3, and the ambient noise is illustrated in Figure 5 as percentile levels. The highest PSD was around 95 dB (re 1 µPa²/Hz) at 60-70 Hz on the L50 curve, possibly produced by shipping noise, and the PSD from 3 kHz to 10 kHz was around 65 dB (re 1 µPa²/Hz). A SoundTrap ST500 hydrophone recorder was used at point J3, and three Wildlife Acoustics SM3M hydrophone recorders were used at points J1, J2, and J4. All were bottom-mounted, with the sampling frequency set to 96 kHz. To time-synchronize the recorders, we produced an impulse signal before deployment as a benchmark for correcting their clocks.
To simulate the whistle of an actual Indo-Pacific humpback dolphin, which features a frequency range of 3-9 kHz, three kinds of artificial sound signals were employed: (a) rising frequency (5-9 kHz), (b) U-type (9-5-9 kHz), and (c) decreasing frequency (9-5 kHz), each with a time length of one second, as shown in Figure 6. The source level (SL) was 160 dB (re 1 µPa at 1 m). The underwater acoustic projector SQS-23 was placed at a water depth of 5 m (Figure 7), since Indo-Pacific humpback dolphins often stay about 5 m below the sea surface [33]. Figure 8 shows the 15 spots (T1-T15) outside Taichung Harbor where the artificial sound signals were played, every 10 seconds for 10 minutes.
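The three signal types can be synthesized as frequency-modulated tones. The sketch below ASSUMES that frequency changes linearly with time within each leg, which the text does not state; the function names are hypothetical, and the amplitude is normalized rather than calibrated to the 160 dB source level.

```python
import math


def whistle_frequency(kind, u, f_lo=5000.0, f_hi=9000.0):
    """Instantaneous frequency (Hz) at normalized time u in [0, 1].

    ASSUMPTION: linear sweeps; "u" is a V-shaped 9-5-9 kHz contour.
    """
    if kind == "rising":
        return f_lo + (f_hi - f_lo) * u
    if kind == "falling":
        return f_hi - (f_hi - f_lo) * u
    if kind == "u":
        return f_lo + (f_hi - f_lo) * abs(2 * u - 1)
    raise ValueError(f"unknown whistle kind: {kind}")


def synth_whistle(kind, fs=96000, duration=1.0, amp=1.0):
    """Generate one simulated whistle as a list of samples."""
    n = int(round(duration * fs))
    samples, phase = [], 0.0
    for i in range(n):
        samples.append(amp * math.sin(phase))
        # Advance the phase by the instantaneous frequency over one sample.
        phase += 2 * math.pi * whistle_frequency(kind, i / n) / fs
    return samples


signals = {kind: synth_whistle(kind) for kind in ("rising", "u", "falling")}
```

Accumulating phase sample by sample keeps the waveform continuous even where the contour changes slope, as at the midpoint of the U-type signal.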

Experimental Data Analysis Method
In this experiment, the SNR of the received signal was larger than 10 dB, exceeding the NTU_PAM-recommended SNR threshold of 6 dB. The signals recorded by the hydrophones at the four stations when the source was at point T10 are shown in Figure 9. To find the artificial whistles within the sound files, NTU_PAM was used to extract the start and end times of each whistle from the raw data of the four hydrophones. However, the extracted time information was not precise enough for TDOA. For increased accuracy, the raw data between the detected start and end times of each whistle were analyzed directly, without further processing by the algorithm. The J2 station was taken as the time reference, and the full-band raw data of this central station were cross-correlated with those of the three other stations to determine the time differences, as shown in Equations (8) and (9), where $X_2$ is the J2 station's whistle raw data; $X_o$ is the whistle raw data of one of the three other stations; R is the result of the cross-correlation; and $t_d$ is the time difference, which was used to obtain the location of the signal source by the TDOA method.
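The delay estimate can be sketched as a search for the lag that maximizes the cross-correlation of two recordings. This is a direct time-domain evaluation under illustrative assumptions (equal sampling rates, a bounded lag search, a synthetic chirp at a low demonstration rate); the function name and test signal are hypothetical.

```python
import math


def delay_by_xcorr(ref, other, fs, max_lag):
    """Return the delay (s) of `other` relative to `ref` at the correlation peak.

    R(k) = sum_n ref[n] * other[n + k] is evaluated for |k| <= max_lag samples;
    a positive result means the signal reached `other` later than `ref`.
    """
    best_lag, best_r = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -k), min(len(ref), len(other) - k)
        r = sum(ref[n] * other[n + k] for n in range(lo, hi))
        if r > best_r:
            best_lag, best_r = k, r
    return best_lag / fs


fs = 8000  # hypothetical low rate so the demo runs quickly
# A 25 ms chirp sweeping from 500 Hz to 1500 Hz.
pulse = [
    math.sin(2 * math.pi * (500 * (i / fs) + 0.5 * 40000 * (i / fs) ** 2))
    for i in range(200)
]
ref = [0.0] * 100 + pulse + [0.0] * 300
other = [0.0] * 137 + pulse + [0.0] * 263  # same pulse, 37 samples later
td = delay_by_xcorr(ref, other, fs, max_lag=100)
```

The resolution of this estimate is one sample, i.e., about 10 µs at 96 kHz, which corresponds to roughly 1.6 cm of path difference at a nominal sound speed of 1500 m/s.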

Comparison with PAMGuard
As mentioned, PAMGuard software is widely used in the field of marine mammal observation. In this research, the performance of NTU_PAM and of PAMGuard's Whistle and Moan Detector module were compared on the same hardware (an Intel i9-9900 CPU with 64 GB of memory). The test audio is a two-minute sound file, rich in whistles and sampled at 96 kHz, recorded in the sea area near Yunlin, Taiwan [34]. We manually confirmed that the file contained a total of 33 whistles.
The PAMGuard Whistle and Moan Detector was run with window lengths of 2048 data points (0.02 s) and 1024 data points (0.01 s) and with overlap ratios of 50% and 90%; NTU_PAM was run with its recommended window length of 0.01 s, an overlap ratio of 90%, and an SNR threshold of 6 dB. As shown in Table 2, among the PAMGuard settings, a window length of 1024 data points with a 90% overlap ratio and 6 dB SNR came closest to the manual count, detecting 47 whistles against the 33 confirmed manually. NTU_PAM detected a total of 30 whistles.
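The figures quoted above can be put on a common footing with a few lines of arithmetic. The sketch uses only the counts and settings stated in the text; the labels are illustrative.

```python
FS = 96_000        # sampling frequency of the test file, Hz
MANUAL_COUNT = 33  # whistles confirmed by hand

# Window lengths quoted in data points, converted to seconds.
window_seconds = {n: n / FS for n in (1024, 2048)}

# Detected-whistle counts reported in the text.
detected = {"PAMGuard (1024 pts, 90%, 6 dB)": 47, "NTU_PAM": 30}

# Signed difference from the manual count, as a fraction of that count.
relative = {
    name: (count - MANUAL_COUNT) / MANUAL_COUNT for name, count in detected.items()
}

for name, count in detected.items():
    print(f"{name}: {count} detected, {relative[name]:+.0%} vs manual count")
```

The 1024- and 2048-point windows correspond to about 10.7 ms and 21.3 ms, which the text rounds to 0.01 s and 0.02 s. Note that a count comparison alone does not show whether each detection corresponds to a manually confirmed whistle.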

Experimental Results
At least three signal receiving stations were used to calculate TDOA. When the hyperbolic curves intersect at more than one point, the center of the intersection points is taken as the final location. To verify localization accuracy, GPS data from the experimental ship carrying the sound source were compared to the TDOA results.
In the series of graphs in Figure 10, the blue dots are the hydrophone station positions (J1, J2, and J4), the red dots are the signal source positions from the experimental ship's GPS record, and the yellow stars are the TDOA positioning results. The results from the first experiment, which tested the rising frequency (5-9 kHz) signal, are shown in Figure 10a. The positioning accuracy was higher when the sound source was nearer to stations J1 and J2 at the center of the hydrophone group. The nearest points, T4 to T11, showed an average positioning error of 24.7 m, and the overall positioning error was 143.5 m, which was affected by the lower accuracy at the outer points.
The second experiment tested the decreasing frequency (9-5 kHz) signal, and its positioning trend was similar to that of the rising frequency signal (Figure 10b). It also showed higher positioning accuracy when the signal source was close to the J1 and J2 stations. The average positioning error of T4 to T11 was 44.8 m, larger than that of the rising frequency signal, and the overall positioning error was 145.9 m. Finally, the U-shaped (9-5-9 kHz) signal displayed a trend similar to that of the other two signals (Figure 10c). The average positioning error of T4 to T11 was 39.6 m, but the overall positioning error was the smallest of the three signals at 116.1 m.
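A positioning error of this kind can be computed as the distance between each GPS fix and the corresponding TDOA estimate, averaged over the points of interest. The sketch below uses the haversine formula on a spherical Earth, which is an assumption of this sketch rather than the authors' stated method; the coordinates are hypothetical and chosen only to lie near Taichung Harbor.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_error_m(gps_fixes, tdoa_fixes):
    """Average distance between paired GPS and TDOA positions."""
    errors = [haversine_m(*g, *t) for g, t in zip(gps_fixes, tdoa_fixes)]
    return sum(errors) / len(errors)


# Hypothetical (latitude, longitude) pairs, for illustration only.
gps = [(24.3000, 120.5000), (24.3050, 120.5000)]
tdoa = [(24.3002, 120.5000), (24.3050, 120.5003)]
average_error = mean_error_m(gps, tdoa)
```

Averaging over T4-T11 only, or over all 15 spots, would then give the two kinds of figures reported above.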

Discussion
In the comparison between PAMGuard and NTU_PAM, both results were close to the number of whistles confirmed manually, showing that both performed well on whistle detection. The difference in the numbers detected may arise because PAMGuard is a real-time auxiliary tool, mainly provided to visual-method researchers for detecting the occurrence of a call; as such, it needs only a few window lengths of data to detect a whistle. NTU_PAM, by contrast, needs one second or more of data to build a spectrogram and initiate processing. However, PAMGuard may, at times, break one call into several calls, as shown in Figure 11. According to the results, NTU_PAM is suitable for processing measurements captured over a longer duration, and it proves as robust as PAMGuard.
J. Mar. Sci. Eng. 2021, 9
In the localization experiment, the TDOA method proved useful for localizing the whistle source. Figure 12 plots the errors of the three signal types at each spot and indicates that the error is small when the source is inside the region enclosed by the four hydrophone recorders (points T4-T11); outside the region (points T1-T3 and T12-T15), the location was only approximate (Figure 13). These results indicate that NTU_PAM is well suited to tracking cetaceans within the hydrophone array.

Conclusions
In this research, we devised and developed the NTU_PAM algorithm, which performs whistle detection and whistle localization based on the TDOA method. The field results showed that NTU_PAM can localize and track a whistle sound source, with average positioning errors of about 25-45 m when the source was inside the hydrophone array. In the future, MMOs can monitor the moving path of marine mammals by combining the visual method with NTU_PAM, making it possible to monitor cetaceans without being limited to daylight hours.

Figure 2. Each step of the results.

Figure 5. Ambient noise percentile levels: Ln is the noise level exceeded for n% of the measurement time, i.e., L50 is the noise level exceeded for 50% of the measurement time.

Figure 7. Schematic of the installation position of the projector.

Figure 8. Locations of the signal sources.

Figure 10. (a) Result of rising frequency signal; (b) result of decreasing frequency signal; (c) result of U-shaped signal.

Figure 12. Distribution of localization errors.

Table 1. Latitude and longitude of hydrophone stations.

Table 2. Comparison of results.
