1. Introduction
Currently, most of Taiwan’s raw materials for energy production, including coking coal, fuel coal, crude oil, and liquefied natural gas [1], are imported, and their use has a large and immediate impact on the environment. Therefore, the government has actively developed green energy, including offshore wind farms [2], but most sites overlap with Indo-Pacific humpback dolphin reservation zones. The noise from pile driving during construction may affect marine mammals and cause auditory injury, ranging from temporary threshold shift (TTS) to permanent threshold shift (PTS) in hearing [3]. To minimize the noise-induced impact on cetaceans caused by the construction and operation of wind turbines, establishing a marine mammal detection mechanism is a priority. The traditional method to detect cetaceans is visual: marine mammal observers (MMOs) work from vehicles and search for cetaceans with the naked eye, an operation that is expensive, offers only a low probability of success, and is limited to daylight hours. Underwater acoustics provides an alternative technique to detect marine mammals, with the cetacean call serving as the characteristic used for detection. Based on passive acoustic monitoring (PAM), we developed an algorithm, NTU_PAM, that detects cetacean calls and then tracks their source. In addition to overcoming the weaknesses of the visual method, NTU_PAM can show the correlation between the results of the visual method and PAM.
Cetaceans produce two major types of cetacean calls [4,5]: (1) the “whistle” is a continuous, narrow-band, and frequency-modulated signal that is thought to be a form of social communication; (2) the “click” is considered a bio-sonar and is a short, broadband, and directional impulse signal used to navigate, detect, and identify objects. In marine mammal research, PAM has proved a useful tool. For example, (1) Spaulding et al. [6] built a near-real-time buoy system to automatically detect North Atlantic right whale calls in Cape Cod Bay and near the Boston Harbor. When the buoy system detects a whale call, an alarm signal is transmitted and the call is recorded. (2) Linnenschmidt et al. [7] equipped an acoustic data logger on a porpoise to record clicks and determine the relationship among the click, movement, and diving behavior. (3) Akamatsu et al. [8] used an underwater pulse event recorder (A-TAG) to record clicks and analyze critical parameters such as interclick interval (ICI).
Cetacean whistle detection has been studied extensively. Gannier et al. [9] developed Seafox software to extract whistle characteristics (length, beginning frequency, ending frequency, maximum frequency, minimum frequency, etc.) from a spectrogram and used a regression tree to classify five dolphin species. Lai [10] used mel-frequency cepstral coefficients, which model human auditory features such as the critical band and auditory masking, to extract the characteristics of the whistle. The whistle characteristics were then used in a support vector machine (SVM) to identify the cetacean species. Caldwell and Caldwell [11,12,13] hypothesized that each dolphin emits a distinctive signature whistle that carries information about the caller, so that whistles can be used to identify individual dolphins. Datta and Sturtivant [14] considered two whistle features on a spectrogram (overall contour shape and detailed contour structure difference) as parameters of the signature whistle and grouped whistles using the hidden Markov model (HMM) method. Bahoura and Simard [15] used an artificial neural network to classify blue whale calls. The above research is based on supervised machine learning methods, which require numerous sets of clean training data, manual labeling of the calls, and model building. These models are only suitable for specific or regional species.
To avoid the disadvantages of labeling, training, and specific targeting, Gillespie et al. [16] developed a whistle detector based on image processing on a spectrogram which is implemented as the Whistle and Moan Detector module in PAMGuard. PAMGuard software includes a user-friendly, human–machine interface and modules for data processing and marine mammal detection [17] and has been widely used for real-time marine mammal monitoring. Lin [18] devised a non-targeted algorithm on the MATLAB platform that helps users grasp the position of whistles across many audio files, making further processing convenient. Lin et al. [19] first denoised the spectrogram and then detected the whistle characteristics. Gillespie’s and Lin’s methods include four main steps: (1) spectrogram, (2) image processing, (3) whistle feature extraction, and (4) combination of the whistle data points. We applied the same pattern to develop the whistle detection algorithm. A similar concept is applied in steps 2–4, but the detailed methods are different. We also compared NTU_PAM and PAMGuard, which is regarded as a standard of whistle detection.
Tracking cetaceans is another recent primary research subject. Janik et al. [20] deployed three hydrophones to form a two-dimensional, triangular array in Beauly Firth, Northern Scotland, U.K. The inter-hydrophone distances were 208, 513, and 506 m. An artificial sound was then projected at a depth of 1 m, and the time difference of arrival (TDOA) of the signal at each pair of hydrophones was used to localize the sound source. Wang et al. [21] deployed a two-dimensional, cross-shaped array consisting of five hydrophones from the side of a boat at a depth of 1 m in the Pearl River Estuary, China, and the Beibu Gulf of Guangxi, China. The inter-hydrophone distances were 1.47, 1.54, 2.08, and 2.18 m. The boat followed the dolphin group at a close distance to receive the dolphin calls, and the difference in arrival time of a sound at each hydrophone pair was used to localize the targets. Wiggins et al. [22] deployed a tracking high-frequency acoustic recording package (HARP) [23] consisting of four hydrophones at 3 m above the seafloor offshore of Southern California to track beaked whales and dolphins. Wiggins et al. [24] also deployed four HARPs offshore of Southern California to track whistling dolphins. Both of Wiggins’s methods used TDOA. Building on the demonstrated effectiveness of TDOA for tracking and localization, we used four hydrophone stations to form a kilometer-scale array for tracking the source based on TDOA.
We designed a field experiment that simulated different whistle types and deployed four PAM stations near Taichung Harbor to record the simulated calls and track the artificial source. After running the detection algorithm, finding the whistle times, and tracking the source, we compared the localization results with the path of the boat carrying the sound source. In this study, we developed an algorithm that does not require a trained model for the automatic detection of the whistle. The algorithm is based on the time length and frequency band of the whistle feature. Furthermore, the automatic detection algorithm and localization method were combined as NTU_PAM. NTU_PAM can work as an auxiliary tool for MMOs during the daytime, and it can function as the main monitoring tool at night.
2. Whistle Detector Algorithm
Passive acoustic monitoring has been used widely in marine monitoring to amass longitudinal data, and it requires high-efficiency algorithms to help researchers find the required file segments. We developed a whistle detector algorithm that improves on Li’s prototype algorithm [25]. The algorithm can detect any whistle-producing animal; the frequency range to be searched depends on the species. There are six main processes in the algorithm:
(1) Transform the time-series data into a spectrogram by short-time Fourier transform (STFT);
(2) Remove the noise on the time axis of the spectrogram;
(3) Remove the salt and pepper noise in the spectrogram;
(4) Find the data points that satisfy the conditions on power spectral density (PSD) and signal-to-noise ratio (SNR);
(5) Extract data points using the features of whistles;
(6) Cluster data points into different whistles.
A flow chart of the algorithm is shown in Figure 1. In order to present whistles clearly on the spectrogram, some processes are based on image processing. Each process is described in detail below. Figure 2 shows the result of each step.
2.1. Spectrogram
We used the STFT [26], which applies a window function to obtain frequency-domain information as it changes over time. A frame slides along the time-domain signal; the signal within the frame is multiplied by the window function and then Fourier-transformed. This information is used to produce the spectrogram. The window function is the Hamming window [27], the frame length is 0.01 s, and the overlap is 90%. The STFT formula is shown in Equation (1), where $w$ is the window function, $x$ is the raw data, $\tau$ is the frame position in time, and $f$ is the frequency.
$$X(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j 2 \pi f t}\, dt \tag{1}$$
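As an illustration of this step, the following is a minimal NumPy sketch of a Hamming-windowed spectrogram with a 0.01 s frame and 90% overlap. The function name `spectrogram_db` and its parameters are hypothetical, and the returned levels are uncalibrated (they are not PSD values referenced to a pressure unit); it is not the NTU_PAM implementation.

```python
import numpy as np

def spectrogram_db(x, fs, frame_sec=0.01, overlap=0.9):
    """Return (freqs, times, level_db) of a Hamming-windowed STFT.

    level_db has one row per frequency bin and one column per frame.
    Levels are relative (uncalibrated) decibels.
    """
    n = int(round(frame_sec * fs))                   # samples per frame
    hop = max(1, int(round(n * (1.0 - overlap))))    # frame advance in samples
    window = np.hamming(n)
    n_frames = 1 + (len(x) - n) // hop
    # Cut the signal into overlapping frames and apply the window to each one
    frames = np.stack([x[i * hop:i * hop + n] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    level_db = 10.0 * np.log10(power + 1e-20).T      # small offset avoids log(0)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + n / 2) / fs  # frame centers in seconds
    return freqs, times, level_db
```

At a sampling frequency of 96 kHz, these settings give 960-sample frames, a 96-sample hop, and a frequency resolution of 100 Hz.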
2.2. Denoising on the Time Axis of the Spectrogram
A whistle lasts much longer than impulse noise; therefore, we use the moving average method to remove impulse noise from the spectrogram. Every 20 points on the time axis of each single frequency band are averaged to build a new spectrogram; the formula is shown in Equation (2), where $S$ is the original spectrogram, $S_1$ is the new spectrogram after denoising, $f_i$ is the $i$-th frequency band, and $t_j$ is the $j$-th time point.
$$S_1(f_i, t_j) = \frac{1}{20} \sum_{k=0}^{19} S(f_i, t_{j+k}) \tag{2}$$
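A short sketch of such a time-axis moving average is given below. It assumes a spectrogram stored as a frequency-by-time array and a window that starts at the current time point; the function name `smooth_time_axis` and the window alignment are illustrative assumptions.

```python
import numpy as np

def smooth_time_axis(spec, width=20):
    """Moving average along the time axis (axis 1) of a frequency-by-time array."""
    kernel = np.ones(width) / width
    # mode="valid" keeps only windows that lie fully inside the data,
    # so the output has spec.shape[1] - width + 1 columns
    return np.stack([np.convolve(row, kernel, mode="valid") for row in spec])
```

A single-frame impulse is reduced to one twentieth of its height, while a level that persists for 20 frames or more passes through unchanged.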
2.3. Removing Salt and Pepper Noise
A median filter, a nonlinear signal processing technique often used in image processing, was used to remove salt and pepper noise [28]. The median of every 3-by-3 matrix on the spectrogram is calculated. The formula is shown in Equation (3), where $S_1$ is the spectrogram after the denoising and $S_2$ is the spectrogram after using the median filter.
$$S_2(f_i, t_j) = \operatorname{median}\left\{ S_1(f_{i+a}, t_{j+b}) : a, b \in \{-1, 0, 1\} \right\} \tag{3}$$
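The 3-by-3 median filter can be sketched with NumPy alone as follows. The edge handling (repeating the nearest row or column) is an assumption, since the treatment of border pixels is not specified here, and `median3x3` is a hypothetical name.

```python
import numpy as np

def median3x3(img):
    """3-by-3 median filter; border pixels reuse the nearest row/column."""
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    # Build the nine shifted copies of the image, one per neighborhood offset,
    # then take the median across them for every pixel
    shifted = [padded[r:r + rows, c:c + cols] for r in range(3) for c in range(3)]
    return np.median(np.stack(shifted), axis=0)
```

An isolated bright pixel is replaced by the level of its neighbors, whereas a band that is at least three bins thick keeps its center row.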
2.4. Satisfying PSD and SNR Conditions
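Since a whistle is a narrow frequency band signal, when a whistle occurs its PSD is much larger than that of points at closely neighboring frequencies. The definition of SNR in this study is shown in Equation (4). If, at a data point, the PSD is larger than the PSD threshold and the SNR is simultaneously larger than the SNR threshold, the data point is replaced by one; otherwise, the data point is replaced by zero. The formula is shown in Equation (5), where $\mathrm{PSD}(f_i, t_j)$ and $\mathrm{SNR}(f_i, t_j)$ are the PSD and SNR of the data point at frequency $f_i$ and time $t_j$, $\mathrm{PSD}_{th}$ and $\mathrm{SNR}_{th}$ are the two thresholds, and $B$ is the new spectrogram, which is a binary image. The default values of the SNR threshold and the PSD threshold are 6 dB and 40 dB (re 1 μPa²/Hz), respectively.
(4)
$$B(f_i, t_j) = \begin{cases} 1, & \mathrm{PSD}(f_i, t_j) > \mathrm{PSD}_{th} \ \text{and} \ \mathrm{SNR}(f_i, t_j) > \mathrm{SNR}_{th} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$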
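The two-threshold test can be sketched as below. Because the SNR definition of Equation (4) is not reproduced here, the sketch substitutes a stand-in noise estimate, namely the median level over all frequency bins of the same time frame; that choice, like the function name `binarize`, is purely an assumption for illustration.

```python
import numpy as np

def binarize(psd_db, psd_thresh_db=40.0, snr_thresh_db=6.0):
    """Return a 0/1 image marking points that pass both thresholds.

    psd_db is a frequency-by-time array of levels in dB.
    """
    # ASSUMED noise estimate (not the paper's Equation (4)):
    # median level across frequency within each time frame
    noise_db = np.median(psd_db, axis=0, keepdims=True)
    snr_db = psd_db - noise_db
    return ((psd_db > psd_thresh_db) & (snr_db > snr_thresh_db)).astype(np.uint8)
```

With this stand-in, a loud broadband frame is rejected because every bin in it sits at the frame's own median level, while a narrow-band peak above both thresholds is kept.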
2.5. Extracting the Whistle
As mentioned in Section 2.4, the whistle is a narrow frequency band and continuous signal. In this method, neighboring data points whose value is one are connected and labeled as a segment. Next, two conditions are set: the frequency bandwidth threshold and the time length threshold. Lastly, the segments whose frequency bandwidth is smaller than the frequency bandwidth threshold and whose time length is longer than the time length threshold are retained. The binary image is then refreshed so that it contains only the retained segments. The default values of the frequency bandwidth threshold and the time length threshold are 300 Hz and 0.06 s, respectively.
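The segment extraction can be sketched as a connected-component search followed by the two tests. Two points are assumptions made for this sketch: neighbors are taken to be 8-connected, and "frequency bandwidth" is read as the widest frequency extent within any single time frame (a frequency-modulated whistle spans far more than 300 Hz over its whole length). The name `extract_segments` is hypothetical.

```python
import numpy as np
from collections import deque

def extract_segments(binary, df_hz, dt_sec, max_bw_hz=300.0, min_len_sec=0.06):
    """Return a list of (freq_index, time_index) arrays, one per kept segment."""
    binary = np.asarray(binary, dtype=bool)
    n_f, n_t = binary.shape
    seen = np.zeros_like(binary)
    kept = []
    for f0, t0 in zip(*np.nonzero(binary)):
        if seen[f0, t0]:
            continue
        # Breadth-first search over the 8-connected neighbors of the seed point
        seen[f0, t0] = True
        queue, points = deque([(f0, t0)]), []
        while queue:
            f, t = queue.popleft()
            points.append((f, t))
            for nf in range(max(f - 1, 0), min(f + 2, n_f)):
                for nt in range(max(t - 1, 0), min(t + 2, n_t)):
                    if binary[nf, nt] and not seen[nf, nt]:
                        seen[nf, nt] = True
                        queue.append((nf, nt))
        pts = np.array(points)
        duration = (pts[:, 1].max() - pts[:, 1].min() + 1) * dt_sec
        # ASSUMED bandwidth measure: widest per-frame frequency extent
        width_bins = 0
        for t in np.unique(pts[:, 1]):
            f_idx = pts[pts[:, 1] == t, 0]
            width_bins = max(width_bins, f_idx.max() - f_idx.min() + 1)
        if width_bins * df_hz < max_bw_hz and duration > min_len_sec:
            kept.append(pts)
    return kept
```

Here `df_hz` and `dt_sec` are the frequency and time spacing of the spectrogram grid, for example 100 Hz and 0.001 s for a 0.01 s frame with 90% overlap at 96 kHz.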
2.6. Clustering
The k-means method [29] is used to cluster the data points in the refreshed binary image. According to their differences in frequency and time, some of the whistle segments from Section 2.5 are merged. If the time interval between two segments is smaller than 0.3 s and, simultaneously, the difference in frequency between the two segments is smaller than 1 kHz, the two segments are considered one whistle segment. After merging, k (the number of clusters) is set to the new number of segments. The data points are then grouped automatically into k whistles by calculating the Euclidean distance of the frequency and time indices in the image. Each whistle’s start time, end time, start frequency, and end frequency are recorded after k-means.
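The merging rule that fixes k can be sketched as follows. The sketch covers only the merge step, not k-means itself, and it assumes that the "difference of frequency" is measured between the end frequency of the earlier segment and the start frequency of the later one; the tuple layout and the name `merge_segments` are hypothetical.

```python
def merge_segments(segments, max_gap_sec=0.3, max_df_hz=1000.0):
    """Merge segments given as (t_start, t_end, f_start, f_end) tuples.

    Times are in seconds and frequencies in Hz. The length of the returned
    list is the number of clusters k passed on to k-means.
    """
    whistles = []
    for t_start, t_end, f_start, f_end in sorted(segments):
        for i, (w_t0, w_t1, w_f0, w_f1) in enumerate(whistles):
            close_in_time = t_start - w_t1 < max_gap_sec
            close_in_freq = abs(f_start - w_f1) < max_df_hz
            if close_in_time and close_in_freq:
                # Extend the earlier whistle so that it covers this segment
                if t_end > w_t1:
                    whistles[i] = (w_t0, t_end, w_f0, f_end)
                break
        else:
            # No existing whistle is close enough: start a new one
            whistles.append((t_start, t_end, f_start, f_end))
    return whistles
```

For example, two segments separated by 0.1 s and 300 Hz are joined into one whistle, while a segment 5.5 kHz away at the same time remains separate.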
3. Localization Method
TDOA was used to track the whistle. We devised an experiment to track the moving path of the artificial source using the whistle detector algorithm and TDOA.
3.1. Time Difference of Arrival (TDOA)
TDOA is often used in signal source positioning [30]. It requires only the times at which the signal is received and the speed at which the signal travels. Once the signal is received at two receiving stations, the difference in arrival time defines a hyperbola of possible source locations, as given by Equations (6) and (7). With three receiving stations, at least two hyperbolas are produced, as shown in Figure 3, and their intersection is the signal source location. For this approach to work, the receiving stations must be time-synchronized.
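$$\sqrt{(x - x_1)^2 + (y - y_1)^2} - \sqrt{(x - x_2)^2 + (y - y_2)^2} = c\,(t_1 - t_2) \tag{6}$$
$$\sqrt{(x - x_1)^2 + (y - y_1)^2} - \sqrt{(x - x_3)^2 + (y - y_3)^2} = c\,(t_1 - t_3) \tag{7}$$
where $t_1$, $t_2$, and $t_3$ are the times when the same signal arrives at the hydrophones located at $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$; $(x, y)$ is the position of the unknown signal source; and $c$ is the sound speed from the local sound speed profile.
3.2. Taichung Harbor TDOA Experimental Configuration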
We deployed four hydrophone stations near Taichung Harbor, an area where Indo-Pacific humpback dolphins are extremely active [31,32]. The locations of the hydrophones are shown in Figure 4, and the exact latitude and longitude are shown in Table 1. The Beaufort sea state was below 3, and the ambient noise is illustrated in Figure 5 as a percentile level. The highest PSD was around 95 dB (re 1 μPa²/Hz) from 60–70 Hz on L50, possibly produced by shipping noise, and the PSD from 3 kHz–10 kHz was around 65 dB (re 1 μPa²/Hz).
A SoundTrap ST500 hydrophone recorder was used at point J3, and three Wildlife Acoustics SM3M hydrophone recorders were used at points J1, J2, and J4. They were bottom-mounted, with the sampling frequency set to 96 kHz. To achieve time synchronization for all recorders, we produced an impulse signal before deployment as a benchmark for correcting the time. To simulate the whistle of an actual Indo-Pacific humpback dolphin, which features a frequency range of 3–9 kHz, three kinds of artificial sound signals were employed: (a) rising frequency (5–9 kHz), (b) U-type (9–5–9 kHz), and (c) decreasing frequency (9–5 kHz), each with a time length of one second, as shown in Figure 6. The source level (SL) was 160 dB (re 1 μPa at 1 m). The underwater acoustic projector SQS-23 was placed at a water depth of 5 m (Figure 7), since Indo-Pacific humpback dolphins often stay about 5 m below sea level [33]. Figure 8 shows the 15 spots (T1–T15) outside Taichung Harbor where the artificial sound signals were played, every 10 seconds for 10 minutes.
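For readers who wish to generate comparable test signals, the sketch below synthesizes one-second sweeps between 5 and 9 kHz at 96 kHz. The linear sweeps and the parabolic U shape are assumptions, since the exact frequency trajectories of the projected signals are not specified beyond their endpoints, and `synth_whistle` is a hypothetical name.

```python
import numpy as np

def synth_whistle(kind, fs=96000, duration=1.0, f_low=5000.0, f_high=9000.0):
    """Return (instantaneous_frequency_hz, waveform) for a simulated whistle."""
    t = np.arange(int(fs * duration)) / fs
    u = t / duration                      # normalized time from 0 to just below 1
    if kind == "rising":
        freq = f_low + (f_high - f_low) * u
    elif kind == "falling":
        freq = f_high - (f_high - f_low) * u
    elif kind == "u":
        # ASSUMED parabolic shape: f_high at both ends, f_low at the midpoint
        freq = f_low + (f_high - f_low) * (2.0 * u - 1.0) ** 2
    else:
        raise ValueError("kind must be 'rising', 'falling', or 'u'")
    # Integrate the instantaneous frequency to obtain the phase
    phase = 2.0 * np.pi * np.cumsum(freq) / fs
    return freq, np.sin(phase)
```

The waveform has unit amplitude; scaling it to a given source level depends on the projector calibration and is outside the scope of this sketch.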
3.3. Experimental Data Analysis Method
In this experiment, the SNR of the received signal was larger than 10 dB, exceeding the NTU_PAM-recommended SNR threshold of 6 dB. The signals recorded by each of the hydrophones at the four stations when the source was at point T10 are shown in Figure 9. To find the artificial whistle within the sound file, NTU_PAM was used to extract the start and end times of the whistle from the raw data of the four hydrophones. However, the extracted time information was not precise enough for TDOA. For increased accuracy, the raw data between the detected start and end times of the whistle were analyzed directly, without further processing by the algorithm. The J2 station was taken as the time reference, and cross-correlation analysis between the full frequency band raw data of this central station and those of the three other stations was performed to determine the time difference, as shown in Equations (8) and (9), where $x$ is the J2 station’s whistle raw data; $y$ is the whistle raw data of one of the three other stations; $R_{xy}$ is the result of cross-correlation; $\tau$ is the lag; and $\Delta t$ is the time difference, which was used to obtain the location of the signal source by the TDOA method.
$$R_{xy}(\tau) = \sum_{n} x(n)\, y(n + \tau) \tag{8}$$
$$\Delta t = \arg\max_{\tau} R_{xy}(\tau) \tag{9}$$
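A minimal sketch of delay estimation by cross-correlation is shown below. It computes the correlation through the FFT, which is a common choice for second-long recordings at 96 kHz but is not stated to be the method used here; the function name `time_difference` and its sign convention are assumptions.

```python
import numpy as np

def time_difference(ref, other, fs):
    """Delay of `other` relative to `ref` in seconds.

    A positive value means the signal arrives later in `other` than in `ref`.
    """
    n = len(ref) + len(other) - 1
    nfft = 1 << (n - 1).bit_length()      # next power of two, avoids wrap-around
    # Cross-correlation theorem: corr[k] = sum_n other[n + k] * ref[n]
    spectrum = np.fft.rfft(other, nfft) * np.conj(np.fft.rfft(ref, nfft))
    corr = np.fft.irfft(spectrum, nfft)
    k = int(np.argmax(corr))
    # Indices beyond the non-negative lags hold the negative lags
    lag = k if k < len(other) else k - nfft
    return lag / fs
```

The resolution of this estimate is one sample, about 10.4 μs at 96 kHz, which corresponds to roughly 1.6 cm of path difference for a nominal sound speed of 1500 m/s.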
4. Results
4.1. Comparison with PAMGuard
As mentioned, PAMGuard software is widely used in the field of marine mammal observation. In this research, the performance of NTU_PAM and the Whistle and Moan Detector module of PAMGuard were compared using the same hardware (an i9-9900 CPU from Intel Corporation with 64 GB of memory). The test audio is a two-minute sound file, rich in whistles and with a sampling frequency of 96 kHz, recorded near the sea area of Yunlin, Taiwan [34]. We manually confirmed that the file contained a total of 33 whistles.
The PAMGuard Whistle and Moan Detector was run with window lengths of 2048 data points (0.02 s) and 1024 data points (0.01 s) and with overlap ratios of 50% and 90%. NTU_PAM was run with its recommended window length of 0.01 s, an overlap ratio of 90%, and an SNR threshold of 6 dB. As shown in Table 2, among the PAMGuard settings, a window length of 1024 data points with a 90% overlap ratio and 6 dB SNR gave the result closest to the 33 manually confirmed whistles, with 47 detected whistles. A total of 30 whistles were detected by NTU_PAM.
4.2. Experimental Results
At least three signal receiving stations were used to calculate TDOA. When the hyperbolic curves intersect at more than one point, the center of the intersection points is taken as the final estimated location. To verify localization accuracy, GPS data from the experimental ship bearing the sound source were compared to results from TDOA.
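As a rough illustration of TDOA positioning, the sketch below searches a grid of candidate positions for the point whose predicted arrival-time differences best match the measured ones in the least-squares sense. This grid search is a substitute for intersecting hyperbolas, not the procedure used in this study; the nominal sound speed of 1500 m/s, the local metric coordinates, and the name `locate_tdoa` are all assumptions.

```python
import numpy as np

def locate_tdoa(stations, tdoa, c=1500.0, half_width=3000.0, step=10.0, ref=0):
    """Estimate the source position (x, y) in metres.

    stations: (N, 2) array of station x/y positions in metres.
    tdoa: length-N sequence with tdoa[i] = t_i - t_ref in seconds.
    """
    stations = np.asarray(stations, dtype=float)
    offsets = np.arange(-half_width, half_width + step, step)
    cx, cy = stations.mean(axis=0)        # search grid is centered on the array
    gx, gy = np.meshgrid(cx + offsets, cy + offsets)
    # Distance from every grid node to every station, shape (N, ny, nx)
    dist = np.hypot(gx[None] - stations[:, 0, None, None],
                    gy[None] - stations[:, 1, None, None])
    predicted = (dist - dist[ref]) / c
    measured = np.asarray(tdoa, dtype=float)[:, None, None]
    misfit = ((predicted - measured) ** 2).sum(axis=0)
    iy, ix = np.unravel_index(np.argmin(misfit), misfit.shape)
    return gx[iy, ix], gy[iy, ix]
```

With only three stations the misfit can have two minima, which mirrors the case of multiple hyperbola intersections described above; a fourth station usually removes the ambiguity.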
In the series of graphs in Figure 10, the blue dot is the hydrophone station position (J1, J2, and J4), the red dot is the signal source position of the experimental ship’s GPS record, and the yellow star is the TDOA positioning result. The results from the first experiment testing the rising frequency (5–9 kHz) signal are shown in Figure 10a. The positioning accuracy was higher when the sound source was nearer to the center positions J1 and J2 from the group of hydrophone stations. The nearest positioning points T4 to T11 showed an average positioning error of 24.7 m, and the overall positioning error was 143.5 m, which was affected by the lower accuracy of the outer point.
The second experiment was the decreasing frequency (9–5 kHz) signal, and its positioning trend was similar to the rising frequency signal (Figure 10b). It also showed higher positioning accuracy when the signal source was close to the J1 and J2 stations. The average positioning error of T4 to T11 was 44.8 m, larger than that of the rising frequency signal, and the overall positioning error was 145.9 m. Finally, the U-shaped (9–5–9 kHz) signal displayed a similar trend as the aforementioned signals (Figure 10c). The average positioning error of T4 to T11 was 39.6 m, but the overall positioning error was the smallest of the three signals at 116.1 m.
5. Discussion
In the comparison between PAMGuard and NTU_PAM, the numbers of detected whistles were close to the number that was manually confirmed, which shows that both performed well on whistle detection. The reason for the different numbers detected may be that PAMGuard is a real-time auxiliary tool mainly provided to visual method researchers for detecting the occurrence of a call; as such, it only needs a few window lengths of data to detect the whistle. NTU_PAM, in contrast, needs one second or more of audio data to build a spectrogram and to initiate processing. However, PAMGuard may, at times, break one call into several calls, as shown in Figure 11. According to the results, NTU_PAM is suitable for processing measurements captured over a longer duration, and it proves as robust as PAMGuard.
In the localization experiment, the TDOA method proved useful for localizing the whistle source. Figure 12 plots the errors of the three different types of signals at each spot and indicates that the error is small when the source is inside the region of the four hydrophone recorders (points T4–T11); when outside the region (points T1–T3 and T12–T15), location was only approximate (Figure 13). The results of this experiment indicate strengths in using the NTU_PAM for successful tracking of cetaceans.
6. Conclusions
In this research, we devised and developed the NTU_PAM algorithm, which performs whistle detection and whistle localization based on the TDOA method. The results showed NTU_PAM is able to localize and track the whistle sound source with high accuracy. In the future, MMOs can monitor the moving path of marine mammals via the visual method combined with NTU_PAM, making it possible to monitor cetaceans without being limited by daylight hours.
Author Contributions
Conceptualization, C.-T.H., W.-L.L., Y.-H.H., W.-Y.C. and C.-F.C.; methodology, C.-T.H., W.-L.L., Y.-H.H., W.-Y.C. and C.-F.C.; software, C.-T.H., W.-L.L. and W.-Y.C.; formal analysis, W.-Y.C. and W.-C.H.; writing—original draft preparation, C.-T.H. and W.-Y.C.; writing—review and editing, C.-T.H. and C.-F.C.; supervision, C.-F.C.; project administration, C.-F.C.; funding acquisition, C.-F.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Ministry of Science and Technology, Taiwan (MOST 109-2221-E-002-198-MY3).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data are available on request from the corresponding author of this article.
Acknowledgments
The authors would like to thank the Formosa Plastics Group and the Formosa Petrochemical Corporation, 1141946-00, for the 2 minutes of data. The authors would like to thank the Ministry of Science and Technology, Taiwan, for the funding; and Professor Lien-Siang Chou for help with marine mammal knowledge.
Conflicts of Interest
The authors declare no conflict of interest.
Short Biography of Authors
Ching-Tang Hung received his B.S. degree from the Department of Hydraulic and Ocean Engineering, National Cheng Kung University, and his M.S. degree from the Department of Engineering Science and Ocean Engineering, National Taiwan University. He is currently a Ph.D. candidate in the Department of Engineering Science and Ocean Engineering, National Taiwan University. His research interests include underwater acoustics, whistle detection, and automated unmanned surface vehicle systems.
Yen-Hsiang Huang received his B.S. degree from the Department of Mechanical Engineering, National Chung Hsing University, and his M.S. degree from the Department of Engineering Science and Ocean Engineering, National Taiwan University. He is currently working as a firmware engineer in Taiwan. His research interests include underwater acoustics and USV system integration, and he has strong skills in C/C++ programming.
Wei-Yen Chu received his B.S. degree from the Department of Marine Engineering, National Taiwan Ocean University, and his M.S. degree from the Department of Engineering Science and Ocean Engineering, National Taiwan University. He is currently working in the semiconductor manufacturing area in Taiwan. His research interests include underwater acoustics, signal processing, whistle detection, localization, and simulation.
Wei-Chun Hu is a Ph.D. candidate at the Department of Engineering Science and Ocean Engineering, National Taiwan University, Taiwan. His research focuses on the soundscape and the propagation of underwater noise. His recent publications can be found in Ecological Indicators and Entropy.
Wei-Lun Li received his B.S. degree from the Department of Hydraulic and Ocean Engineering, National Cheng Kung University, and his M.S. degree from the Department of Engineering Science and Ocean Engineering, National Taiwan University. He is currently working as a research assistant at the Institute of Hydrobiology, Chinese Academy of Sciences. His research interests include underwater acoustics and underwater detection of marine mammals.
Chi-Fang Chen received her Ph.D. from the Department of Ocean Engineering, Massachusetts Institute of Technology, in 1991, and has been a faculty member of the Department of Naval Architecture of National Taiwan University since 1991. (The Department of Naval Architecture was renamed the Department of Engineering Science and Ocean Engineering in 2000.) Her research expertise and interests are underwater acoustics and underwater acoustic propagation. She is conducting passive acoustic monitoring (PAM) to recognize sounds from different species in the ocean, including Sousa chinensis in Taiwan waters. She also has interests in autonomous ocean sensing, has supervised two master’s theses on AUVs, and is now supervising five graduate students in autonomous surface vehicle studies.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figures and Tables
Figure 5. Ambient noise percentile level: Ln is the noise level exceeding n% of the measurement time, i.e., L50 is the noise level exceeding 50% of the measurement time.
Figure 9. Received signal of each station when the source was at point T10. (a) J1 station; (b) J2 station; (c) J3 station; (d) J4 station.
Figure 10. (a) Result of rising frequency signal; (b) result of decreasing frequency signal; (c) result of U-shaped signal.
Figure 11. Output results of rising frequency signals: (a) NTU_PAM; (b) PAMGuard with 1024 data points window length, 90% overlap ratio and 6 dB SNR.
Table 1. Latitude and longitude of hydrophone stations.
Station | Latitude (N) | Longitude (E) | Depth (m) |
---|---|---|---|
J1 | 24.3305° | 120.4788° | 29.1 |
J2 | 24.3101° | 120.4861° | 28.7 |
J3 | 24.3305° | 120.5259° | 8.0 |
J4 | 24.2588° | 120.4851° | 11.0 |
Table 2. Comparison of results.
Algorithm | Parameters | Detected Numbers |
---|---|---|
Manually confirmed | - | 33 |
PAMGuard | Window = 2048 | 79 |
PAMGuard | Window = 2048 | 50 |
PAMGuard | Window = 1024 | 91 |
PAMGuard | Window = 1024 | 47 |
NTU_PAM | Window = 0.01 s | 30 |
© 2021 by the authors.
Abstract
In recent years, Taiwan’s government has focused on policies regarding offshore wind farming near the Indo-Pacific humpback dolphin habitat, where marine mammal observation is a critical consideration. The present research developed an algorithm called National Taiwan University Passive Acoustic Monitoring (NTU_PAM) to assist marine mammal observers (MMOs). The algorithm performs whistle detection processing and whistle localization. Whistle detection processing is based on image processing and whistle feature extraction; whistle localization is based on the time difference of arrival (TDOA) method. To test the whistle detection performance, we used the same data to compare NTU_PAM and the widely used software PAMGuard. To test whistle localization, we designed a real field experiment where a sound source projected simulated whistles, which were then recorded by several hydrophone stations. The data were analyzed to locate the moving path of the source. The results show that localization accuracy was higher when the sound source position was in the detection region composed of hydrophone stations. This paper provides a method for MMOs to conveniently observe the migration path and population dynamics of cetaceans without ecological disturbance.