Conventional approaches to three-dimensional sound source localisation use either interaural time difference information extracted via static two-dimensional grids of at least four microphones [Imran et al. 2016], or spectral cues [Keyrouz 2014; Reddy et al. 2016] via head-related transfer functions [Cheng and Wakefield 2001]. Here we present a preliminary sensorimotor approach [Aytekin et al. 2007; Shaikh 2012], in simulation, to three-dimensional sound source localisation employing two simulated microphones. We use directed spatial movements of the microphones to resolve the unknown location of an acoustic target in three dimensions. Our approach utilises a model of the peripheral auditory system of lizards [Christensen-Dalsgaard and Manley 2005] coupled with a multi-layer perceptron neural network. The peripheral auditory model’s response to sound input encodes sound direction information in a single plane, which by itself is insufficient to localise the acoustic target in three dimensions. A multi-layer perceptron neural network is used to combine two independent responses of the model, corresponding to two rotational movements, into an estimate of the sound direction in terms of its relative azimuth and elevation. We employed an acoustic target that emitted sound at a frequency of 1650 Hz, chosen to elicit the strongest response from the lizard peripheral auditory model. To resolve the unknown azimuth and elevation of the acoustic target, two independent acoustic measurements were performed, first after the microphones were rotated to -45° and then after they were rotated to +45° along the sagittal axis. The auditory model thus generated two independent responses corresponding to these two measurements. The two measurements were repeated for varying locations of the acoustic target, with a 1° resolution in both azimuth and elevation, on the surface of a frontal spherical section in space defined by an azimuth range of [-90°, +90°] and an elevation range of [-60°, +60°]. Two individual representations of sound location that non-linearly mapped the model’s response to sound direction were generated in this manner, one for each microphone rotation. Labelled training data, comprising two-dimensional vectors formed by taking one sample of the model’s response from each mapping and labelled with the corresponding azimuth and elevation, was generated from these mappings. Two independent multi-layer perceptron neural networks, with one and two hidden layers respectively, were trained on this data via supervised learning. Each multi-layer perceptron computed a weighted non-linear superposition of the two mappings. After training, the networks had learned a transfer function that translated the three-dimensional non-linear mapping into estimated azimuth and elevation values for the acoustic target. As expected, the neural network with two hidden layers performed better than the one with a single hidden layer. Our approach assumes that, for any given target location, a sound signal is available and the target is stationary during both movements. Acoustic and sensor noise, as well as multi-frequency signals such as speech, are not considered. These assumptions will be removed in future work, and challenges in robotic implementations, such as real-time operation, will be addressed.
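The following is a minimal geometric sketch of why two measurements at -45° and +45° can determine both azimuth and elevation, and of how the labelled training set described above is laid out. It is not the authors' method: the lizard peripheral auditory model is replaced here by an idealised, assumed cue (the projection of the source direction onto the microphone axis), the rotation is assumed to be a roll about the front-back axis, and the names `direction`, `mic_axis`, `cue`, `invert` and `dataset` are hypothetical. With the real non-linear model no such closed-form inverse is available, which is where the multi-layer perceptron comes in.

```python
import math

def direction(az_deg, el_deg):
    """Unit vector to the source; x is front, y is left-right, z is up (assumed frame)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def mic_axis(roll_deg):
    """Inter-microphone axis after a roll about the front-back (x) axis (assumed)."""
    r = math.radians(roll_deg)
    return (0.0, math.cos(r), math.sin(r))

def cue(az_deg, el_deg, roll_deg):
    """Idealised directional cue: projection of the source direction on the mic axis.
    This stands in for the lizard ear model's response and is an assumption."""
    d, a = direction(az_deg, el_deg), mic_axis(roll_deg)
    return sum(di * ai for di, ai in zip(d, a))

def invert(c_neg, c_pos):
    """Recover (azimuth, elevation) in degrees from the cues at -45 and +45 degrees.
    c_pos - c_neg = sqrt(2) * sin(el);  c_pos + c_neg = sqrt(2) * cos(el) * sin(az)."""
    s = math.sqrt(2.0)
    el = math.asin(max(-1.0, min(1.0, (c_pos - c_neg) / s)))
    az = math.asin(max(-1.0, min(1.0, (c_pos + c_neg) / (s * math.cos(el)))))
    return math.degrees(az), math.degrees(el)

# Labelled training set on the grid from the text: 1 degree steps,
# azimuth in [-90, +90], elevation in [-60, +60].
# Each entry is ((cue at -45, cue at +45), (azimuth, elevation)).
dataset = [
    ((cue(az, el, -45.0), cue(az, el, 45.0)), (float(az), float(el)))
    for az in range(-90, 91)
    for el in range(-60, 61)
]
```

Under this idealised cue a single measurement leaves the direction ambiguous, since many (azimuth, elevation) pairs share the same projection, while the pair of measurements can be inverted exactly within the frontal section. A network trained on `dataset` would be learning an approximation of `invert`.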
|Status||Published - 2017|
|Event||2017 ACM Symposium on Applied Perception - Brandenburg University of Technology, Cottbus, Germany|
Duration: 16 Sep 2017 → 17 Sep 2017
|Conference||2017 ACM Symposium on Applied Perception|
|Location||Brandenburg University of Technology|
|Period||16/09/2017 → 17/09/2017|