Three-dimensional sound localisation with a lizard peripheral auditory model

Publication: Conference contribution without publisher/journal › Poster › Research › peer review

Abstract

Conventional approaches to three-dimensional sound source localisation utilise either interaural time difference information extracted via static two-dimensional multi-microphone grids [Imran et al. 2016] with at least four microphones, or spectral cues [Keyrouz 2014; Reddy et al. 2016] via head-related transfer functions [Cheng and Wakefield 2001]. Here we present a preliminary sensorimotor approach [Aytekin et al. 2007; Shaikh 2012], in simulation, to three-dimensional sound source localisation employing only two simulated microphones. We use directed spatial movements of the microphones to resolve the unknown location of an acoustic target in three dimensions. Our approach couples a model of the peripheral auditory system of lizards [Christensen-Dalsgaard and Manley 2005] with a multi-layer perceptron neural network. The peripheral auditory model's response to sound input encodes sound direction information in a single plane, which by itself is insufficient to localise the acoustic target in three dimensions. The multi-layer perceptron therefore combines two independent responses of the model, corresponding to two rotational movements, into an estimate of the sound direction in terms of its relative azimuth and elevation.

The acoustic target emitted a 1650 Hz tone, chosen to elicit the strongest response from the lizard peripheral auditory model. To resolve the target's unknown azimuth and elevation, two independent acoustic measurements were performed, first with the microphones rotated to -45 deg. and then to +45 deg. about the sagittal axis; the auditory model thus generated two independent responses, one per measurement. The two measurements were repeated for varying locations of the acoustic target, at 1 deg. resolution in both azimuth and elevation, on the surface of a frontal spherical section in space defined by an azimuth range of [-90 deg., +90 deg.] and an elevation range of [-60 deg., +60 deg.]. This produced two individual representations of sound location, one per microphone rotation, each non-linearly mapping the model's response to sound direction. Labelled training data was generated from these mappings: each two-dimensional input vector combined one sample of the model's response from each mapping and was labelled with the corresponding azimuth and elevation. Two independent multi-layer perceptron neural networks, with one and two hidden layers respectively, were trained on this data via supervised learning. Each perceptron computed a weighted non-linear superposition of the two mappings; after training, the networks had learned a transfer function that translated the combined non-linear mapping into estimated azimuth and elevation values for the acoustic target. As expected, the network with two hidden layers performed better than the one with a single hidden layer.

Our approach assumes that, for any given target location, a sound signal is available and the target remains stationary during both movements. Acoustic and sensor noise, as well as multi-frequency signals such as speech, are not yet considered. These assumptions will be removed in future work, and challenges in robotic implementations, such as real-time operation, will be addressed.
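To make the role of the peripheral model concrete, below is a minimal sketch of how a one-dimensional direction cue could be extracted from a lizard-ear-style model for a pure tone. The coupled-ear gains `G_IPSI` and `G_CONTRA`, the microphone separation, and the `direction_cue` helper are all hypothetical placeholders standing in for the actual eardrum and internal-cavity acoustics of [Christensen-Dalsgaard and Manley 2005]; this is not the authors' implementation.

```python
import numpy as np

F0 = 1650.0      # tone frequency used in the poster (Hz)
C = 343.0        # speed of sound (m/s)
MIC_SEP = 0.013  # microphone separation (m); hypothetical value

# Hypothetical complex gains of the coupled-ear filters at 1650 Hz. In the
# real model they follow from the eardrum/internal-cavity acoustics; here
# they are placeholders chosen only to create an ipsi/contra asymmetry.
G_IPSI = 1.0 + 0.0j
G_CONTRA = 0.4 * np.exp(1j * np.pi / 3)

def direction_cue(lateral_angle_rad):
    """dB level difference of the two model outputs for a far-field tone
    arriving at the given lateral angle: the one-plane direction cue."""
    itd = MIC_SEP * np.sin(lateral_angle_rad) / C  # interaural time difference
    phase = np.pi * F0 * itd                       # half the total phase offset
    left_in = np.exp(1j * phase)                   # tone phasor at left eardrum
    right_in = np.exp(-1j * phase)                 # tone phasor at right eardrum
    # Each ear's output mixes its own (ipsi) and the opposite (contra) input.
    left_out = G_IPSI * left_in + G_CONTRA * right_in
    right_out = G_IPSI * right_in + G_CONTRA * left_in
    return 20.0 * np.log10(np.abs(left_out) / np.abs(right_out))
```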
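The two-rotation scheme and the supervised training can likewise be sketched end to end. The roll geometry, the coarser 5 deg. grid (the poster uses 1 deg.), and the hidden layer widths below are illustrative assumptions; the sketch reuses the hypothetical `direction_cue` helper above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def cue_after_roll(az_deg, el_deg, roll_deg):
    """Direction cue after rolling the microphone pair about the sagittal
    (forward) axis, which tilts the interaural axis in the vertical plane."""
    az, el, roll = np.deg2rad([az_deg, el_deg, roll_deg])
    # Unit vector to the source: x = interaural axis, y = forward, z = up.
    src = np.array([np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), np.sin(el)])
    ear_axis = np.array([np.cos(roll), 0.0, np.sin(roll)])
    lateral = np.arcsin(np.clip(src @ ear_axis, -1.0, 1.0))
    return direction_cue(lateral)

# Sample the frontal spherical section; one feature per rotation (+/-45 deg.),
# labelled with the ground-truth azimuth and elevation.
X, y = [], []
for az in np.arange(-90, 91, 5):
    for el in np.arange(-60, 61, 5):
        X.append([cue_after_roll(az, el, -45), cue_after_roll(az, el, +45)])
        y.append([az, el])
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two hidden layers, as in the better-performing network of the abstract;
# the layer widths are arbitrary.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
net.fit(X_tr, y_tr)
print("mean absolute error (deg.):", np.abs(net.predict(X_te) - y_te).mean())
```

With the two rolls, each source position yields two distinct cues, so azimuth and elevation become jointly recoverable within the frontal section even though a single cue is ambiguous.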
Original language: English
Publication date: 2017
Number of pages: 1
Status: Published - 2017
Event: 2017 ACM Symposium on Applied Perception - Brandenburg University of Technology, Cottbus, Germany
Duration: 16 Sep 2017 → 17 Sep 2017
http://sap.acm.org/2017/home.php

Conference

Conference: 2017 ACM Symposium on Applied Perception
Location: Brandenburg University of Technology
Country: Germany
City: Cottbus
Period: 16/09/2017 → 17/09/2017
Internet address: http://sap.acm.org/2017/home.php

Fingerprint

lizards
sound localization
acoustics
microphones
azimuth
self organizing systems
education
transfer functions
real time operation
acoustic frequencies
acoustic measurement

Cite this

Kjær Schmidt, M., & Shaikh, D. (2017). Three-dimensional sound localisation with a lizard peripheral auditory model. Poster session presented at 2017 ACM Symposium on Applied Perception, Cottbus, Germany.