Localisation of 3D Audio Sources in Augmented Reality

As human beings, we depend on our ability to navigate by 3D audio, since it provides many cues about how to navigate and behave in our surroundings. Having two ears, one on each side of the head, enables us to perceive the azimuth of a sound; in fact, we can localise a sound source to within 2 degrees of azimuth. The shape of the pinna, or outer ear, together with the torso gives us the ability to perceive the elevation of a sound.

During the past decade, interest in 3D sound, or spatial audio, has grown in entertainment, industry, and research, and several methods and systems for reproducing spatial audio have been developed in this period. One such method is based on head-related transfer functions (HRTFs), which combine several auditory cues to provide the listener with a broad spatial soundscape.
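To make the idea concrete, here is a minimal sketch of HRTF-based binaural rendering: a mono source is convolved with the left-ear and right-ear head-related impulse responses (HRIRs, the time-domain form of HRTFs) measured for the desired direction. The function and variable names are illustrative assumptions, not part of the project's actual rendering model.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono: np.ndarray,
                    hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Spatialise a mono signal by convolving it with the pair of
    HRIRs measured for the desired azimuth and elevation."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    # Interleave into a (samples, 2) stereo buffer for headphone playback.
    return np.stack([left, right], axis=1)
```

Played over headphones, the interaural time and level differences and the spectral colouration encoded in the HRIRs supply the azimuth and elevation cues described above.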

Several neuropsychological studies of the human ability to localise spatial audio sources have been conducted, but mostly at a very low cognitive level; experimental psychology has been criticised for relying on simple beeps and flashes. The aim of this project is to investigate visual and auditory cues that trigger higher cognitive processes. To this end, an augmented reality environment has been created using a 15″ LCD screen and a spatial audio rendering software model.

The working hypotheses ask whether semantic cues slow down the localisation of sound sources that are incongruent with visual cues. The audio-visual incongruence spans the temporal, semantic, and locus domains.

Test subjects were to perform speeded localisation tasks in four similar conditions in which the audible and visual content either coincided or differed spatially and temporally. The four conditions were presented twice, the difference being that the content was either concrete (everyday objects) or abstract.
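For clarity, the eight cells of the design can be enumerated as the product of three two-level factors; the factor names below are assumptions inferred from the description above, not the study's official labels.

```python
from itertools import product

# Assumed factor names, inferred from the condition description above.
content = ["concrete", "abstract"]
spatial = ["spatially coincident", "spatially differing"]
temporal = ["temporally coincident", "temporally differing"]

for i, cell in enumerate(product(content, spatial, temporal), start=1):
    print(f"Condition {i}: " + ", ".join(cell))
```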

Six 2×2 within-subject factorial ANOVAs were designed to analyse the data collected from the tests, in which the subjects were presented with the eight conditions.
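As a sketch of how one of these analyses might be run, the snippet below fits a 2×2 repeated-measures ANOVA with statsmodels. The file name and column names are hypothetical; the actual factors tested in each of the six ANOVAs are not detailed here.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per subject per condition,
# with localisation reaction time as the dependent variable.
df = pd.read_csv("localisation_times.csv")

# One possible 2x2 within-subject ANOVA: spatial congruence x
# temporal congruence as repeated-measures factors.
result = AnovaRM(
    data=df,
    depvar="reaction_time",
    subject="subject_id",
    within=["spatial_congruence", "temporal_congruence"],
).fit()
print(result)
```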

Reflecting on the results, we can say that higher cognitive processes confuse the subject's localisation decision when viewing concrete objects with incongruent temporal cues, whereas the equivalent condition using abstract objects leads the subject to focus on the actual location of the audio source.

Sounds from a coherently located concrete visual representation are easier to localise than sounds from abstract visual representations, owing to cognitive congruence.

(7th Semester)
