Takahiro Shinozaki
Paper download is intended for registered attendees only, and is
subject to the IEEE Copyright Policy. Any other use is strictly forbidden.
Papers from this author
Unsupervised Sound Source Localization From Audio-Image Pairs Using Input Gradient Map
Tomohiro Tanaka, Takahiro Shinozaki
Auto-TLDR; Unsupervised Sound Localization Using Gradient Method
Humans easily and routinely identify the image region that corresponds to an observed sound in their daily lives. This task is formulated as unsupervised sound source localization, which uses no tagged data. Recently, several methods have been proposed that utilize the activations of hidden or output layers of neural networks, such as an attention layer or the feature maps of a convolutional neural network (CNN). We propose another strategy that obtains a localization map at the input side by applying the widely used input gradient method. It is computationally efficient and can easily be applied to any existing technique because it does not depend on the network structure. Taking advantage of this, we propose a method that combines it with existing methods to achieve higher sound localization performance. Experiments are performed using the Flickr-SoundNet dataset. When a pre-trained image front-end is used, the proposed method gives better results than the attention-based method. In a completely unsupervised condition, the gradient method performs comparably to the conventional methods, and the best results are obtained by the combination method.
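The abstract does not specify the networks or the audio-image score, so the following is only a toy sketch of the general input gradient idea, not the authors' implementation. It assumes a hypothetical one-layer image encoder (`weights` followed by `tanh`) and a fixed audio embedding `audio_emb`, scores the pair by a dot product, and takes the gradient of that score with respect to each pixel. The function names `similarity`, `input_gradient`, and `localization_map` are illustrative.

```python
import math
import random


def similarity(image, audio_emb, weights):
    """Toy audio-image score: dot(tanh(W @ x), a), where x is the flattened image."""
    x = [p for row in image for p in row]
    hidden = [math.tanh(sum(w * p for w, p in zip(w_row, x))) for w_row in weights]
    return sum(h * a for h, a in zip(hidden, audio_emb))


def input_gradient(image, audio_emb, weights):
    """Analytic gradient of the score with respect to each pixel (flattened)."""
    x = [p for row in image for p in row]
    pre = [sum(w * p for w, p in zip(w_row, x)) for w_row in weights]
    # Chain rule through tanh: d(score)/d(pre_k) = a_k * (1 - tanh(pre_k)^2)
    delta = [a * (1.0 - math.tanh(z) ** 2) for a, z in zip(audio_emb, pre)]
    # d(score)/d(x_i) = sum_k delta_k * W[k][i]
    return [sum(d * w_row[i] for d, w_row in zip(delta, weights))
            for i in range(len(x))]


def localization_map(image, audio_emb, weights):
    """Absolute input gradient, scaled to [0, 1] and reshaped to the image grid."""
    width = len(image[0])
    grad = input_gradient(image, audio_emb, weights)
    peak = max(abs(g) for g in grad) or 1.0  # avoid dividing by zero
    scaled = [abs(g) / peak for g in grad]
    return [scaled[r * width:(r + 1) * width] for r in range(len(image))]


# Hypothetical inputs: a 4x4 grayscale image, a 3-dimensional audio embedding,
# and random encoder weights (a real system would use trained networks).
rng = random.Random(0)
image = [[rng.random() for _ in range(4)] for _ in range(4)]
weights = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(3)]
audio_emb = [rng.uniform(-1.0, 1.0) for _ in range(3)]

for row in localization_map(image, audio_emb, weights):
    print(" ".join(f"{v:.2f}" for v in row))
```

Because the map is read off at the input, nothing in `localization_map` depends on the encoder having an attention layer or spatial feature maps. In this sketch, that is the sense in which the approach is independent of the network structure. With a real network, the analytic gradient here would be replaced by automatic differentiation.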