Hyung Il Koo
Paper download is intended for registered attendees only, and is
subjected to the IEEE Copyright Policy. Any other use is strongly forbidden.
Papers from this author
Improving Explainability of Integrated Gradients with Guided Non-Linearity
Hyuk Jin Kwon, Hyung Il Koo, Nam Ik Cho
Auto-TLDR; Guided Non-linearity for Attribution in Convolutional Neural Networks
Abstract Slides Poster Similar
Along with the performance improvements of neural network models, developing methods that enable the explanation of their behavior is a significant research topic. For convolutional neural networks, the explainability is usually achieved with attribution (heatmap) that visualizes pixel-level importance or contribution of input to its corresponding result. This attribution should reflect the relation (dependency) between inputs and outputs, which has been studied with a variety of methods, e.g., derivative of an output with respect to an input pixel value, a weighted sum of gradients, amount of output changes to input perturbations, and so on. In this paper, we present a new method that improves the measure of attribution, and incorporates it into the integrated gradients method. To be precise, rather than using the conventional chain-rule, we propose a method called guided non-linearity that propagates gradients more effectively through non-linear units (e.g., ReLU and max-pool) so that only positive gradients backpropagate through non-linear units. Our method is inspired by the mechanism of action potential generation in postsynaptic neurons, where the firing of action potentials depends on the sum of excitatory (EPSP) and inhibitory postsynaptic potentials (IPSP). We believe that paths consisting of EPSP-giving-neurons faithfully reflect the contribution of inputs to the output, and we make gradients flow only along those paths (i.e., paths of positive chain reactions). Experiments with 5 deep neural networks have shown that the proposed method outperforms others in terms of the deletion metrics, and yields fine-grained and more human-interpretable attribution.