There is a lot of awesome research and development happening out in the interpretability community that we would like to share. Here we will maintain a curated list of research, implementations and resources. We would love to learn about more! Please feel free to make a pull request to contribute to the list.
TorchRay focuses on attribution, namely the problem of determining which part of the input, usually an image, is responsible for the value computed by a neural network.
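To make the idea of attribution concrete, here is a minimal vanilla-gradient saliency sketch in plain PyTorch. This is a generic illustration of input attribution, not TorchRay's own API; the tiny CNN and the random "image" are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a tiny CNN classifier and a random image batch.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),                      # 10 output classes
).eval()
x = torch.rand(1, 3, 64, 64, requires_grad=True)

# Attribution via vanilla gradients: how much does each input pixel
# influence the score of the predicted class?
scores = model(x)
target = scores.argmax(dim=1).item()
scores[0, target].backward()

# Collapse the channel dimension to get a 2-D saliency map.
saliency = x.grad.abs().max(dim=1).values.squeeze(0)   # shape: (64, 64)
```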
Score-CAM is a gradient-free visualization method extended from Grad-CAM and Grad-CAM++. It provides score-weighted visual explanations for CNNs.
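A minimal sketch of the Score-CAM idea, assuming a PyTorch CNN and a chosen convolutional layer; `model`, `conv_layer`, and `target_class` are placeholders, and details from the paper such as baseline-score subtraction are omitted.

```python
import torch
import torch.nn.functional as F

def score_cam(model, x, conv_layer, target_class):
    """Sketch of Score-CAM: weight each activation map of `conv_layer` by the
    target-class score obtained when that (upsampled, normalized) map is used
    to mask the input, then sum the weighted maps."""
    acts = {}
    handle = conv_layer.register_forward_hook(
        lambda m, i, o: acts.setdefault("a", o.detach()))
    with torch.no_grad():
        model(x)
    handle.remove()

    a = acts["a"]                                   # (1, K, h, w)
    masks = F.interpolate(a, size=x.shape[-2:], mode="bilinear",
                          align_corners=False)      # upsample to input size
    # Normalize each activation map to [0, 1] so it acts as a soft mask.
    flat = masks.flatten(2)
    mins = flat.min(-1).values[..., None, None]
    maxs = flat.max(-1).values[..., None, None]
    masks = (masks - mins) / (maxs - mins + 1e-8)

    # Gradient-free weights: the class score for each masked copy of the input.
    with torch.no_grad():
        scores = torch.stack([model(x * masks[:, k:k + 1])[0, target_class]
                              for k in range(masks.shape[1])])
    weights = F.softmax(scores, dim=0)
    return F.relu((weights[None, :, None, None] * a).sum(dim=1))  # (1, h, w)
```

A hypothetical call could be `cam = score_cam(model, img, model.layer4, target_class=243)`, where `layer4` stands in for whichever convolutional stage you want to explain.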
White noise stimuli are fed to a classifier, and the stimuli that are categorized into a particular class are averaged. This gives an estimate of the templates the classifier uses for classification, and is based on two popular and related methods in psychophysics and neurophysiology, namely classification images and spike-triggered analysis.
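A rough sketch of this reverse-correlation procedure, assuming a pretrained PyTorch classifier; the noise distribution, sample counts, and input shape below are illustrative assumptions.

```python
import torch

@torch.no_grad()
def classification_image(model, target_class, n_samples=10000,
                         shape=(3, 64, 64), batch_size=250):
    """Average the white-noise stimuli that the classifier assigns to
    `target_class`; the mean is an estimate of that class's template."""
    total = torch.zeros(shape)
    count = 0
    for _ in range(n_samples // batch_size):
        noise = torch.randn(batch_size, *shape)        # white-noise stimuli
        preds = model(noise).argmax(dim=1)
        hits = noise[preds == target_class]            # stimuli "seen" as the class
        total += hits.sum(dim=0)
        count += hits.shape[0]
    return total / max(count, 1)
```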
An attribution method that uses information from the end of each network scale, which is then combined into a single saliency map.
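The exact formulation is specific to the method, but a generic sketch of the combine-across-scales idea (gradient-times-activation at the end of each stage, upsampled and averaged) might look like the following; `stage_layers` is a hypothetical list of modules, e.g. the final block of each ResNet stage.

```python
import torch
import torch.nn.functional as F

def multiscale_saliency(model, stage_layers, x, target_class):
    """Sketch only: per-scale gradient * activation maps, upsampled to the
    input resolution and averaged into one saliency map."""
    acts = []
    handles = [layer.register_forward_hook(lambda m, i, o: acts.append(o))
               for layer in stage_layers]
    scores = model(x)
    for h in handles:
        h.remove()

    # Gradients of the target-class score w.r.t. each captured activation.
    grads = torch.autograd.grad(scores[0, target_class], acts)

    maps = []
    for a, g in zip(acts, grads):
        m = F.relu((a * g).sum(dim=1, keepdim=True))         # (1, 1, h, w)
        m = F.interpolate(m, size=x.shape[-2:], mode="bilinear",
                          align_corners=False)
        maps.append(m / (m.max() + 1e-8))                     # normalize per scale
    return torch.stack(maps).mean(dim=0)                      # (1, 1, H, W)
```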