Gradient | Computes the gradient of the output of the **target neuron** with respect to the input. | The **simplest** approach, but usually not the most effective.
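
As a minimal sketch of this row, the gradient attribution is just the derivative of the target output with respect to the input. The toy one-hidden-layer ReLU network and all names below are illustrative, not from any real model:

```python
import numpy as np

# Toy one-hidden-layer ReLU network y = W2 @ relu(W1 @ x); weights are
# random and purely illustrative.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # hidden x input
W2 = rng.normal(size=(2, 4))   # output x hidden

def forward(x):
    h = np.maximum(0.0, W1 @ x)         # ReLU hidden layer
    return W2 @ h, h

def gradient_attribution(x, target):
    """d y[target] / d x via the chain rule (manual backprop)."""
    _, h = forward(x)
    relu_mask = (h > 0).astype(float)   # derivative of ReLU
    return (W2[target] * relu_mask) @ W1

x = np.array([1.0, -0.5, 2.0])
grad = gradient_attribution(x, target=0)
```

The saliency-map variant in the row below is simply the absolute value of this vector.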

DeConvNet [20] | During the backward pass, applies the **ReLU to the gradient itself instead** of using the gradient of a neuron with ReLU activation. | Used to **visualize the features** learned by the layers. **Limited** to CNN models with **ReLU activation**.

Saliency Maps [23] | Takes the **absolute value of the partial derivative** of the target output neuron with respect to the input features to find the features that affect the output the most with the least perturbation. | **Can't distinguish between positive and negative** evidence due to the absolute values.

Guided backpropagation (GBP) [24] | Applies the **ReLU to the gradient computation in addition** to the gradient of a neuron with ReLU activation. | Like DeConvNet, it is **limited** to CNN models with **ReLU activation**.

LRP [25] | **Redistributes the prediction score** layer by layer with a backward pass on the network using a particular rule like the $\epsilon$**-rule** while ensuring numerical stability. | There are alternative stability rules, and when all activations are **ReLU** the $\epsilon$-rule is closely related to **Gradient × input** [18].
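
A compact sketch of the $\epsilon$-rule on a bias-free toy one-hidden-layer ReLU net (random illustrative weights; function names are my own). With all-ReLU activations, no biases, and a small $\epsilon$, the result coincides with Gradient × input:

```python
import numpy as np

# Toy bias-free one-hidden-layer ReLU net; weights are illustrative.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

def lrp_linear(a, W, R_out, eps=1e-9):
    """epsilon-rule: redistribute relevance R_out through z = W @ a."""
    z = W @ a
    stabilized = z + eps * np.sign(z)        # keep the denominator away from 0
    return (W * a).T @ (R_out / stabilized)  # R_in[i] = sum_j a_i W_ji R_j / z_j

x = np.array([1.0, 0.5, -2.0])
h = np.maximum(0.0, W1 @ x)
y = W2 @ h

R_out = np.array([y[0], 0.0])                # start from the target score
R_hidden = lrp_linear(h, W2, R_out)          # inactive units get zero (h_j = 0)
R_input = lrp_linear(x, W1, R_hidden)
```

Because no relevance is absorbed by biases here, the attributions conserve the target score: `R_input` sums (up to $\epsilon$) to `y[0]`.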

Gradient × input [26] | Initially proposed as a method to **improve the sharpness of attribution maps**; computed by multiplying the signed partial derivative of the output by the input. | It **can approximate occlusion** better than other methods in certain cases, such as a multilayer perceptron (MLP) with Tanh on MNIST data [18], while being instant to compute.
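
For a purely linear model the occlusion connection is exact: zeroing feature *i* changes the output by exactly the *i*-th Gradient × input term. The weights and input below are illustrative toy numbers:

```python
import numpy as np

# Linear model f(x) = w @ x; its gradient is simply w, so
# Gradient × input is w * x, which equals the occlusion effect
# of zeroing each feature in turn.
w = np.array([0.5, -1.0, 2.0])
x = np.array([2.0, 3.0, -1.0])
f = lambda v: w @ v

grad_times_input = w * x
occlusion = np.array(
    [f(x) - f(np.where(np.arange(3) == i, 0.0, x)) for i in range(3)]
)
```

For nonlinear networks the two quantities diverge, which is why the table calls this only an approximation of occlusion.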

GradCAM [27] | Produces **gradient-weighted class activation maps** using the gradients of the target concept as they flow into the final convolutional layer. | Applicable **only to CNNs**, including those with fully connected layers, structured outputs (like captions), and reinforcement learning.

IG [28] | Computes the **average gradient** as the input is varied from a **baseline** (often zero) to the actual input, unlike Gradient × input, which uses a single derivative at the input. | It is **highly correlated with the rescale rule of DeepLIFT** discussed below, which can act as a good and faster approximation.
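
A self-contained sketch of the averaging described in this row, using numerical gradients so no framework is needed; the quadratic test function, step count, and names are illustrative choices:

```python
import numpy as np

def num_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def integrated_gradients(f, x, baseline=None, steps=50):
    if baseline is None:
        baseline = np.zeros_like(x)            # the common zero baseline
    # Average the gradient along the straight path baseline -> x,
    # then scale by (x - baseline), per the IG formula.
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule
    avg = np.mean(
        [num_grad(f, baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    return (x - baseline) * avg

f = lambda v: (v ** 2).sum()                   # toy target function
x = np.array([1.0, -2.0, 3.0])
attr = integrated_gradients(f, x)              # ≈ x**2 for this quadratic
```

A useful property to check: the attributions sum to `f(x) - f(baseline)` (IG's completeness axiom).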

DeepTaylor [17] | Finds a root point near each neuron with a value close to the input but with output 0, and uses it to recursively estimate the attribution of each neuron via **Taylor decomposition**. | Provides **sparser explanations**, i.e., focuses on key features, but provides **no negative evidence** because it assumes only positive effects.

PatternNet [29] | Estimates the input signal of the output neuron using an **objective function**. | Proposed to counter the incorrect attributions of other methods on **linear systems** and generalized to deep networks. |

Pattern Attribution [29] | Applies Deep Taylor decomposition by searching for the **root points in the signal direction** for each neuron. | Proposed along with **PatternNet**; uses decomposition instead of signal visualization.

DeepLIFT [30] | Uses a reference input and computes the reference values of all hidden units with a forward pass, then proceeds backward **like LRP**. It has two variants: the **Rescale rule** and the later **RevealCancel**, which treats positive and negative contributions to a neuron separately. | Rescale is strongly related to, and **in some cases equivalent to**, $\epsilon$**-LRP**, but is **not applicable to models involving multiplicative rules**. **RevealCancel handles such cases**, and using RevealCancel for convolutional layers and Rescale for fully connected layers reduces noise.
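
A sketch of the Rescale rule on a bias-free toy ReLU net with a zero reference input (weights and names are illustrative): each ReLU gets a multiplier Δoutput/Δinput, which is then chained backward like a gradient:

```python
import numpy as np

# Toy bias-free one-hidden-layer ReLU net; weights are illustrative.
rng = np.random.default_rng(2)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

x = np.array([0.5, -1.0, 1.5])
ref = np.zeros(3)                                  # zero reference input

# Forward passes for the actual input and the reference.
z, z_ref = W1 @ x, W1 @ ref
h, h_ref = np.maximum(0.0, z), np.maximum(0.0, z_ref)

# Rescale multiplier of each ReLU: delta-output over delta-input.
dz = z - z_ref
safe = np.where(np.abs(dz) > 1e-9, dz, 1.0)        # avoid division by zero
m_relu = np.where(np.abs(dz) > 1e-9, (h - h_ref) / safe, 0.0)

# Chain the multipliers back to the input for target output 0.
m_input = (W2[0] * m_relu) @ W1
contributions = m_input * (x - ref)
```

The defining check is DeepLIFT's summation-to-delta property: the contributions sum to the change in the target output between the input and the reference.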

SmoothGrad [31] | An improvement on the gradient method that averages the gradient over multiple noisy copies of the input. | Designed to visually sharpen the attributions produced by the gradient method using the class score function.
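
A minimal sketch of the averaging in this row, again with numerical gradients; the noise level `sigma` and sample count are illustrative choices, not prescribed values:

```python
import numpy as np

def num_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def smoothgrad(f, x, sigma=0.1, n_samples=50, seed=0):
    """Average the gradient over n_samples Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    grads = [
        num_grad(f, x + rng.normal(0.0, sigma, x.shape))
        for _ in range(n_samples)
    ]
    return np.mean(grads, axis=0)

f = lambda v: (v ** 2).sum()          # toy score function with gradient 2v
x = np.array([1.0, -2.0, 3.0])
sg = smoothgrad(f, x)                 # ≈ 2 * x for this smooth function
```

For a function this smooth the averaging changes little; the benefit appears on real networks, where raw gradient maps are visually noisy.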

Deep SHAP [32] | A fast **approximation** algorithm for computing the game-theoretic **SHAP values**. It is connected to DeepLIFT and uses **multiple background samples** instead of one baseline. | Finds attributions for **non-neural-net models** like trees and support vector machines (SVMs), and for **ensembles** of those with a neural net, using various tools in the SHAP library.