The Baseline Illusion
The Integrated Gradients paper presents baselines as a path to "absolute" feature importance, but this is conceptually dishonest. Every local explanation is comparative: you are always asking "important relative to what?" The IG framework asks for the point to be explained and a baseline to compare it against. As the authors write: "For most deep networks, it is possible to choose a baseline such that the prediction at the baseline is near zero (F(x') ≈ 0). (For instance, the black image baseline indeed satisfies this property.) In such cases, there is an interpretation of the resulting attributions that ignores the baseline and amounts to distributing the output to the individual input features."
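To see where the baseline sits in the computation, here is a minimal sketch of IG, assuming a differentiable PyTorch model; the model, data, and the `integrated_gradients` helper are illustrative stand-ins, not the paper's reference implementation:

```python
import torch

def integrated_gradients(model, x, baseline, steps=64):
    """Approximate IG along the straight line from `baseline` to `x` (1-D inputs)."""
    # Interpolation points x' + alpha * (x - x') for alpha in (0, 1].
    alphas = torch.linspace(0.0, 1.0, steps + 1)[1:].unsqueeze(1)
    path = baseline + alphas * (x - baseline)
    path.requires_grad_(True)

    # Gradients of the output at every point along the path.
    grads = torch.autograd.grad(model(path).sum(), path)[0]

    # Riemann approximation of the path integral, scaled by (x - x').
    return (x - baseline) * grads.mean(dim=0)

# Tiny stand-in model; any differentiable scalar-output network would do.
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
x = torch.randn(4)                      # the point being explained
baseline = torch.zeros(4)               # the "black image" style no-signal reference

attributions = integrated_gradients(model, x, baseline)
# Completeness: the attributions sum to (approximately) F(x) - F(x'),
# i.e. they explain the difference from the baseline, not an absolute importance.
print(attributions.sum().item(), (model(x) - model(baseline)).item())
```

The final print is the point of contention: completeness only ever accounts for F(x) − F(x'), so the attributions are relative to whatever baseline was chosen, and the "absolute" reading holds only when F(x') really is near zero.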
This may sound reasonable in theory, but it breaks down in practice. First, determining the correct baseline for a given model is not trivial, and often not possible. One can imagine many image classifiers for which an all-black image would not, in fact, be a no-signal input. Similarly, what would the "zero information" state be for a fraud detection model? An average input, for example, carries a strong non-risky signal.
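A quick sanity check makes this concrete: before trusting the "distribute the output to the features" reading, evaluate the model at the candidate baseline and see whether F(x') is anywhere near zero. The sketch below uses a hypothetical fraud-style tabular model and synthetic data purely for illustration:

```python
import torch

# Stand-in model and "historical transactions"; nothing here is a real detector.
model = torch.nn.Sequential(torch.nn.Linear(6, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
X_train = torch.randn(1000, 6)

candidates = {
    "all-zeros": torch.zeros(6),            # analogue of the black image
    "feature means": X_train.mean(dim=0),   # "average customer" baseline
}
for name, baseline in candidates.items():
    score = model(baseline).item()
    print(f"{name:15s} -> F(x') = {score:+.3f}")
# If neither score is near zero, the absolute reading of IG's attributions
# quietly stops applying: they only explain the difference from whichever
# reference you happened to pick.
```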
And more fundamentally, the appeal to a no-signal baseline obscures the comparative nature of IG. Why do this? Why pretend that one can produce absolute feature importances for a local explanation? By hiding the comparison behind the notion of a baseline, IG disguises what it is actually doing while claiming to reveal fundamental truths.
The novel explainability method embraces the comparative nature of explanations, making it explicit by showing you exactly which real examples the point being explained is compared against.