How do two deep neural networks differ in how they arrive at a decision? Measuring the similarity of deep networks has been a long-standing open question. Most existing methods provide a single number to measure the similarity of two networks at a given layer, but give no insight into what makes them similar or dissimilar. We introduce an interpretable representational similarity method (RSVC) to compare two networks. We use RSVC to discover shared and unique visual concepts between two models. We show that some aspects of model differences can be attributed to unique concepts discovered by one model that are not well represented in the other. Finally, we conduct an extensive evaluation across different vision model architectures and training protocols to demonstrate RSVC's effectiveness.
Concept-based explanation methods provide insight into model behavior by revealing the visual concepts a model has discovered during training. Consider two different models trained on the same dataset: we would like to understand how concepts differ between the two models and whether these conceptual differences can explain differences in performance.
RSVC tackles this question by (1) extracting concepts for
Model 1, (2) asking Model 2 to predict Model 1's concepts,
and (3) measuring the quality of the prediction.
In this example, we use non-negative matrix factorization to
extract concepts for Model 1.
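The concept-extraction step can be sketched as follows. This is a minimal illustration assuming Model 1's (non-negative) activations have already been collected into a matrix with one row per image patch; the array shapes and the number of concepts are placeholder choices, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import NMF

# Placeholder for Model 1's non-negative activations, shape (n_patches, n_channels),
# e.g. post-ReLU features flattened over spatial positions (illustrative values only).
A1 = np.random.rand(1024, 512)

n_concepts = 10  # number of concepts to extract (a hyperparameter)
nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)

U1 = nmf.fit_transform(A1)  # concept coefficients per patch, shape (n_patches, n_concepts)
W1 = nmf.components_        # concept directions in feature space, shape (n_concepts, n_channels)
```

Each row of W1 is a candidate concept; each column of U1 indicates how strongly that concept is present in a given patch.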
We find that Model 1 has discovered concepts for a blue jay's tail and for a sky background.
We collect activations from Model 2 over the same images and fit a regression model that predicts Model 1's concept coefficients from them. We then measure how the predicted coefficients differ from the original coefficients using Pearson correlation. Larger correlation values indicate that Model 2 shares the concept with Model 1. In this case, Model 2 does not strongly predict the blue jay tail concept, but it does share the sky background concept.
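A sketch of the prediction and scoring steps, assuming A2 holds Model 2's activations over the same patches and U1 is the coefficient matrix from the NMF step above. The ridge regressor and the absence of a train/test split here are simplifications for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

# Placeholders: Model 2's activations over the same patches, and Model 1's
# concept coefficients (in practice, U1 comes from the NMF step above).
A2 = np.random.rand(1024, 384)
U1 = np.random.rand(1024, 10)

# Fit a regression that predicts Model 1's concept coefficients from Model 2's features.
reg = Ridge(alpha=1.0).fit(A2, U1)
U1_hat = reg.predict(A2)

# Per-concept similarity: Pearson correlation between original and predicted coefficients.
similarity = [pearsonr(U1[:, k], U1_hat[:, k])[0] for k in range(U1.shape[1])]
```

A concept with a high correlation is one that Model 2's features can reconstruct, i.e. a shared concept; a low correlation suggests a concept unique to Model 1.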
Can RSVC recover known conceptual differences? We train Model 1 to associate a pink
square with the Common Eider class and Model 2 to be invariant to the pink square. If RSVC
works as expected, then it should discover that the pink square concept is unique to Model 1.
We show that RSVC can detect this known difference. In the green box, we visualize an image collage of the patches with the largest coefficients for the extracted concept. The common feature across all of these patches is the pink square, so we identify this as the pink square concept that Model 1 was trained on.
We see that the predicted coefficients (from Model 2) are very different from Model 1's
coefficients for the pink square concept. We also visualize image collages that contain image patches
with over-predicted coefficients and under-predicted coefficients. We find that the regression model
is unable to disentangle images of water without the pink square from images of water with the
pink square.
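The collages above can be assembled by ranking patches by their concept coefficients and by the regression's prediction error. A minimal sketch, with placeholder arrays standing in for U1 and U1_hat from the previous sketches and a hypothetical top_k helper:

```python
import numpy as np

# Placeholders: Model 1's concept coefficients and the coefficients predicted from Model 2.
U1 = np.random.rand(1024, 10)
U1_hat = np.random.rand(1024, 10)

def top_k(values, k=16):
    """Indices of the k largest entries, in descending order."""
    return np.argsort(values)[::-1][:k]

c = 0  # index of the concept being visualized (e.g. the pink-square concept)

highest_activating = top_k(U1[:, c])  # collage of patches where Model 1's concept is strongest
residual = U1_hat[:, c] - U1[:, c]    # per-patch prediction error of the regression
over_predicted = top_k(residual)      # collage of the most over-predicted patches
under_predicted = top_k(-residual)    # collage of the most under-predicted patches
```

The selected indices pick out the image patches shown in the corresponding collages.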
Interpreting Low-Similarity Concepts. In this example, we find an RN50 concept for the barbell class that the ViT-S is not able to predict. (Green): The RN50 concept reacts to images of hands lifting barbells. Additionally, many images contain the vertical supports of a squat rack. We train a regression model on the ViT-S activations to predict the RN50 concept coefficients. (Blue): The ViT-S regression model under-reacts to images containing hands, people, and squat racks. (Orange): It over-reacts to images that focus more heavily on weight plates. These results suggest that the specific concept of hands lifting barbells is not represented in the ViT-S. In the paper, we use an LLVM to analyze the image collages (IC1 and IC2) and find that it detects similar differences in the visualizations.
Do models learn important and unique concepts? Models are trained on ImageNet; we use ResNets and ViTs. We find that models learn concepts that have (1) low similarity and low importance, (2) high similarity and high importance, and (3) low similarity and high importance. The last category is particularly interesting, since it indicates that one model has discovered an important concept that the other has not learned. However, we find that the bulk of model differences can be attributed to medium-similarity, medium-importance concepts.
Layerwise Concept Similarity. We ask how concepts across different layers of two networks relate. We find that concept similarity is higher in earlier layers, decreases in the middle, and rises slightly again towards the end of the network, suggesting that the classification task biases models to organize information in a more similar way towards the end of the network. Interestingly, various aspects of this result have been corroborated by related work in representational similarity and interpretability.
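One way to compute such a layerwise profile is to repeat the extract, predict, and score steps at matched layers of the two models and average the per-concept correlations. A self-contained sketch under those assumptions (the layer names, shapes, and the simple averaging are illustrative choices):

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

def rsvc_layer_similarity(acts1, acts2, n_concepts=10):
    """Mean per-concept similarity between two layers: extract NMF concepts from acts1,
    predict their coefficients from acts2 with ridge regression, score with Pearson r."""
    U1 = NMF(n_components=n_concepts, init="nndsvda", max_iter=500).fit_transform(acts1)
    U1_hat = Ridge(alpha=1.0).fit(acts2, U1).predict(acts2)
    return float(np.mean([pearsonr(U1[:, k], U1_hat[:, k])[0] for k in range(n_concepts)]))

# Hypothetical per-layer activations for both models over the same images.
layers = ["layer1", "layer2", "layer3", "layer4"]
acts_m1 = {l: np.random.rand(512, 256) for l in layers}
acts_m2 = {l: np.random.rand(512, 256) for l in layers}

profile = {l: rsvc_layer_similarity(acts_m1[l], acts_m2[l]) for l in layers}
```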
@inproceedings{kondapaneni2025representational,
title={Representational Similarity via Interpretable Visual Concepts},
author={Kondapaneni, Neehar and Mac Aodha, Oisin and Perona, Pietro},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}