This repository was archived by the owner on Apr 17, 2026. It is now read-only.
Hello authors,
I'm having a hard time understanding Figure 1. Does each count in the histogram correspond to the strongest Causal Tracing effect over all hidden states (every token and layer index) for a single sample?
If so, I would like to explain how I understand Figure 1 in your paper relative to Figure 2 in the ROME paper, since both show Causal Tracing results. What bothers me is that ROME's Figure 2 shows no evidence of the peak Causal Tracing effects outside the early-to-mid layers that appear in your figure; instead, it is dominated by the early-site tracing effect. My hypothesis is that their figure looks that way because the peak effect is spread across layers and lands in different places for different samples, whereas the early-site effect shown in ROME is concentrated in the same region and appears consistently across samples. As a result, the peaks from your figure average out in their figure, while the early-site effect accumulates and stands out.
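To make the hypothesis concrete, here is a toy sketch using purely synthetic numbers. Nothing in it comes from either paper or its code: the layer count, the `EARLY_LAYER` position, and the two effect magnitudes are assumptions chosen only for illustration. It compares the two ways of summarizing the same per-sample effects: averaging over samples (as I understand ROME's Figure 2) versus counting where each sample's maximum falls (as I understand your Figure 1).

```python
import numpy as np

# Synthetic illustration only: every size and magnitude here is a made-up assumption.
rng = np.random.default_rng(0)
n_samples, n_layers = 1000, 48
EARLY_LAYER = 5      # hypothetical early-site layer, identical in every sample
EARLY_EFFECT = 0.3   # moderate effect that shows up consistently
PEAK_EFFECT = 0.6    # stronger effect whose layer varies from sample to sample

# effects[i, l] = tracing effect for sample i when restoring layer l
effects = np.zeros((n_samples, n_layers))
effects[:, EARLY_LAYER] += EARLY_EFFECT
peak_layers = rng.integers(0, n_layers, size=n_samples)
effects[np.arange(n_samples), peak_layers] += PEAK_EFFECT

# Summary 1: average over samples (one value per layer).
mean_effect = effects.mean(axis=0)

# Summary 2: histogram of the layer holding each sample's strongest effect.
peak_hist = np.bincount(effects.argmax(axis=1), minlength=n_layers)

print("layer with largest averaged effect:", mean_effect.argmax())
print("share of per-sample peaks at the early layer:",
      peak_hist[EARLY_LAYER] / n_samples)
```

Under these assumptions the averaged curve is dominated by the consistent early-site effect, even though almost no individual sample has its strongest effect there. Whether the real data behaves this way is exactly what I am asking.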
It would be very helpful if you checked my understanding of your figure.
Thanks!