Questions from figure 1

Hello authors,

I'm having a hard time understanding Figure 1. Does each count in the histogram correspond to the location of the strongest Causal Tracing effect over all hidden states (that is, over every token and layer index)?

If so, I would like to explain how I am reconciling Figure 1 from your paper with Figure 2 from ROME, since both show Causal Tracing results. What bothers me is that ROME's Figure 2 shows no sign of the peak Causal Tracing effects that your figure places outside the early-to-mid layers. Instead, it shows the early-site tracing effect dominating. My hypothesis is that their figure looks that way because it aggregates over samples: the peak effect sits at a different layer for different samples, so it is spread thinly across layers, while the early-site effect shown in ROME falls in the same region for nearly every sample. As a result, the peak effects from your figure wash out in their aggregate, and the early-site effect accumulates and appears prominent.
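To make the hypothesis concrete, here is a toy simulation on purely synthetic numbers (not real Causal Tracing output from either paper). It assumes, for illustration only, 48 layers, a moderate effect that always appears at one hypothetical early layer, and a larger effect that lands on a random layer in each sample. Token positions are ignored to keep the sketch short, and all constants and function names are made up.

```python
import random

# All values below are illustrative assumptions, not measurements.
N_SAMPLES = 1000
N_LAYERS = 48
EARLY_LAYER = 5      # hypothetical early site, identical in every sample
EARLY_EFFECT = 0.3   # moderate but consistent effect
PEAK_EFFECT = 0.5    # larger effect whose layer varies per sample


def simulate(seed=0):
    """Return one list of per-layer effects for each synthetic sample."""
    rng = random.Random(seed)
    samples = []
    for _ in range(N_SAMPLES):
        effects = [0.0] * N_LAYERS
        effects[EARLY_LAYER] += EARLY_EFFECT
        effects[rng.randrange(N_LAYERS)] += PEAK_EFFECT
        samples.append(effects)
    return samples


def mean_per_layer(samples):
    """Average effect at each layer (the aggregated, heatmap-style view)."""
    return [sum(s[layer] for s in samples) / len(samples)
            for layer in range(N_LAYERS)]


def peak_histogram(samples):
    """Count how often each layer holds a sample's strongest effect."""
    counts = [0] * N_LAYERS
    for s in samples:
        counts[s.index(max(s))] += 1
    return counts


samples = simulate()
mean_effect = mean_per_layer(samples)
hist = peak_histogram(samples)

print("layer with largest mean effect:", mean_effect.index(max(mean_effect)))
print("largest share of per-sample peaks at one layer:", max(hist) / N_SAMPLES)
```

Under these assumptions the averaged view is dominated by the early layer, even though the per-sample peaks are spread over all layers and no single layer collects more than a small share of them. That is the cancellation I am describing.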

It would be very helpful if you could check whether my understanding of your figure is correct.
Thanks!


Questions from figure 1 #7
