This repository was archived by the owner on Apr 17, 2026. It is now read-only.

Questions about figure 1 #7

@josejhlee


Hello authors,

I'm having a hard time understanding figure 1. Does every count/point in the histogram correspond to the strongest Causal Tracing effect over all hidden states (token and layer index)?

If so, I'd like to explain how I understand figure 1 of your paper and figure 2 of ROME, which both show Causal Tracing. First, it bothers me that ROME's figure 2 shows no sign of the peak Causal Tracing effects that your figure shows outside the early-mid layers; instead, it is dominated by the early-site tracing effect. My hypothesis is that their figure looks this way because the peak causal tracing effect is spread across layers and appears irregularly from sample to sample, whereas the early-site effect shown in ROME is concentrated in the same region and appears consistently across samples. So when results are aggregated across samples in their figure, the peak effects from your figure wash out, while the early-site effect accumulates into a prominent signal.
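To make the hypothesis concrete, here is a toy simulation with purely synthetic numbers (not data from either paper). It assumes a 1-D per-layer effect profile per sample (the token dimension is collapsed), a consistent moderate effect at one hypothetical early layer, and a larger effect at a random later layer. Counting each sample's argmax layer (a histogram like your figure 1, if I read it correctly) never picks the early site, yet the mean profile over samples (an aggregated view) peaks there.

```python
import random
from collections import Counter

# All constants are hypothetical, chosen only to illustrate the argument.
NUM_LAYERS = 28      # model depth
EARLY_SITE = 5       # layer with a consistent, moderate effect in every sample
EARLY_EFFECT = 0.3   # same magnitude in every sample
PEAK_EFFECT = 0.6    # larger effect, but at a random later layer per sample
NUM_SAMPLES = 1000


def simulate_sample(rng):
    """Return one synthetic per-layer causal-tracing effect profile."""
    effects = [0.0] * NUM_LAYERS
    effects[EARLY_SITE] = EARLY_EFFECT
    # The strongest effect lands on a different later layer for each sample.
    peak_layer = rng.randrange(EARLY_SITE + 1, NUM_LAYERS)
    effects[peak_layer] = PEAK_EFFECT
    return effects


def aggregate(num_samples=NUM_SAMPLES, seed=0):
    rng = random.Random(seed)
    samples = [simulate_sample(rng) for _ in range(num_samples)]
    # Histogram view: which layer holds each sample's strongest effect.
    argmax_counts = Counter(
        max(range(NUM_LAYERS), key=s.__getitem__) for s in samples
    )
    # Averaged view: mean effect per layer over all samples.
    mean_effect = [
        sum(s[layer] for s in samples) / num_samples for layer in range(NUM_LAYERS)
    ]
    return argmax_counts, mean_effect


counts, mean = aggregate()
print(counts[EARLY_SITE] == 0)                                       # → True
print(max(range(NUM_LAYERS), key=mean.__getitem__) == EARLY_SITE)    # → True
```

Each later layer receives the large effect in only about 1/22 of samples, so its mean is roughly 0.6 / 22 ≈ 0.03, far below the early site's constant 0.3.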

It would be very helpful if you could check my understanding of your figure.
Thanks!
