Skip to content

cfsctl: Add cmd to get object ref for a file#236

Merged
cgwalters merged 2 commits intocomposefs:mainfrom
Johan-Liebert1:cmd-get-objref
Mar 11, 2026
Merged

cfsctl: Add cmd to get object ref for a file#236
cgwalters merged 2 commits intocomposefs:mainfrom
Johan-Liebert1:cmd-get-objref

Conversation

@Johan-Liebert1
Copy link
Collaborator

Add a command GetFileObjectRef that accepts an absolute file path and returns the ref in objects directory corresponding to that file

@cgwalters cgwalters closed this Feb 23, 2026
@cgwalters cgwalters reopened this Feb 23, 2026
@cgwalters
Copy link
Collaborator

Add a command GetFileObjectRef that accepts an absolute file path and returns the ref in objects directory corresponding to that file

Also: this commit message only says "what" not why. The "why" could be very short, something like "I wanted this to access the underlying file for a kernel binary so I could..."

@Johan-Liebert1 Johan-Liebert1 force-pushed the cmd-get-objref branch 2 times, most recently from 1eaca87 to 588570f Compare February 24, 2026 11:18
@Johan-Liebert1
Copy link
Collaborator Author

Also: this commit message only says "what" not why. The "why" could be very short, something like "I wanted this to access the underlying file for a kernel binary so I could..."

My bad, should've added context. Added some in the new commit message

@Johan-Liebert1
Copy link
Collaborator Author

Rebased and fixed conflicts

#[derive(Debug, Serialize)]
pub struct ObjectRef {
/// Whether the file is stored inline in the EROFS image
pub stored_inline: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What problem did you hit in making this an enum? serde supports that https://serde.rs/enum-representations.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it as

enum ObjectRef {
    Internal,
    External { digest: String, path: String }
}

serde serialised Internal as just "Internal".

I didn't know about the untagged trait. Do you think an Enum would be better here?

where
S: serde::Serializer,
{
s.serialize_str(v.to_string_lossy().as_ref())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not usually a fan of to_string_lossy except for human output. The problem is it just means the data isn't usable for machines in the general case.


/// Content types for leaf nodes (non-directory files).
#[derive(Debug)]
#[derive(Debug, Clone, Serialize)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I asked for this but let me argue to undo it with a different suggestion.

},
/// Given a file path, get the file corresponding to it in the object store
/// If the file if stored inline, returns an empty string
StatFile {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I thought about this more and I have a suggestion that solves the "non-UTF8 filename" problem and would align better with the "LayerInspect" verb: let's have this output a single line in composefs dump file format.

That includes all relevant metadata (backing file path, file metadata) in a "standard" format.

We could then call it dump-single-file perhaps? Or since we're here we could easily have it support multiple files (why not?) so how about

dump-files <image> [files]

And I'd also suggest we add a --with-parents so the output is a truly valid dump file that could be used to make new EROFS files in theory if desired.

And we can just drop the enum serialization stuff - we already have a textual serialization of a composefs filesystem tree.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(maybe --with-parents is on by default...it's only a bit more verbose)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have a textual serialization of a composefs filesystem tree

I was not aware of this? Where do we have it?

OK I thought about this more and I have a suggestion that solves the "non-UTF8 filename" problem and would align better with the "LayerInspect" verb: let's have this output a single line in composefs dump file format.

that would be great, but that's not quite the intent here. The intent is more so just getting the corresponding file in the objects directory. Maybe I'm misunderstanding this but your suggestion is to output something like the following?

/usr/share/apk/keys/alpine-devel@lists.alpinelinux.org-66ba20fe.rsa.pub 800 100644 1 0 0 0 1758215632.0 1b/60ad29665d7a6c58e5dfc74d8bbfab074543c866820e5d3e4368054b62a263aba2bc157624305f2c8402
c446a6921d9e84d3d0409fd9731a9503a9982e872c - 1b60ad29665d7a6c58e5dfc74d8bbfab074543c866820e5d3e4368054b62a263aba2bc157624305f2c8402c446a6921d9e84d3d0409fd9731a9503a9982e872c

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about dump-files --backing-path which outputs just the path(s) newline separated and quoted in composefs dumpfile format?

But the default would be full dumpfile metadata.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new command that operates on the EROFS and outputs in a dumpfile format

This is what the output looks like

$ ./cfsctl --repo /tmp/.tmpqzyqeM dump-files refs/my-image / --recursive

/ 0 40755 4 1000 1000 0 1773135856.0 - - -
/etc 0 40755 2 1000 1000 0 1773135856.0 - - -
/usr 0 40755 4 1000 1000 0 1773135856.0 - - -
/usr/bin 0 40755 2 1000 1000 0 1773135856.0 - - -
/usr/lib 0 40755 2 1000 1000 0 1773135856.0 - - -
/etc/hostname 17 100644 1 1000 1000 0 1773135856.0 - integration-test\x0a -
/usr/bin/hello 131072 100644 1 1000 1000 0 1773135856.0 18/c6776e8d37d2076bd66e7160f2c3cab90ce064dc8cdb1d10954c2c7007fd1e83963da6c78fcfb30783c1a37ff36d7255718324e64f414acfd5cefc1ade05a3 - 18c
6776e8d37d2076bd66e7160f2c3cab90ce064dc8cdb1d10954c2c7007fd1e83963da6c78fcfb30783c1a37ff36d7255718324e64f414acfd5cefc1ade05a3 trusted.overlay.metacopy=\x00D\x00\x02\x18\xc6wn\x8d7\xd2\x07k\xd
6nq`\xf2\xc3\xca\xb9\x0c\xe0d\xdc\x8c\xdb\x1d\x10\x95L,p\x07\xfd\x1e\x83\x96\x3d\xa6\xc7\x8f\xcf\xb3\x07\x83\xc1\xa3\x7f\xf3mrUq\x83$\xe6OAJ\xcf\xd5\xce\xfc\x1a\xde\x05\xa3 trusted.overlay.re
direct=/18/c6776e8d37d2076bd66e7160f2c3cab90ce064dc8cdb1d10954c2c7007fd1e83963da6c78fcfb30783c1a37ff36d7255718324e64f414acfd5cefc1ade05a3
/usr/lib/readme.txt 13 100644 1 1000 1000 0 1773135856.0 - test\x20fixture\x0a -

Copy link
Collaborator Author

@Johan-Liebert1 Johan-Liebert1 Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With only backing path

$ ./cfsctl --repo /tmp/.tmpqzyqeM dump-files refs/my-image /usr/bin/hello --backing-path

/usr/bin/hello /18/c6776e8d37d2076bd66e7160f2c3cab90ce064dc8cdb1d10954c2c7007fd1e83963da6c78fcfb30783c1a37ff36d7255718324e64f414acfd5cefc1ade05a3

@cgwalters
Copy link
Collaborator

Also we should unify and fix https://github.com/composefs/composefs-rs/blob/main/crates/composefs-oci/src/oci_image.rs#L1015 to operate in terms of the erofs not the tar I think

@Johan-Liebert1 Johan-Liebert1 force-pushed the cmd-get-objref branch 3 times, most recently from f2d26f1 to 5cd008c Compare March 10, 2026 10:09
@Johan-Liebert1 Johan-Liebert1 force-pushed the cmd-get-objref branch 3 times, most recently from 66902dd to 94eeb96 Compare March 10, 2026 10:35
Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of code churn in the last commit that seems unrelated to "use sha512" - I think it'd make sense to squash.

let data = extract_file_data(inode, img);
LeafContent::Regular(RegularFile::Inline(data.into()))
}
// TODO: Sha256HashValue not handled here because the trait FsVerityHashValue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to #181 - instead of using generics after that we'll need the repo itself to handle this.

}

/// Extract xattrs from an inode
fn extract_xattrs_from_inode(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... do we not already have code that creates a Tree from erofs? I think it would make sense to do instead of this basically

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one that creates a Tree from a dumpfile, but I'm not sure if there's code that creates a Tree from an erofs image. We do have an erofs debugger, but that prints output in a different format

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right...ok I can take a crack at that but let's get #247 in first?

Actually...OK I am sorry about this I really scope creeped your PR. I think the larger issues are valid, but maybe we should just get in the feature you originally requested and then tackle the larger erofs/dumpfile stuff after?

How about this: We making --backing-path a required option for now, and always output the backing paths in dumpfile format?


Actually I am going to take a crack at adding erofs -> Tree code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I'm liking #252

This command recevies an EROFS image and file paths, and produces a
dump file like ouput. If `--backing-path-only` is provided, only the backing
object digest is outputted in the format `<file name> <digest>`

If a directory is passed as input instead of a file, it works like the
`ls` command printing information for all files/directories inside

Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>

refactor: Refactor duplicated code in cfsctl

Move out duplicated filesystem creation code blocks to separate
functions

Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
@Johan-Liebert1
Copy link
Collaborator Author

Rebased on top of #252. Makes the code much simpler

@cgwalters
Copy link
Collaborator

This one also relates to #225

@cgwalters cgwalters merged commit 999033a into composefs:main Mar 11, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants