Description
If an image layer contains a file path that is not valid UTF-8, stargzify cannot process it correctly. For example:
mkdir ./test
touch ./test/$(printf '\xFF')
touch ./test/$(printf '\xAA')
tar -czvf ./test.tar.gz ./test
stargzify file:test.tar.gz file:test.stargz
tar -xzvf test.stargz stargz.index.json
The index file stargz.index.json then contains two entries with the same name, because the invalid byte in each file name was replaced with "�" (U+FFFD):
{
  "version": 1,
  "entries": [
    {
      "name": "./test/",
      "type": "dir",
      "modtime": "2020-09-08T10:33:57Z",
      "mode": 509,
      "uid": 1000,
      "gid": 1000,
      "NumLink": 0
    },
    {
      "name": "./test/\ufffd",
      "type": "reg",
      "modtime": "2020-09-08T10:33:47Z",
      "mode": 436,
      "uid": 1000,
      "gid": 1000,
      "NumLink": 0,
      "digest": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    },
    {
      "name": "./test/\ufffd",
      "type": "reg",
      "modtime": "2020-09-08T10:33:57Z",
      "mode": 436,
      "uid": 1000,
      "gid": 1000,
      "NumLink": 0,
      "digest": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    }
  ]
}
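The collision can likely be reproduced in isolation. The sketch below assumes the index is produced with Go's encoding/json, which replaces every invalid UTF-8 byte in a string with U+FFFD when marshaling. The helper marshalName is hypothetical and only stands in for the name field of an index entry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalName JSON-encodes a single file name the same way encoding/json
// would encode a string field. It is a hypothetical helper for this report.
func marshalName(name string) string {
	b, err := json.Marshal(name)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	a := marshalName("./test/\xff") // first file from the reproduction
	b := marshalName("./test/\xaa") // second file from the reproduction

	fmt.Println(a)
	fmt.Println(b)
	// Two different paths on disk end up with identical encoded names.
	fmt.Println("identical after encoding:", a == b)
}
```

Because the replacement is lossy, the original bytes cannot be recovered from the index afterwards.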
Possible Solution
Treat the file path as []byte, like an xattr value, so that the path can be serialized to the JSON index file as a base64-encoded string. To distinguish a plain path from a base64-encoded one, an extra entry field such as "name_encoded": true would be needed.
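A minimal sketch of this idea, not an actual patch: the entry type, the NameEncoded field, and the encodeName/decodeName helpers are all hypothetical names used for illustration. Valid UTF-8 paths are stored as-is so existing indexes stay readable, and only invalid paths are base64-encoded and flagged.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// entry is a hypothetical, cut-down index entry. NameEncoded is the extra
// field proposed above; it is an assumption, not part of the current format.
type entry struct {
	Name        string `json:"name"`
	NameEncoded bool   `json:"name_encoded,omitempty"`
}

// encodeName keeps valid UTF-8 paths verbatim and base64-encodes the rest.
func encodeName(raw []byte) entry {
	if utf8.Valid(raw) {
		return entry{Name: string(raw)}
	}
	return entry{
		Name:        base64.StdEncoding.EncodeToString(raw),
		NameEncoded: true,
	}
}

// decodeName recovers the original path bytes from an entry.
func decodeName(e entry) ([]byte, error) {
	if !e.NameEncoded {
		return []byte(e.Name), nil
	}
	return base64.StdEncoding.DecodeString(e.Name)
}

func main() {
	paths := [][]byte{
		[]byte("./test/"),
		[]byte("./test/\xff"),
		[]byte("./test/\xaa"),
	}
	for _, raw := range paths {
		out, err := json.Marshal(encodeName(raw))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s\n", out)
	}
}
```

With this scheme the two files from the reproduction get distinct names in the index, and a reader that understands the flag can restore the exact bytes. Readers that ignore the flag would see the base64 text as the path, so a version bump may be worth considering.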