
Adds fixed-length Unicode String encoding #164

Open
bogovicj wants to merge 4 commits into master from feat/zarr-unicode

@bogovicj
Contributor

As specified by Zarr v2.

See saalfeldlab/n5-zarr#74.
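For context, Zarr v2 describes fixed-length Unicode arrays with NumPy's `U` dtype. As I understand that layout, an element of dtype `<U{n}` occupies exactly `4 * n` bytes and holds up to `n` UTF-32 code points in little-endian order, with shorter strings padded by NUL code points that NumPy strips on read. The following is a minimal decoding sketch under those assumptions. The class and method names are hypothetical and are not the `DataCodec` API added in this PR. A `>U{n}` array would need `ByteOrder.BIG_ENDIAN` instead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch (not the n5 DataCodec API): decodes a chunk of
 * Zarr v2 "<U{n}" elements, assuming each element is n little-endian
 * UTF-32 code points (4 * n bytes) padded with trailing NULs.
 */
public class FixedUnicodeDecode {

    public static String[] decode(byte[] chunk, int n) {
        int elementBytes = 4 * n;
        if (chunk.length % elementBytes != 0) {
            throw new IllegalArgumentException("chunk size is not a multiple of 4 * n");
        }
        ByteBuffer buf = ByteBuffer.wrap(chunk).order(ByteOrder.LITTLE_ENDIAN);
        String[] out = new String[chunk.length / elementBytes];
        for (int i = 0; i < out.length; i++) {
            // read the n fixed-width code points of element i
            int[] codePoints = new int[n];
            for (int j = 0; j < n; j++) {
                codePoints[j] = buf.getInt();
            }
            // drop trailing NUL padding, as NumPy does when reading "U" data
            int len = n;
            while (len > 0 && codePoints[len - 1] == 0) {
                len--;
            }
            // throws IllegalArgumentException on an invalid code point
            out[i] = new String(codePoints, 0, len);
        }
        return out;
    }
}
```

For example, a 24-byte `<U3` chunk holding the code points `h, i, 0, a, b, c` decodes to the two strings `"hi"` and `"abc"`.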

bogovicj added 2 commits July 22, 2025 15:30
* truncate too-long strings during encoding
* do not compute max length during encoding
* rename static methods to get String DataCodecs
* add doc
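The first two commit messages suggest that, on encoding, the width `n` is taken from the dtype rather than computed from the data, and strings that are too long are cut to fit. The following is a minimal sketch of that behavior, assuming the same little-endian `<U{n}` layout. The class name is hypothetical, and truncation here is by code point, which may differ from what the PR actually does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch (not the n5 DataCodec API): encodes strings as
 * Zarr v2 "<U{n}" elements. The width n is supplied by the caller (the
 * dtype) and never computed from the data; longer strings are truncated
 * to n code points and shorter ones are NUL-padded.
 */
public class FixedUnicodeEncode {

    public static byte[] encode(String[] values, int n) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4 * n)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (String s : values) {
            // iterate by code point so surrogate pairs become one UTF-32 unit
            int[] codePoints = s.codePoints().toArray();
            int len = Math.min(codePoints.length, n); // truncate too-long strings
            for (int j = 0; j < len; j++) {
                buf.putInt(codePoints[j]);
            }
            for (int j = len; j < n; j++) {
                buf.putInt(0); // NUL padding up to the fixed width
            }
        }
        return buf.array();
    }
}
```

With `n = 3`, `"hello"` is stored as `hel` and `"hi"` as `hi` followed by one NUL, giving 24 bytes in total. Using the dtype width keeps every element at a predictable offset without a pass over the data.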
bogovicj marked this pull request as ready for review October 31, 2025 20:44
@bogovicj
Contributor Author

@bogovicj (note to self)
https://github.com/zarr-developers/zarr-extensions/blob/main/codecs/vlen-utf8/README.md#format-and-algorithm

In the encoded format, each chunk is prefixed with a 32-bit little-endian unsigned integer (u32le) that specifies the number of elements in the chunk. This prefix is followed by a sequence of encoded elements in lexicographical order. Each element in the sequence is encoded by a u32le representing the number of bytes followed by the bytes themselves. The bytes for each element are obtained by encoding the element as UTF8 bytes.
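The quoted vlen-utf8 layout can be sketched as a round trip. Each chunk starts with a u32le element count. Each element is then written as a u32le byte length followed by its UTF-8 bytes. This is a minimal illustration with a hypothetical class name, not n5 or n5-zarr code. It assumes the input array is already in the element order the spec calls lexicographical, which I read as C (row-major) index order. It also reads u32 values as signed Java `int`s, so it only handles counts and lengths below 2^31.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the vlen-utf8 chunk layout:
 * u32le count, then per element a u32le byte length and the UTF-8 bytes.
 */
public class VlenUtf8 {

    public static byte[] encode(String[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeU32le(out, values.length); // number of elements in the chunk
        for (String s : values) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            writeU32le(out, utf8.length); // byte length, not character count
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    public static String[] decode(byte[] chunk) {
        ByteBuffer buf = ByteBuffer.wrap(chunk).order(ByteOrder.LITTLE_ENDIAN);
        // u32 read as signed int: valid only for values below 2^31
        int count = buf.getInt();
        String[] values = new String[count];
        for (int i = 0; i < count; i++) {
            int length = buf.getInt();
            byte[] utf8 = new byte[length];
            buf.get(utf8);
            values[i] = new String(utf8, StandardCharsets.UTF_8);
        }
        return values;
    }

    private static void writeU32le(ByteArrayOutputStream out, int v) {
        out.write(v & 0xFF);
        out.write((v >>> 8) & 0xFF);
        out.write((v >>> 16) & 0xFF);
        out.write((v >>> 24) & 0xFF);
    }
}
```

For `{"a", "", "\u00e9"}` the encoded chunk is 19 bytes: 4 for the count, then 4 + 1, 4 + 0, and 4 + 2, since `é` takes two bytes in UTF-8. Unlike the fixed-length `U` encoding, element offsets are only known after reading the preceding lengths.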
