perf: 25% speed gain on json serialization through orjson... #455
VerdantForge wants to merge 3 commits into docling-project:main from
✅ DCO Check Passed: Thanks @VerdantForge, all your commits are properly signed off. 🎉
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🔴 Require two reviewer for test updates: This rule is failing. When test data is updated, we require two reviewers.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
7594d33 to 9890b37
In the DoclingDocument class, 25% speed gain on JSON serialization through orjson.

BREAKING CHANGE: orjson replaces the indent option with option=orjson.OPT_INDENT_2 and only supports an indent of 2.
BREAKING CHANGE: orjson drops the ensure_ascii option and cannot escape non-ASCII characters to ASCII.

Signed-off-by: Nicholas Greensmith <123564396+VerdantForge@users.noreply.github.com>
9890b37 to 5b83903
@VerdantForge This PR is a long time in
My bad, I didn't realize it was stuck in draft.
- Move `import orjson` from stdlib section to third-party section
- Use `option=None` instead of `option=0` when not indenting (more idiomatic)
- Pin orjson upper bound to <4.0.0 to match project dependency style
- Update constructed_doc.referenced.json.gt to use actual Unicode chars instead of \uXXXX escapes, matching orjson's default output behavior

https://claude.ai/code/session_01CQLkDoQnX6UoXi9bHTJipF

Signed-off-by: Claude <noreply@anthropic.com>
Brings the branch in sync with upstream docling-project/docling-core:main to resolve merge conflicts in document.py, pyproject.toml, and uv.lock:

- Import section: add collections.abc.Iterable, dataclasses.dataclass, pydantic SerializerFunctionWrapHandler/model_serializer, tabulate._column_type; move Annotated from typing_extensions to typing
- pyproject.toml: drop Python 3.9, bump tabulate/pandas/typer bounds to match upstream, add defusedxml and pydantic-settings dependencies
- Regenerate uv.lock accordingly

https://claude.ai/code/session_01CQLkDoQnX6UoXi9bHTJipF

Signed-off-by: Claude <noreply@anthropic.com>
eca1ff0 to fc0ac8b
orjson provides fast JSON serialization and deserialization in Python. Applying it to `save_as_json` saves about 50 ms on some documents (a 25% reduction in serialization time).

fixes: #451
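For reviewers, here is a minimal sketch of the behavior change, not the PR's actual code. `dumps_json` is a hypothetical helper, and the fallback to the stdlib `json` module is an assumption added so the snippet also runs where orjson is not installed. It shows the two differences named in the commit message: indentation is only available through `orjson.OPT_INDENT_2`, and non-ASCII characters are written as UTF-8 rather than `\uXXXX` escapes.

```python
import json

try:
    import orjson  # third-party; treated as optional in this sketch
except ImportError:
    orjson = None


def dumps_json(data: dict, indent: bool = False) -> str:
    """Hypothetical helper: serialize with orjson when it is installed."""
    if orjson is not None:
        # orjson has no `indent=N` argument; two-space indentation is a flag.
        option = orjson.OPT_INDENT_2 if indent else None
        # orjson.dumps returns bytes and never escapes non-ASCII characters.
        return orjson.dumps(data, option=option).decode("utf-8")
    # Assumed fallback: configure stdlib json to approximate orjson's output.
    return json.dumps(
        data,
        indent=2 if indent else None,
        ensure_ascii=False,
        separators=None if indent else (",", ":"),
    )


doc = {"name": "café", "pages": 2}
compact = dumps_json(doc)
pretty = dumps_json(doc, indent=True)
print(pretty)
```

Callers that relied on `ensure_ascii=True` or an indent other than 2 would need to post-process the output themselves, since orjson offers neither.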
BREAKING_CHANGES: