Skip to content

Bug 2013441 - Use orjson to serialize/deserialize objects for the redis cache#3659

Merged
Eijebong merged 1 commit intomozilla-releng:mainfrom
Eijebong:orjson
Mar 3, 2026
Merged

Bug 2013441 - Use orjson to serialize/deserialize objects for the redis cache#3659
Eijebong merged 1 commit intomozilla-releng:mainfrom
Eijebong:orjson

Conversation

@Eijebong
Copy link
Copy Markdown
Contributor

@Eijebong Eijebong commented Feb 5, 2026

The main thing here is that Blobs cannot be deserialized from JSON directly so we have to provide a specific loader to the redis cache, which is why we weren't using orjson in the first place. The commit is mostly plumbing to pass the redis_loads method through all of the right cache layers/factories.

I originally thought about making this less invasive and just returning the value as is from the cache and let the caller decide what to do with what it got back from the cache (for blobs it could either get a dict or a Blob object based on whether the cache was hit or not, which could easily be detected by always calling createBlob on the result and returning early if it was already a Blob). I went back on this because it felt more fragile and error prone than doing the conversion explicitly. It's still no perfect still those cache definitions aren't tested properly and short of deactivating the LRU cache they're hard to test locally.

I tested this by disabling the LRU cache completely and using an assert cache_value == value_getter() in the cache hit path then hitting /update/3/Firefox/42.0/0/Linux_x86_64-gcc3/en-US/release/None/default/default/update.xml on the public API. Everything worked fine.

All the other public caches use trivial objects so we don't need special loaders for them. The admin app doesn't have redis configured.

I removed test_release_blobs_can_log_after_pickling since it's irrelevant now but decided against reverting
e8cb2ec entirely because getting a logger once per class instead of once per object still sounds like an improvement we'd want and the other test to deepcopy is also still relevant for caches using make_copies.

@Eijebong Eijebong requested a review from a team as a code owner February 5, 2026 13:34
@Eijebong Eijebong changed the title Bug 2013441 - Use orjson to serialize/deserialize objects for the red… Bug 2013441 - Use orjson to serialize/deserialize objects for the redis cache Feb 5, 2026
Copy link
Copy Markdown
Contributor

@bhearsum bhearsum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might have pushed the wrong commit here? There's nothing cache related 😬

@Eijebong
Copy link
Copy Markdown
Contributor Author

Eijebong commented Feb 5, 2026

Yeah, somehow pushed to the wrong branch... Pushed back the right commit, I'll rebase and fix the lint issue tomorrow

Copy link
Copy Markdown
Contributor

@bhearsum bhearsum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, separating out the refactor / interface change from switching to orjson would make something like this easier to review. (Thankfully this is small enough that it's not a big deal...)

Comment thread src/auslib/util/cache.py

def fullkey(self, key):
return f"{self._name}-{key}"
return f"v2-{self._name}-{key}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smart moving bumping this!

Comment thread src/auslib/util/cache.py Outdated
"""

def __init__(self, redis, name, timeout):
def __init__(self, redis, name, timeout, redis_loads=None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be preferable to have this be something like data_loader, and it default to orjson.loads. That would avoid the need to be passing redis_loads at all for the default case, and it would make the type of this always a Callable which seems a bit cleaner?

Comment thread src/auslib/util/cache.py Outdated
# to allow the caller to provide it in a closure, hence we end up
# making this a callable instead of a simple class.
self._factory = lambda _, maxsize, timeout: ExpiringLRUCache(maxsize, timeout)
self._factory = lambda _, maxsize, timeout, redis_loads=None: ExpiringLRUCache(maxsize, timeout)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading correctly, this could be an unnamed argument? (It's a bit confusing for it to have a name, and be unused, which is why I mention it.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't have 2 unnamed arguments (without the second one being __). I'll rename it to _redis_load to convey that it's not used instead

Comment thread uwsgi/public.wsgi Outdated
cache.make_cache("blob", 500, 3600)

def _load_blob(s):
data = orjson.loads(s)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing as what we need to do is post-process, I wonder if it's better to add a post-load hook rather than override loading altogether? (That might make the cache.py changes much cleaner too?)

…is cache

The main thing here is that Blobs cannot be deserialized from JSON
directly so we have to provide a specific loader to the redis cache,
which is why we weren't using orjson in the first place. The commit is
mostly plumbing to pass the post_load arg through all of the right
cache layers/factories.

I originally thought about making this less invasive and just returning
the value as is from the cache and let the caller decide what to do with
what it got back from the cache (for blobs it could either get a dict or
a Blob object based on whether the cache was hit or not, which could
easily be detected by always calling createBlob on the result and
returning early if it was already a Blob). I went back on this because
it felt more fragile and error prone than doing the conversion
explicitly. It's still no perfect still those cache definitions aren't
tested properly and short of deactivating the LRU cache they're hard to
test locally.

I tested this by disabling the LRU cache completely and using an `assert
cache_value == value_getter()` in the cache hit path then hitting
`/update/3/Firefox/42.0/0/Linux_x86_64-gcc3/en-US/release/None/default/default/update.xml`
on the public API. Everything worked fine.

All the other public caches use trivial objects so we don't need special
loaders for them. The admin app doesn't have redis configured.

I removed test_release_blobs_can_log_after_pickling since it's
irrelevant now but decided against reverting
e8cb2ec entirely because getting a
logger once per class instead of once per object still sounds like an
improvement we'd want and the other test to deepcopy is also still
relevant for caches using `make_copies`.
@Eijebong Eijebong merged commit 9a45069 into mozilla-releng:main Mar 3, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants