Bug 2013441 - Use orjson to serialize/deserialize objects for the redis cache #3659

Eijebong merged 1 commit into mozilla-releng:main from
Conversation
bhearsum
left a comment
I think you might have pushed the wrong commit here? There's nothing cache related 😬
Yeah, somehow pushed to the wrong branch... Pushed back the right commit, I'll rebase and fix the lint issue tomorrow
bhearsum
left a comment
In the future, separating out the refactor / interface change from switching to orjson would make something like this easier to review. (Thankfully this is small enough that it's not a big deal...)
```diff
 def fullkey(self, key):
-    return f"{self._name}-{key}"
+    return f"v2-{self._name}-{key}"
```
Smart move bumping this!
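A minimal sketch of why the prefix bump matters (the standalone function below is illustrative, mirroring the diff above; the old entries being skipped over is my reading of the change, since the previous serializer wasn't orjson):

```python
def fullkey(name, key, version="v2"):
    """Build a namespaced redis key.

    Bumping the version prefix means entries written under the old
    serialization format are simply never looked up again, so the new
    orjson-based loader never tries to parse stale bytes.
    """
    return f"{version}-{name}-{key}"

print(fullkey("blob", "Firefox-42.0"))  # v2-blob-Firefox-42.0
```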
```diff
-def __init__(self, redis, name, timeout):
+def __init__(self, redis, name, timeout, redis_loads=None):
```
I think it would be preferable to have this be something like data_loader, and have it default to orjson.loads. That would avoid the need to pass redis_loads at all for the default case, and it would make the type of this always a Callable, which seems a bit cleaner?
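A rough sketch of the suggested shape (hypothetical names; `json` stands in for `orjson` so the snippet stays dependency-free — `orjson.loads` takes the same single-argument form):

```python
import json  # stand-in for orjson in this sketch


class RedisCacheSketch:
    def __init__(self, redis, name, timeout, data_loader=json.loads):
        # data_loader is always a Callable: callers needing custom
        # deserialization pass their own, everyone else gets the default.
        self._redis = redis
        self._name = name
        self._timeout = timeout
        self._data_loader = data_loader

    def _deserialize(self, raw):
        return self._data_loader(raw)


# Default case: no loader argument needed at the call site.
cache = RedisCacheSketch(redis=None, name="blob", timeout=3600)
print(cache._deserialize(b'{"schema_version": 9000}'))
```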
```diff
 # to allow the caller to provide it in a closure, hence we end up
 # making this a callable instead of a simple class.
-self._factory = lambda _, maxsize, timeout: ExpiringLRUCache(maxsize, timeout)
+self._factory = lambda _, maxsize, timeout, redis_loads=None: ExpiringLRUCache(maxsize, timeout)
```
If I'm reading correctly, this could be an unnamed argument? (It's a bit confusing for it to have a name, and be unused, which is why I mention it.)
Can't have 2 unnamed arguments (without the second one being `__`). I'll rename it to `_redis_load` instead, to convey that it's not used
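For context, Python rejects a repeated bare `_` parameter, which is what forces the `_redis_load` naming (illustrative snippet, not project code):

```python
# This is a SyntaxError: duplicate argument '_' in function definition
#   lambda _, _: 0

# An underscore-prefixed name signals "accepted but unused" without
# colliding with the first placeholder:
factory = lambda _, maxsize, timeout, _redis_load=None: (maxsize, timeout)

print(factory("unused", 500, 3600))  # (500, 3600)
```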
```diff
 cache.make_cache("blob", 500, 3600)


 def _load_blob(s):
+    data = orjson.loads(s)
```
Seeing as what we need to do is post-process, I wonder if it's better to add a post-load hook rather than overriding loading altogether? (That might make the cache.py changes much cleaner too?)
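The alternative being floated could look roughly like this (a hypothetical sketch, not the merged code; `json` stands in for `orjson`):

```python
import json  # stand-in for orjson


def cached_get(raw_bytes, post_load=None):
    """Always deserialize with the default loader, then optionally run
    a post-processing hook on the resulting plain data."""
    data = json.loads(raw_bytes)
    return post_load(data) if post_load is not None else data


# Default case: trivial objects need no hook.
print(cached_get(b'{"name": "Firefox-42.0"}'))

# Blob-like case: the hook turns the plain dict into a richer object.
blob = cached_get(b'{"name": "Firefox-42.0"}', post_load=lambda d: ("Blob", d))
print(blob)
```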
The main thing here is that Blobs cannot be deserialized from JSON directly, so we have to provide a specific loader to the redis cache, which is why we weren't using orjson in the first place. The commit is mostly plumbing to pass the redis_loads method through all of the right cache layers/factories.

I originally thought about making this less invasive: just returning the value as-is from the cache and letting the caller decide what to do with what it got back (for blobs it could either get a dict or a Blob object depending on whether the cache was hit, which could easily be detected by always calling createBlob on the result and returning early if it was already a Blob). I went back on this because it felt more fragile and error-prone than doing the conversion explicitly. It's still not perfect: those cache definitions aren't tested properly, and short of deactivating the LRU cache they're hard to test locally.
I tested this by disabling the LRU cache completely and using an `assert cache_value == value_getter()` in the cache hit path, then hitting `/update/3/Firefox/42.0/0/Linux_x86_64-gcc3/en-US/release/None/default/default/update.xml` on the public API. Everything worked fine.

All the other public caches use trivial objects so we don't need special loaders for them. The admin app doesn't have redis configured.
I removed test_release_blobs_can_log_after_pickling since it's irrelevant now, but decided against reverting e8cb2ec entirely because getting a logger once per class instead of once per object still sounds like an improvement we'd want, and the other test for deepcopy is also still relevant for caches using `make_copies`.