[SharedCache] Switch to SAX-based writer API for JSON serialization#6139
Closed
bdash wants to merge 6 commits into
`api/MetadataSerializable.hpp` is removed in favor of including `core/MetadataSerializable.hpp`. Both headers defined types with the same name, leading to One Definition Rule violations and surprising behavior.

The serialization and deserialization contexts are now created on demand during serialization rather than being members of `MetadataSerializable`. This reduces the size of every serializable object by ~220 bytes. The context is passed explicitly as an argument to `Serialize` / `Deserialize`. As a result, `Serialize` / `Deserialize` can now be free functions rather than member functions.

Since `MetadataSerializable` is not used for dynamic dispatch, the virtual methods are removed and the class is updated to be a class template using CRTP. This allows delegating to the derived class's `Load` and `Store` methods without the additional size overhead of the vtable pointer in every serializable object.

These changes reduce the memory footprint of Binary Ninja after loading the macOS shared cache and loading a single dylib from it from 8.3GB to 4.6GB.
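The shape described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the branch: `SerializationContext`, `AsString`, and `Region` are hypothetical names, and the context here only accumulates text rather than wrapping a real JSON writer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the serialization context. It only accumulates
// text; the real context would wrap the JSON machinery.
struct SerializationContext {
    std::string out;
};

// Free function: the context is an explicit argument rather than a member
// of every serializable object.
void Serialize(SerializationContext& context, const char* name, uint64_t value) {
    assert(name != nullptr);
    if (!context.out.empty()) context.out += ',';
    context.out += '"';
    context.out += name;
    context.out += "\":";
    context.out += std::to_string(value);
}

// CRTP base: no virtual functions, so no vtable pointer, and no stored context.
template <typename Derived>
class MetadataSerializable {
public:
    // Creates the context on demand and delegates to Derived::Store.
    // AsString is a hypothetical name used for this sketch.
    std::string AsString() const {
        SerializationContext context;
        static_cast<const Derived&>(*this).Store(context);
        return "{" + context.out + "}";
    }
};

// Hypothetical serializable type with two fields.
struct Region : MetadataSerializable<Region> {
    uint64_t start = 0;
    uint64_t size = 0;

    void Store(SerializationContext& context) const {
        Serialize(context, "start", start);
        Serialize(context, "size", size);
    }
};

// With the empty-base optimization the object holds only its own fields.
static_assert(sizeof(Region) == 2 * sizeof(uint64_t),
              "no vtable pointer or context member");
```

The `static_assert` is the point of the sketch: because the base class is empty and has no virtual functions, each object pays only for its own data members.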
This ensures only one definition ends up in the final binary and makes compilation a little faster.
…o MetadataSerializable
Building up an in-memory representation of the JSON document is expensive in both CPU and memory. Instead of doing that we can directly write the appropriate types.
b7a5518 to 7c77bfe
1. Continue to serialize the `cputype` / `cpusubtype` fields of `mach_header_64` as unsigned, despite them being signed. This preserves compatibility with the existing metadata version.
2. Add the `Serialize` declaration for the special `std::pair<uint64_t, std::pair<uint64_t, uint64_t>>` overload to the header. This ensures it will be favored over the generic `std::pair<First, Second>` template function and preserves the serialization used with the existing metadata version.
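Both points can be illustrated with a small sketch. The `Writer` type, the `SerializeCpuField` helper, and the two output layouts (a nested array for the generic pair, a flattened three-element array for the special overload) are assumptions made for illustration; the actual formats used by the existing metadata version are not shown in this PR text.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical writer that just accumulates text.
struct Writer {
    std::string out;
};

void Serialize(Writer& w, uint64_t value) { w.out += std::to_string(value); }

// Declared ahead of the generic template, as a header declaration would be,
// so it takes part in overload resolution wherever the template is used.
void Serialize(Writer& w,
               const std::pair<uint64_t, std::pair<uint64_t, uint64_t>>& value);

// Generic pair: written as a two-element array (assumed layout).
template <typename First, typename Second>
void Serialize(Writer& w, const std::pair<First, Second>& value) {
    w.out += '[';
    Serialize(w, value.first);
    w.out += ',';
    Serialize(w, value.second);
    w.out += ']';
}

// Special case: flattened to three elements (assumed layout). When both this
// and the template match exactly, the non-template overload is preferred.
void Serialize(Writer& w,
               const std::pair<uint64_t, std::pair<uint64_t, uint64_t>>& value) {
    w.out += '[';
    Serialize(w, value.first);
    w.out += ',';
    Serialize(w, value.second.first);
    w.out += ',';
    Serialize(w, value.second.second);
    w.out += ']';
}

// A signed 32-bit field written as unsigned, so values with the high bit set
// keep the same textual form as before.
std::string SerializeCpuField(int32_t value) {
    Writer w;
    Serialize(w, static_cast<uint64_t>(static_cast<uint32_t>(value)));
    return w.out;
}

// Usage sketch: the same call syntax picks different overloads.
void Demo() {
    Writer special;
    Serialize(special, std::make_pair(uint64_t{1},
                                      std::make_pair(uint64_t{2}, uint64_t{3})));
    assert(special.out == "[1,2,3]");

    Writer generic;
    Serialize(generic, std::make_pair(uint64_t{1}, uint64_t{2}));
    assert(generic.out == "[1,2]");
}
```

Without the special overload in scope, the nested pair would silently fall through to the generic template and produce `[1,[2,3]]` in this sketch, which is the kind of format change the commit is guarding against.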
Merged via a2e5d061
Building up an in-memory representation of the JSON document was performing a lot of temporary memory allocations. Using the SAX-based writer API avoids this work by directly writing the desired types. This cuts the time spent serializing the state to JSON by around half.
One additional benefit of the SAX-based writer API is that it's now possible for serialization to operate on individual types rather than having to serialize a complex type in a single operation in order to assign it to an object field.
`Serialize` is updated to work on a single value at a time, with function templates for types like `std::pair`, `std::vector`, and `std::unordered_map` delegating to `Serialize` overloads for the types they contain. This removes the repetition that was previously required for implementing serialization of arrays or maps of different types.

Deserialization continues to use the document-based API as using the SAX-based reader API is cumbersome.
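A rough sketch of this delegation pattern follows. `MiniWriter` is a hypothetical, simplified event-style writer written for this example (it does no string escaping); it is not the JSON library's writer. Serializing a map as an array of key/value pairs is likewise an assumption, not necessarily the format the branch uses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical SAX-style writer: each call appends output immediately, so no
// in-memory document is built. Strings are not escaped in this sketch.
class MiniWriter {
public:
    void StartObject() { Prefix(); out_ += '{'; first_.push_back(true); }
    void EndObject() { assert(!first_.empty()); out_ += '}'; first_.pop_back(); }
    void StartArray() { Prefix(); out_ += '['; first_.push_back(true); }
    void EndArray() { assert(!first_.empty()); out_ += ']'; first_.pop_back(); }
    void Key(const std::string& k) { Prefix(); out_ += '"' + k + "\":"; afterKey_ = true; }
    void Uint64(uint64_t v) { Prefix(); out_ += std::to_string(v); }
    void String(const std::string& s) { Prefix(); out_ += '"' + s + '"'; }
    const std::string& str() const { return out_; }

private:
    // Emits a comma between siblings; a value directly after a key needs none.
    void Prefix() {
        if (afterKey_) { afterKey_ = false; return; }
        if (!first_.empty()) {
            if (!first_.back()) out_ += ',';
            first_.back() = false;
        }
    }
    std::string out_;
    std::vector<bool> first_;  // per open container: no element written yet?
    bool afterKey_ = false;
};

// Leaf overloads: one value at a time.
void Serialize(MiniWriter& w, uint64_t v) { w.Uint64(v); }
void Serialize(MiniWriter& w, const std::string& v) { w.String(v); }

// Container templates only delegate to the overloads for their elements.
template <typename First, typename Second>
void Serialize(MiniWriter& w, const std::pair<First, Second>& v) {
    w.StartArray();
    Serialize(w, v.first);
    Serialize(w, v.second);
    w.EndArray();
}

template <typename T>
void Serialize(MiniWriter& w, const std::vector<T>& v) {
    w.StartArray();
    for (const auto& element : v) Serialize(w, element);
    w.EndArray();
}

template <typename K, typename V>
void Serialize(MiniWriter& w, const std::unordered_map<K, V>& v) {
    w.StartArray();
    for (const auto& entry : v) Serialize(w, entry);  // entry is a std::pair
    w.EndArray();
}

// Writes a named field of an object; works for any type with a Serialize overload.
template <typename T>
void SerializeField(MiniWriter& w, const std::string& name, const T& value) {
    w.Key(name);
    Serialize(w, value);
}

// Usage sketch: a vector of pairs needs no dedicated serialization code.
void Demo() {
    MiniWriter w;
    std::vector<std::pair<uint64_t, uint64_t>> ranges{{1, 2}, {3, 4}};
    Serialize(w, ranges);
    assert(w.str() == "[[1,2],[3,4]]");
}
```

Supporting a new element type then means adding a single leaf overload; every container of that type is covered by the existing templates.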
This branch builds on the work in #6127 and should be compared against it. You can use https://github.com/bdash/binaryninja-api/compare/dsc-serialization...bdash:binaryninja-api:dsc-serialization-2?expand=1 to view the diff excluding the serialization changes.