[SharedCache] Switch to SAX-based writer API for JSON serialization #6139

Closed
bdash wants to merge 6 commits into Vector35:dev from bdash:dsc-serialization-2

@bdash bdash commented Nov 15, 2024

Building up an in-memory representation of the JSON document was performing a lot of temporary memory allocations. Using the SAX-based writer API avoids this work by directly writing the desired types. This cuts the time spent serializing the state to JSON by around half.

One additional benefit of the SAX-based writer API is that serialization can now operate on individual types, rather than having to serialize a complex type in a single operation in order to assign it to an object field. `Serialize` is updated to work on a single value at a time, with function templates for types like `std::pair`, `std::vector`, and `std::unordered_map` delegating to `Serialize` overloads for the types they contain. This removes the repetition that was previously required to implement serialization of arrays or maps of different types.
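
As a rough illustration of the per-value approach described above, here is a minimal sketch. `ToyWriter`, `ToJson`, and `SelfCheck` are hypothetical names invented for this example; `ToyWriter` only stands in for a SAX-style writer and is not the API used in the PR. The array layouts chosen for pairs and maps are assumptions as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SAX-style JSON writer: each call appends
// directly to the output buffer, so no intermediate document is built.
class ToyWriter {
public:
    void StartArray() { Separator(); out_ += '['; counts_.push_back(0); }
    void EndArray() { counts_.pop_back(); out_ += ']'; }
    void Uint64(uint64_t v) { Separator(); out_ += std::to_string(v); }
    // No escaping: enough for this sketch, not for real JSON.
    void String(const std::string& s) { Separator(); out_ += '"' + s + '"'; }
    const std::string& Result() const { return out_; }

private:
    // Emits a comma before every element of an open array except the first.
    void Separator() {
        if (!counts_.empty() && counts_.back()++ > 0) out_ += ',';
    }
    std::string out_;
    std::vector<std::size_t> counts_;  // elements written per open array
};

// Leaf overloads write a single value.
inline void Serialize(ToyWriter& w, uint64_t v) { w.Uint64(v); }
inline void Serialize(ToyWriter& w, const std::string& s) { w.String(s); }

// Declared up front so each container template can see the others.
template <typename First, typename Second>
void Serialize(ToyWriter& w, const std::pair<First, Second>& p);
template <typename T>
void Serialize(ToyWriter& w, const std::vector<T>& values);
template <typename Key, typename Value>
void Serialize(ToyWriter& w, const std::unordered_map<Key, Value>& map);

// A pair becomes a two-element array (assumed layout).
template <typename First, typename Second>
void Serialize(ToyWriter& w, const std::pair<First, Second>& p) {
    w.StartArray();
    Serialize(w, p.first);
    Serialize(w, p.second);
    w.EndArray();
}

// A vector delegates to whichever overload handles its element type.
template <typename T>
void Serialize(ToyWriter& w, const std::vector<T>& values) {
    w.StartArray();
    for (const auto& value : values) Serialize(w, value);
    w.EndArray();
}

// A map becomes an array of [key, value] pairs (assumed layout).
template <typename Key, typename Value>
void Serialize(ToyWriter& w, const std::unordered_map<Key, Value>& map) {
    w.StartArray();
    for (const auto& entry : map) Serialize(w, entry);
    w.EndArray();
}

// Convenience helper: serialize any supported value to a string.
template <typename T>
std::string ToJson(const T& value) {
    ToyWriter w;
    Serialize(w, value);
    return w.Result();
}

// Usage: a vector of pairs needs no dedicated serialization code.
inline void SelfCheck() {
    std::vector<std::pair<uint64_t, std::string>> v{{1, "a"}, {2, "b"}};
    assert(ToJson(v) == R"([[1,"a"],[2,"b"]])");
}
```

With this shape, supporting a new nested combination such as a map of vectors requires no additional code, because each template only knows how to write its own brackets and delegates the rest.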

Deserialization continues to use the document-based API as using the SAX-based reader API is cumbersome.

This branch builds on the work in #6127 and should be compared against it. You can use https://github.com/bdash/binaryninja-api/compare/dsc-serialization...bdash:binaryninja-api:dsc-serialization-2?expand=1 to view the diff excluding the serialization changes.

api/MetadataSerializable.hpp is removed in favor of including
core/MetadataSerializable.hpp. Both headers defined types with the same
name, leading to One Definition Rule violations and surprising behavior.

The serialization and deserialization contexts are now created on demand
during serialization rather than being members of
`MetadataSerializable`. This reduces the size of every serializable
object by ~220 bytes.

The context is passed explicitly as an argument to `Serialize` /
`Deserialize`. As a result, `Serialize` / `Deserialize` can now be free
functions rather than member functions.

Since `MetadataSerializable` is not used for dynamic dispatch,
the virtual methods are removed and the class is updated to be a class
template using CRTP. This allows delegating to the derived class's
`Load` and `Store` methods without the additional size overhead of the
vtable pointer in every serializable object.
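
The effect of dropping the virtual methods and the stored context can be sketched as follows. Everything here is a simplified assumption for illustration: `SerializationContext`, `AsString`, `Range`, `VirtualRange`, and `SelfCheck` are hypothetical names, and the real `MetadataSerializable` has a different interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical context type. In this sketch it is created only while
// serializing, so it adds nothing to the size of a serializable object.
struct SerializationContext {
    std::string out;
};

// Shape being replaced (simplified): virtual dispatch puts a vtable
// pointer into every object.
struct VirtualSerializable {
    virtual ~VirtualSerializable() = default;
    virtual void Store(SerializationContext& context) const = 0;
};

// CRTP shape: the base knows the derived type at compile time and calls
// its Store method directly, with no virtual functions involved.
template <typename Derived>
struct MetadataSerializable {
    std::string AsString() const {
        SerializationContext context;  // built on demand, not a member
        static_cast<const Derived&>(*this).Store(context);
        return context.out;
    }
};

struct VirtualRange : VirtualSerializable {
    uint64_t start = 0;
    void Store(SerializationContext& context) const override {
        context.out += std::to_string(start);
    }
};

struct Range : MetadataSerializable<Range> {
    uint64_t start = 0;
    void Store(SerializationContext& context) const {
        context.out += std::to_string(start);
    }
};

// On mainstream ABIs the CRTP base is empty, so Range carries only its
// own field, while VirtualRange also carries a vtable pointer.
static_assert(sizeof(Range) < sizeof(VirtualRange),
              "CRTP version should not pay for a vtable pointer");

// Usage: the base class reaches the derived Store without a vtable.
inline void SelfCheck() {
    Range r;
    r.start = 42;
    assert(r.AsString() == "42");
}
```

The trade-off is that the two `Range` types can no longer be handled through a common base pointer, which is acceptable when, as the commit message notes, the class is not used for dynamic dispatch.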

These changes reduce the memory footprint of Binary Ninja after loading
the macOS shared cache and loading a single dylib from it from 8.3GB to
4.6GB.

This ensures only one definition ends up in the final binary and makes compilation a little faster.

Building up an in-memory representation of the JSON document is expensive in both CPU and memory. Instead of doing that we can directly write the appropriate types.

1. Continue to serialize the `cputype` / `cpusubtype` fields of
   `mach_header_64` as unsigned, despite them being signed. This
   preserves compatibility with the existing metadata version.
2. Add the `Serialize` declaration for the special `std::pair<uint64_t,
   std::pair<uint64_t, uint64_t>>` overload to the header. This ensures
   it will be favored over the generic `std::pair<First, Second>`
   template function and preserves the serialization used with the
   existing metadata version.
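
The two compatibility points above can be sketched in isolation. This is a simplified illustration that returns strings instead of using a writer. The flat three-element layout for the special overload and the `SerializeCpuType` helper are assumptions made for this example, not the format or functions used by the actual metadata.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

inline std::string Serialize(uint64_t v) { return std::to_string(v); }

// Non-template overload, declared before any caller sees the generic
// template. When both match exactly, overload resolution prefers the
// non-template function, but only if its declaration is visible.
std::string Serialize(
    const std::pair<uint64_t, std::pair<uint64_t, uint64_t>>& p);

// Generic pair serialization: a nested two-element array.
template <typename First, typename Second>
std::string Serialize(const std::pair<First, Second>& p) {
    return "[" + Serialize(p.first) + "," + Serialize(p.second) + "]";
}

// Hypothetical legacy layout: three values in one flat array.
inline std::string Serialize(
    const std::pair<uint64_t, std::pair<uint64_t, uint64_t>>& p) {
    return "[" + Serialize(p.first) + "," + Serialize(p.second.first) +
           "," + Serialize(p.second.second) + "]";
}

// Hypothetical helper: a signed field is converted to unsigned before
// writing, so negative values keep their old unsigned representation.
inline std::string SerializeCpuType(int32_t cputype) {
    return Serialize(static_cast<uint32_t>(cputype));
}

// Usage: the special overload wins over the generic template.
inline void SelfCheck() {
    std::pair<uint64_t, std::pair<uint64_t, uint64_t>> special{1, {2, 3}};
    assert(Serialize(special) == "[1,2,3]");
}
```

If the non-template declaration were visible only in a source file and not in the header, callers in other translation units would silently pick the generic template and produce the nested form `[1,[2,3]]` instead.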

0cyn commented Dec 10, 2024

Merged via a2e5d061

@0cyn 0cyn closed this Dec 10, 2024
@bdash bdash deleted the dsc-serialization-2 branch December 19, 2024 04:53
@emesare emesare added this to the Gallifrey milestone Apr 21, 2025