[SharedCache] Use basic copy-on-write for viewStateCache#6129
Conversation
Thank you so much for your work on this!

Just wanted to give you a heads up on our plan for your PRs. We're in the process of releasing 4.2, and this won't make the cutoff. We're going to accept this PR after the release, so it might be a week or two before it gets accepted. Again, thanks for the great commits!
api/MetadataSerializable.hpp is removed in favor of including core/MetadataSerializable.hpp. Both headers defined types with the same name, leading to One Definition Rule violations and surprising behavior.

The serialization and deserialization contexts are now created on demand during serialization rather than being members of `MetadataSerializable`. This reduces the size of every serializable object by ~220 bytes. The context is passed explicitly as an argument to `Serialize` / `Deserialize`. As a result, `Serialize` / `Deserialize` can now be free functions rather than member functions.

Since `MetadataSerializable` is not used for dynamic dispatch, the virtual methods are removed and the class is updated to be a class template using CRTP. This allows delegating to the derived class's `Load` and `Store` methods without the size overhead of a vtable pointer in every serializable object.

These changes reduce the memory footprint of Binary Ninja after loading the macOS shared cache and loading a single dylib from it from 8.3GB to 4.6GB.
This ensures only one definition ends up in the final binary and makes compilation a little faster.
Building up an in-memory representation of the JSON document is expensive in both CPU and memory. Instead of doing that we can directly write the appropriate types.
[immer](https://github.com/arximboldi/immer) provides persistent, immutable data structures such as vectors and maps. These data structures support passing by value without copying any data and structural sharing to copy only a subset of data when a data structure is mutated. immer is published under the Boost Software License which should be compatible with its use in this context. Using these data structures eliminates a lot of the unnecessary copying of the shared cache's state when retrieving it from the view cache and beginning to mutate it. Instead of all of the vectors and maps contained within the state being copied, only the portions of the vectors or maps that are mutated end up being copied. The downside is that the APIs used when mutating are less ergonomic than using the native C++ types. The upside is that this cuts the time taken for the initial load and analysis of a macOS shared cache to around 45 seconds (from 70 seconds with the basic CoW implementation in Vector35#6129) and cuts the time taken to load and analyze AppKit from 14 minutes to around 8.5 minutes.
bdash@dsc-persistent-data-structures takes copy-on-write one step further by using persistent data structures from immer for the various vectors and maps that make up the in-memory state related to the shared cache. Its copy-on-write and structural sharing significantly reduce the amount of copying that occurs, cutting the time spent loading from the shared cache by nearly 50% vs this PR. It is a more invasive change that could benefit from some refactoring.
1. Continue to serialize the `cputype` / `cpusubtype` fields of `mach_header_64` as unsigned, despite them being signed. This preserves compatibility with the existing metadata version.
2. Add the `Serialize` declaration for the special `std::pair<uint64_t, std::pair<uint64_t, uint64_t>>` overload to the header. This ensures it will be favored over the generic `std::pair<First, Second>` template function and preserves the serialization used with the existing metadata version.
Copying the state from the cache into a new `SharedCache` object is done with a global lock held and is so expensive that it results in much of the shared cache analysis running on a single thread, with others blocked waiting to acquire the lock.

The cache now holds a `std::shared_ptr` to the state. New `SharedCache` objects take a reference to the cached state and only create their own copy of it the first time they perform an operation that would mutate it. The cached copy is never mutated, only replaced, so there is no danger of modifying the state out from under a `SharedCache` object. Since the copy happens at first mutation, it is performed without any global locks held. This avoids blocking other threads.

This cuts the initial load time of a macOS shared cache from 3 minutes to 70 seconds, and cuts the time taken to load and analyze AppKit from multiple hours to around 14 minutes.
Merged via 8024cbe3
The initial state is initialized during `PerformInitialLoad` and is immutable after that point. This required some slight restructuring of how information about memory regions is tracked, as that was previously modified as regions were loaded. Memory regions are now stored in a map from their address range to the `MemoryRegion` object. This makes it cheap to look them up by address, which is a common operation.

The modified state consists of changes since the last save to the `DSCView` / `ViewSpecificState`. This means it is no longer necessary to copy any state when mutating a `SharedCache` instance for the first time. Instead, its data structures start off empty and are populated as images, sections, or symbol information is loaded.

The loaded state consists of all modified state that has since been saved. It lives on the `ViewSpecificState`. Saving modified state merges it into the existing loaded state.

This pattern is carried over to the `Metadata` stored on the `DSCView`. The initial state is stored under its own metadata key, and each modified state is stored under a key with an incrementing number. This means each save of the state only needs to serialize the state that changed, rather than reserializing all of the state all of the time.

There are two huge benefits from these changes:

1. At no point does `SharedCache` have to copy its in-memory state. The basic copy-on-write approach introduced in Vector35#6129 reduced how often these copies are made, but they're still frequent and very expensive.
2. At no point does `SharedCache` have to re-serialize state to JSON that it has already serialized. JSON serialization previously added hundreds of milliseconds to any mutating operation on `SharedCache`.

As a result, this speeds up the initial load of the shared cache by around 2x, and loading of subsequent images improves by about the same.

One trade-off is that the serialization / deserialization logic is more complicated. There are two reasons for this:

1. The state is now split across multiple metadata keys and needs to be merged when it is loaded.
2. The in-memory representation uses pointers to identify memory regions. These relationships have to be re-established after the JSON is deserialized.

As a future direction, it is worth considering whether the logic owned by `SharedCache` could be split in a similar manner to the data. The initial loading of the cache header, loading of images, and handling of symbol information are all mostly independent and work on separate data. If the logic were split into separate classes, it would be easier to reason about which data is valid when, and it would easily permit concurrent loading of multiple images from the shared library in a thread-safe manner.
The process now consistently uses 8-14 CPU cores rather than being limited to 1 core.
A couple of notes:
Copying the `State` when performing the first mutation is still quite expensive. There are a number of large vectors and hash maps whose items are not cheap to copy. There's more scope for improving performance by copying only the parts of the state being mutated. I went down that path, but it was hard to enforce invariants (ensuring that everything we're mutating has been copied), and so it was harder to reason about correctness.