
[SharedCache] Use basic copy-on-write for viewStateCache #6129

Closed
bdash wants to merge 7 commits into Vector35:dev from bdash:dsc-cash-cow

Conversation

@bdash
Contributor

@bdash bdash commented Nov 14, 2024

Copying the state from the cache into a new SharedCache object is done with a global lock held and is so expensive that it results in much of the shared cache analysis running on a single thread, with others blocked waiting to acquire the lock.

The cache now holds a std::shared_ptr to the state. New SharedCache objects take a reference to the cached state and only create their own copy of it the first time they perform an operation that would mutate it. The cached copy is never mutated, only replaced, so there is no danger of modifying the state out from under a SharedCache object. Since the copy happens at first mutation, it is performed without any global locks held. This avoids blocking other threads.

This cuts the initial load time of a macOS shared cache from 3 minutes to 70 seconds, and cuts the time taken to load and analyze AppKit from multiple hours to around 14 minutes. The process now consistently uses 8-14 CPU cores rather than being limited to 1 core.


A couple of notes:

  1. This depends on [SharedCache] Rework metadata serialization to reduce memory overhead #6127. Since GitHub doesn't really support dependent changes, it is also included in this branch. It can be dropped once [SharedCache] Rework metadata serialization to reduce memory overhead #6127 is merged. You can use https://github.com/bdash/binaryninja-api/compare/dsc-serialization...bdash:binaryninja-api:dsc-cash-cow?expand=1 to view the diff excluding the serialization changes.
  2. This approach to copy-on-write is very naive, but easy to reason about. Copying the entirety of State when performing the first mutation is still quite expensive. There are a number of large vectors and hash maps whose items are not cheap to copy. There's more scope for improving performance by copying only the parts of the state being mutated. I went down that path but it was hard to enforce invariants (ensuring that everything we're mutating has been copied) and so harder to reason about correctness.
  3. Rethinking how this data is used or the data structures involved may be a better long-term solution.
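The scheme described in the PR can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code; `State`, `addImage`, and the member names are hypothetical stand-ins:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shared cache's state.
struct State {
    std::vector<std::string> images;
};

class SharedCache {
public:
    // Takes a reference to the immutable cached state; no copy yet.
    explicit SharedCache(std::shared_ptr<const State> cached)
        : m_state(std::move(cached)) {}

    const State& state() const { return *m_state; }

    void addImage(std::string name) {
        willMutate();
        m_mutableState->images.push_back(std::move(name));
    }

    // On save, the (now effectively immutable) state can be published
    // back to the view cache, replacing the old cached copy.
    std::shared_ptr<const State> share() const { return m_state; }

private:
    // Copy the cached state the first time we mutate. No global lock is
    // needed: the cached copy is never modified in place, only replaced.
    void willMutate() {
        if (!m_mutableState) {
            m_mutableState = std::make_shared<State>(*m_state);
            m_state = m_mutableState;
        }
    }

    std::shared_ptr<const State> m_state;
    std::shared_ptr<State> m_mutableState;
};
```

Until the first mutation, every `SharedCache` shares one `State` allocation, which is what makes construction cheap enough to take the global lock off the hot path.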

@plafosse
Member

Thank you so much for your work on this!

@plafosse
Member

Just wanted to give you a heads up on our plan for your PRs. We're in the process of releasing 4.2, and this won't make the cutoff. We're going to accept this PR after the release, so it might be a week or two before it gets accepted. Again, thanks for the great commits!

@plafosse plafosse added this to the Gallifrey milestone Nov 15, 2024
api/MetadataSerializable.hpp is removed in favor of including
core/MetadataSerializable.hpp. Both headers defined types with the same
name, leading to One Definition Rule violations and surprising behavior.

The serialization and deserialization context are now created on-demand
during serialization rather than being a member of
`MetadataSerializable`. This reduces the size of every serializable
object by ~220 bytes.

The context is passed explicitly as an argument to `Serialize` /
`Deserialize`. As a result, `Serialize` / `Deserialize` can now be free
functions rather than member functions.

Since `MetadataSerializable` is not used for dynamic dispatch,
the virtual methods are removed and the class is updated to be a class
template using CRTP. This allows delegating to the derived class's
`Load` and `Store` methods without the additional size overhead of the
vtable pointer in every serializable object.

These changes reduce the memory footprint of Binary Ninja after loading
the macOS shared cache and loading a single dylib from it from 8.3GB to
4.6GB.
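The CRTP shape described above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's actual code; `SerializationContext` here is a hypothetical stand-in for the real context type:

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative context type, created on demand and passed explicitly
// rather than stored as a member of every serializable object.
using SerializationContext = std::map<std::string, std::string>;

// CRTP base: delegates to the derived class's Store/Load without virtual
// dispatch, so no vtable pointer is added to every serializable object.
template <typename Derived>
struct MetadataSerializable {
    SerializationContext AsMetadata() const {
        SerializationContext ctx;
        static_cast<const Derived*>(this)->Store(ctx);
        return ctx;
    }
    void LoadFromMetadata(const SerializationContext& ctx) {
        static_cast<Derived*>(this)->Load(ctx);
    }
};

// Hypothetical serializable type.
struct CacheImage : MetadataSerializable<CacheImage> {
    std::string name;
    void Store(SerializationContext& ctx) const { ctx["name"] = name; }
    void Load(const SerializationContext& ctx) { name = ctx.at("name"); }
};
```

With the empty CRTP base, `CacheImage` is no larger than its data members, which is where the per-object size savings come from.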
This ensures only one definition ends up in the final binary and makes compilation a little faster.
Building up an in-memory representation of the JSON document is expensive in both CPU and memory. Instead of doing that, we can write the appropriate types directly.
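As a rough illustration of the idea (not the plugin's actual writer), values can be appended straight into the output buffer instead of first materializing a document tree:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical example: serialize a vector of addresses as a JSON array by
// writing each number directly into the output string. No intermediate
// DOM-style document is allocated.
std::string SerializeAddresses(const std::vector<uint64_t>& addrs) {
    std::string out = "[";
    for (size_t i = 0; i < addrs.size(); ++i) {
        if (i)
            out += ',';
        out += std::to_string(addrs[i]);
    }
    out += ']';
    return out;
}
```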
bdash added a commit to bdash/binaryninja-api that referenced this pull request Nov 17, 2024
[immer](https://github.com/arximboldi/immer) provides persistent,
immutable data structures such as vectors and maps. These data
structures support passing by value without copying any data and
structural sharing to copy only a subset of data when a data structure
is mutated. immer is published under the Boost Software License which
should be compatible with its use in this context.

Using these data structures eliminates a lot of the unnecessary copying
of the shared cache's state when retrieving it from the view cache and
beginning to mutate it. Instead of all of the vectors and maps contained
within the state being copied, only the portions of the vectors or maps
that are mutated end up being copied.

The downside is that the APIs used when mutating are less ergonomic than
using the native C++ types.

The upside is that this cuts the time taken for the initial load and
analysis of a macOS shared cache to around 45 seconds (from 70 seconds
with the basic CoW implementation in Vector35#6129) and cuts the time taken to
load and analyze AppKit from 14 minutes to around 8.5 minutes.
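The structural sharing immer provides can be illustrated without the library itself. The following hand-rolled persistent list (all names hypothetical) shows the core idea: "mutation" returns a new container that shares its tail with the original, so only the changed portion is allocated:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Minimal persistent (immutable) list demonstrating structural sharing.
// This is a sketch of the concept, not immer's API or implementation.
template <typename T>
class PersistentList {
    struct Node {
        T value;
        std::shared_ptr<const Node> next;
    };
    std::shared_ptr<const Node> m_head;
    explicit PersistentList(std::shared_ptr<const Node> head)
        : m_head(std::move(head)) {}

public:
    PersistentList() = default;

    // Returns a new list; the original is untouched and its nodes are
    // reused rather than copied.
    PersistentList PushFront(T value) const {
        return PersistentList(
            std::make_shared<const Node>(Node{std::move(value), m_head}));
    }

    const T& Front() const { return m_head->value; }

    bool SharesTailWith(const PersistentList& other) const {
        return m_head && m_head->next == other.m_head;
    }
};
```

immer's `vector` and `map` generalize this to tree-shaped structures, so a "copy" of a large map shares almost all of its nodes with the original.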
@bdash
Contributor Author

bdash commented Nov 17, 2024

bdash@dsc-persistent-data-structures takes copy-on-write one step further by using persistent data structures from immer for the various vectors and maps that make up the in-memory state related to the shared cache. Their copy-on-write and structural sharing significantly reduce the amount of copying that occurs, cutting the time spent loading from the shared cache by nearly 50% vs. this PR. It is a more invasive change that could benefit from some refactoring.

1. Continue to serialize the `cputype` / `cpusubtype` fields of
   `mach_header_64` as unsigned, despite them being signed. This
   preserves compatibility with the existing metadata version.
2. Add the `Serialize` declaration for the special `std::pair<uint64_t,
   std::pair<uint64_t, uint64_t>>` overload to the header. This ensures
   it will be favored over the generic `std::pair<First, Second>`
   template function and preserves the serialization used with the
   existing metadata version.
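The overload-resolution point in item 2 can be sketched as follows. This is a simplified illustration (return values are stand-ins): when a non-template overload is declared in the header, callers pick it over the generic template for an exact match, so the wire format doesn't silently change:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Generic fallback for any pair.
template <typename First, typename Second>
std::string Serialize(const std::pair<First, Second>&) {
    return "generic";
}

// Non-template overload. Overload resolution prefers a non-template
// function over a function template when both match equally well, but
// only if this declaration is visible at the call site -- hence the need
// to declare it in the header.
std::string Serialize(
    const std::pair<uint64_t, std::pair<uint64_t, uint64_t>>&) {
    return "special";
}
```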
bdash added a commit to bdash/binaryninja-api that referenced this pull request Nov 21, 2024
@0cyn
Contributor

0cyn commented Dec 10, 2024

Merged via 8024cbe3

@0cyn 0cyn closed this Dec 10, 2024
bdash added a commit to bdash/binaryninja-api that referenced this pull request Dec 19, 2024
@bdash bdash deleted the dsc-cash-cow branch December 19, 2024 04:54
bdash added a commit to bdash/binaryninja-api that referenced this pull request Jan 9, 2025
bdash added a commit to bdash/binaryninja-api that referenced this pull request Jan 14, 2025
bdash added a commit to bdash/binaryninja-api that referenced this pull request Feb 5, 2025
The initial state is initialized during `PerformInitialLoad` and is
immutable after that point. This required some slight restructuring of
how information about memory regions is tracked as that was previously
modified as regions were loaded. Memory regions are now stored in a map
from their address range to the `MemoryRegion` object. This makes it
cheap to look them up by address, which is a common operation.
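One way that lookup can work, assuming regions are keyed by their start address in an ordered map (the types and names here are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the plugin's MemoryRegion type.
struct MemoryRegion {
    uint64_t start = 0;
    uint64_t end = 0;  // half-open range [start, end)
    std::string name;
};

// O(log n) lookup: find the first region starting after `address`, then
// step back one and check that the address falls inside its range.
const MemoryRegion* RegionForAddress(
    const std::map<uint64_t, MemoryRegion>& regions, uint64_t address) {
    auto it = regions.upper_bound(address);
    if (it == regions.begin())
        return nullptr;
    --it;
    return address < it->second.end ? &it->second : nullptr;
}
```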

The modified state consists of changes since the last save to the
`DSCView` / `ViewSpecificState`. This means it is no longer necessary to
copy any state when mutating a `SharedCache` instance for the first
time. Instead, its data structures start off empty and are populated as
images, sections, or symbol information is loaded.

The loaded state consists of all modified state that has since been
saved. It lives on the `ViewSpecificState`. Saving modified state
merges it into the existing loaded state.

This pattern is carried over to the `Metadata` stored on the `DSCView`.
The initial state is stored under its own metadata key, and each
modified state is stored under a key with an incrementing number. This
means each save of the state only needs to serialize the state that
changed, rather than reserializing all of the state all of the time.
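A toy sketch of that numbered-key scheme, with illustrative key and payload types standing in for the real `Metadata` API:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical payload: each blob maps item names to their state.
using StateBlob = std::map<std::string, std::string>;

// Key 0 holds the initial state; each subsequent save appends a delta
// under the next incrementing key, so only changed state is serialized.
void SaveDelta(std::map<int, StateBlob>& metadata, const StateBlob& delta) {
    metadata[static_cast<int>(metadata.size())] = delta;
}

// Loading replays the blobs in key order; later saves override earlier
// entries, reconstructing the full merged state.
StateBlob LoadMerged(const std::map<int, StateBlob>& metadata) {
    StateBlob merged;
    for (const auto& [key, blob] : metadata)
        for (const auto& [name, value] : blob)
            merged[name] = value;
    return merged;
}
```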

There are two huge benefits from these changes:
1. At no point does `SharedCache` have to copy its in-memory state.
   The basic copy-on-write approach introduced in Vector35#6129 reduced how
   often these copies are made, but they're still frequent and very
   expensive.
2. At no point does `SharedCache` have to re-serialize state to JSON
   that it has already serialized. JSON serialization previously added
   hundreds of milliseconds to any mutating operation on `SharedCache`.

As a result, this speeds up the initial load of the shared cache by
around 2x and loading of subsequent images improves by about the same.

One trade-off is that the serialization / deserialization logic is more
complicated. There are two reasons for this:
1. The state is now split across multiple metadata keys and needs to be
   merged when it is loaded.
2. The in-memory representation uses pointers to identify memory regions.
   These relationships have to be re-established after the JSON is
   deserialized.

As a future direction it is worth considering whether the logic owned by
`SharedCache` could be split in a similar manner to the data. The
initial loading of the cache header, loading of images, and handling of
symbol information are all mostly independent and work on separate data.
If the logic were split into separate classes it would be easier to
reason about which data is valid when, and would easily permit
concurrent loading of multiple images from the shared library in a
thread-safe manner.
bdash added a commit to bdash/binaryninja-api that referenced this pull request Feb 8, 2025
bdash added a commit to bdash/binaryninja-api that referenced this pull request Feb 12, 2025
bdash added a commit to bdash/binaryninja-api that referenced this pull request Feb 13, 2025
CouleeApps pushed a commit to bdash/binaryninja-api that referenced this pull request Feb 17, 2025
rbran pushed a commit to rbran/binaryninja-api that referenced this pull request May 22, 2025