Split large pending writes queue entries #3970
ohadzeliger wants to merge 29 commits into FoundationDB:main
Conversation
# Conflicts:
#	fdb-record-layer-lucene/src/main/java/com/apple/foundationdb/record/lucene/directory/FDBDirectory.java
#	fdb-record-layer-lucene/src/main/java/com/apple/foundationdb/record/lucene/directory/PendingWriteQueue.java
#	fdb-record-layer-lucene/src/test/java/com/apple/foundationdb/record/lucene/directory/PendingWriteQueueTest.java
Adds a useVersionInKey boolean to SplitHelperTestConfig, a keyHelper(int localVersion) factory method, and extends allValidConfigs() to produce versioning variants of all existing configs (excluding isDryRun=true, which is incompatible with versionstamps). This lays the groundwork for multi-transaction test methods that exercise VersioningSplitKeyValueHelper alongside the existing DefaultSplitKeyValueHelper tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix a bug in SplitHelper.writeSplitRecord where split keys were packed using subspace.subspace(key) + Tuple.from(index), producing an inconsistent layout compared to the unsplit path for VersioningSplitKeyValueHelper. Now both call packSplitKey(subspace, key.add(index)), which VersioningSplitKeyValueHelper handles as [subspace][vs][key][index], consistent with [subspace][vs][key][0] for unsplit records.

- Add SplitHelperTestConfig.useVersionInKey and keyHelper(localVersion) to drive tests through VersioningSplitKeyValueHelper.
- Extend allValidConfigs() to cover both helpers.
- Add Assumptions.assumeFalse guards where tests are incompatible with versioning keys.
- Introduce saveOnly + verifySuccessfullySaved helpers split from saveSuccessfully, and a new saveWithSplitMultipleTransactions test that writes in tx1 and verifies in tx2, covering both default and versioning key helpers.
- Add a parameterized defaultHelperSplitKeyEquivalence test asserting that the old call site subspace.subspace(key).pack(index) equals the new subspace.pack(key.add(index)) for DefaultSplitKeyValueHelper across all relevant key/index combinations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
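The layout fix above can be illustrated with a JDK-only sketch. This is not the real FoundationDB Tuple encoding; it simply concatenates byte segments to show why the split path [subspace][vs][key][index] and the unsplit path [subspace][vs][key][0] now share the same [subspace][vs][key] prefix (all names here are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Illustrative sketch of the split-key layouts discussed in the commit message.
// NOT the real Tuple encoding: we just concatenate byte segments to show that
// split and unsplit keys now share the [subspace][vs][key] prefix and differ
// only in the trailing index byte.
public class SplitKeyLayoutSketch {
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    public static byte[] packSplitKey(byte[] subspace, byte[] versionstamp, byte[] key, byte index) {
        // VersioningSplitKeyValueHelper-style layout: the version comes before the key
        return concat(subspace, versionstamp, key, new byte[] {index});
    }

    public static void main(String[] args) {
        byte[] subspace = {0x15, 0x01};
        byte[] vs = new byte[10];      // placeholder for a 10-byte versionstamp
        Arrays.fill(vs, (byte) 0xFF);  // incomplete versions are all \xff bytes
        byte[] key = {0x02, 'k'};

        byte[] unsplit = packSplitKey(subspace, vs, key, (byte) 0);
        byte[] split1 = packSplitKey(subspace, vs, key, (byte) 1);

        // Both layouts share the [subspace][vs][key] prefix
        int prefixLen = subspace.length + vs.length + key.length;
        if (!Arrays.equals(Arrays.copyOf(unsplit, prefixLen), Arrays.copyOf(split1, prefixLen))) {
            throw new AssertionError("prefixes differ");
        }
        System.out.println("shared prefix length = " + prefixLen);
    }
}
```

The old call site broke this property because the index was packed in a nested subspace, so the version could not sit in a uniform position across split and unsplit records.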
…eTransactions

- Add useVersionInKey and localVersion parameters to writeDummyRecord, threading both through to writeDummyKV.
- writeDummyKV now handles versionstamp prepending and FDB write dispatch (SET_VERSIONSTAMPED_KEY vs plain set) based on useVersionInKey.
- Remove writeDummyVersionedRecord (now redundant with writeDummyRecord).
- Add deleteWithSplitMultipleTransactions: write in tx1, delete in tx2, verify empty in tx3, covering both default and versioning key helpers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eTransactions

- Add writeDummyRecordsMultipleTransactions(boolean useVersionInKey): writes 50 records in tx1 (with FDBRecordVersion when useVersionInKey=false, with versionstamp keys when true), then scans back in tx2 to build the expected FDBRawRecord list with accurate committed sizes.
- Add scanMultipleRecordsWithVersioningKey: parallel to scanMultipleRecords, uses writeDummyRecordsMultipleTransactions(true) for setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Parallel to scanContinuations, uses writeDummyRecordsMultipleTransactions(true) for setup so records are written with versionstamp keys and read back in a separate transaction with accurate committed sizes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
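For context, the core mechanism these tests exercise is value splitting: FoundationDB caps a single value at 100,000 bytes, so the SplitHelper stores an oversized record as indexed chunks under the record key and reassembles them on read. A minimal JDK-only sketch of that idea (chunking scheme and names are illustrative, not the record layer's actual implementation):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal sketch of value splitting: a large value is stored as indexed
// chunks keyed by split index, then reassembled by scanning in order.
public class ValueSplitSketch {
    static final int CHUNK_SIZE = 100_000; // FDB values are capped at 100,000 bytes

    public static SortedMap<Long, byte[]> split(byte[] value) {
        SortedMap<Long, byte[]> chunks = new TreeMap<>();
        long index = 0;
        for (int pos = 0; pos < value.length; pos += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, value.length - pos);
            byte[] chunk = new byte[len];
            System.arraycopy(value, pos, chunk, 0, len);
            chunks.put(index++, chunk);
        }
        return chunks;
    }

    public static byte[] reassemble(SortedMap<Long, byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks.values()) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] big = new byte[250_000];
        for (int i = 0; i < big.length; i++) {
            big[i] = (byte) i;
        }
        SortedMap<Long, byte[]> chunks = split(big);
        if (chunks.size() != 3) {
            throw new AssertionError("expected 3 chunks, got " + chunks.size());
        }
        if (!Arrays.equals(big, reassemble(chunks))) {
            throw new AssertionError("round trip failed");
        }
        System.out.println("split into " + chunks.size() + " chunks");
    }
}
```

The multi-transaction tests add one twist on top of this: when the key carries a versionstamp, the write and the read-back must happen in separate transactions, because the version is only completed at commit.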
… key, added comments
…est to original, some tests converted.
ScottDugas left a comment:
There are a fair number of comments about how the clarity of the test could be improved. For anything other than quick one-line changes, it probably makes sense to merge and followup with improved readability of the test, so that we can be confident we're testing the same thing as before.
```diff
-if (splitLongRecords || previousSizeInfo == null || previousSizeInfo.isVersionedInline()) {
+if (splitKeyHelper.shouldClearBeforeWrite() && (splitLongRecords || previousSizeInfo == null || previousSizeInfo.isVersionedInline())) {
+    // Note that the clearPreviousSplitRecords also removes version splits from the context cache
+    // This is not currently supported for the case where we have versions in the keys since we can't trace the old values down
```
If we can't know the previous value, how can we have a previousSizeInfo?
When shouldClearBeforeWrite() == true, the assumption is that we have values we can use to clear the previous keys (with or without the sizeInfo).
When shouldClearBeforeWrite() == false, sizeInfo is ignored since we don't call clear at all (and chances are we will see null there).
With the changed logic, if the key has an incomplete version, then we can't clear older records based on size info: the size info either has a complete version (the previous transaction was committed) or it has incomplete versions (in which case the keys are not in the RYW cache).
```java
 * -
 */
@Tag(Tags.RequiresFDB)
public class SplitHelperMultipleTransactionsTest extends FDBRecordStoreTestBase {
```
Probably not as part of this PR, but it might be worth some sort of chaos testing (e.g. random key shapes, random bytes for values of random lengths, etc.).
```java
 * @param serialized serialized representation
 * @param version the version to store inline with this record
 */
public static void saveWithSplit(@Nonnull final FDBRecordContext context, @Nonnull final Subspace subspace,
```
One to look out for is reading/deleting an item within the same transaction as the one that inserted it. That may mean fishing through the versionMutationCache on the FDBRecordContext for keys with versions that haven't been updated if that hasn't already been done.
```diff
-if (splitLongRecords || previousSizeInfo == null || previousSizeInfo.isVersionedInline()) {
+if (splitKeyHelper.shouldClearBeforeWrite() && (splitLongRecords || previousSizeInfo == null || previousSizeInfo.isVersionedInline())) {
+    // Note that the clearPreviousSplitRecords also removes version splits from the context cache
+    // This is not currently supported for the case where we have versions in the keys since we can't trace the old values down
```
Maybe I don't understand this, but it seems like we should be able to do this. The versionMutationCache always uses a fixed value (all \xff bytes) for the uncommitted version, so two keys with the same FDBRecordVersion in it should appear in the same location of the versionMutationCache.
As it happens, I think this is fine for the use case of the pending writes queue, as a different local version is claimed with each insert. Theoretically, you could imagine leveraging this code to support versionstamps in primary keys (which we currently don't, but this gets us pretty close to being able to support), in which case this may be more important
The issue faced was that the cache contains specific keys with split suffixes which can span multiple cache entries. In the "incomplete version in key" case we would need to replace them all when a new record gets written with the same local version. One way to introduce that would be to group all the splits by the local version and replace the entire group every time, but that seems a little excessive since there is no use case for this currently.
I think we do group all of the keys by their local version. The mutations currently go into a NavigableMap, which is sorted by unsigned byte order. Incomplete versions all get assigned 10 \xff bytes as their global version, and so they'll all be there in the cache in consecutive keys. If there's no use case for updating large values within the same transaction as they're included, then we could punt this, though
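The observation about the versionMutationCache can be sketched with a plain NavigableMap ordered by unsigned bytes: keys carrying the incomplete-version placeholder (10 \xff bytes) sort together, so all splits of one uncommitted record land in consecutive entries that could be located and replaced as a group. A JDK-only illustration (not the record layer's actual cache; key shapes here are hypothetical):

```java
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of grouping split keys in a map sorted by unsigned byte order,
// mirroring the point that incomplete versions (all \xff bytes) place a
// record's splits in consecutive entries of the version mutation cache.
public class VersionCacheSketch {
    static byte[] incompleteVersionKey(byte[] recordKey, int splitIndex) {
        byte[] vs = new byte[10];
        Arrays.fill(vs, (byte) 0xFF); // incomplete-version placeholder
        byte[] out = new byte[vs.length + recordKey.length + 1];
        System.arraycopy(vs, 0, out, 0, vs.length);
        System.arraycopy(recordKey, 0, out, vs.length, recordKey.length);
        out[out.length - 1] = (byte) splitIndex;
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> cache = new TreeMap<>(Arrays::compareUnsigned);
        byte[] recordKey = {0x02, 'r'};
        // One key with a complete (committed) version, and three splits of an
        // uncommitted record sharing the incomplete-version prefix
        cache.put(new byte[] {0x01, 0x02, 0x03}, "committed");
        for (int i = 0; i < 3; i++) {
            cache.put(incompleteVersionKey(recordKey, i), "split-" + i);
        }
        // The incomplete-version entries are consecutive in unsigned order,
        // so a range over them covers exactly the record's splits
        byte[] lo = incompleteVersionKey(recordKey, 0);
        byte[] hi = incompleteVersionKey(recordKey, 2);
        int grouped = cache.subMap(lo, true, hi, true).size();
        if (grouped != 3) {
            throw new AssertionError("expected 3 consecutive splits, got " + grouped);
        }
        System.out.println("grouped splits = " + grouped);
    }
}
```

This is the structure that would make replace-as-a-group feasible if same-transaction updates of versioned split records were ever needed.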
```java
public static Stream<Arguments> testConfigs() {
    return SplitHelperTestConfig.allValidConfigs().map(Arguments::of);
public static Stream<Arguments> testConfigsNoVersionInKey() {
```
It's a bit odd that, the way this turned out, all of the tests wound up with the NoVersionInKey case. It seems like what you'd want is for testConfigs to return the same set of configs as earlier, and then you could test instances where the key has a version in separate cases in this file.
This was the original attempt. Things got complicated as the versionInKey required structural changes to the tests (save in one transaction, keep local and global version, construct actual key, assert in another transaction), which was incompatible with the structure of the test.
I think that a future enhancement (separate from the actual refactoring) would be to break the large sequential tests into smaller ones, where we could affect the way the assertions are made (one or two transactions) separately from the actual flow of the test logic.
```java
public class SplitHelperMultipleTransactionsTest extends FDBRecordStoreTestBase {

    // From the traditional nursery rhyme
    private static final byte[] HUMPTY_DUMPTY =
```
I also noted the duplication. Given that the distinction seems to just be whether the methods use versionstamp or not, I think this could mostly just be combined with the other tests. I suppose a common helper could be extracted if we wanted to separate out the versionstamp and non-versionstamp tests for code organization purposes. But I don't think, for example, that the multiple transaction-ness of these causes much issue, if they lived in the same file as the others
I haven't totally gone through all of the tests, but I figured I'd add what I had before taking off for the day.
alecgrieser left a comment:
There's some concern about the particulars of the API here, but I think the approach is sound, in that I don't believe it would cause trouble for existing data, and it does seem like it would enable large values into the PendingWriteQueue. Fixing up the API in a follow up seems acceptable
ScottDugas left a comment:
We discussed offline that we should take in some of the api changes, and test improvements, but I have some comments on the most recent PR changes.
```java
@Nonnull
private TestDocument createHugeDocument() {
    String hugeString = "Hello ".repeat(100_000);
    Random random = ThreadLocalRandom.current();
```
Probably better to use @RandomSeedSource, although I would be quite surprised if this test changes behavior with a different seed.
The calling methods use @MethodSource already; not sure how to combine that with @RandomSeedSource.
The RandomSeedSource itself is based on RandomizedTestUtils.randomSeeds. I think the only provider is a boolean provider, so creating a method arguments provider that returns something like:

```java
static Stream<Arguments> seedAndUseCompression() {
    return RandomizedTestUtils.randomSeeds(/* some array of fixed seeds */)
            .flatMap(seed -> Stream.of(Arguments.of(seed, false), Arguments.of(seed, true)));
}
```

would do it, I think.
Yeah, you can't use @RandomSeedSource, but there's actually a cleaner way to do that. You could do:

```java
static Stream<Arguments> seedAndUseCompression() {
    return ParameterizedTestUtils.cartesianProduct(
            RandomizedTestUtils.randomSeeds(/* some array of fixed seeds */),
            ParameterizedTestUtils.booleans("useCompression"));
}
```

Using cartesianProduct just makes this code a bit cleaner (IMHO); using booleans means that the description of the arguments will be useCompression/!useCompression in our test logs/IDEs, which is clearer than just true/false.
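RandomizedTestUtils and ParameterizedTestUtils are the record layer's own test helpers; the same crossing of seeds with a labeled boolean can be sketched with plain JDK streams (names and labels below are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// JDK-only sketch of crossing a set of fixed seeds with a labeled boolean
// dimension, mirroring cartesianProduct(randomSeeds(...), booleans("useCompression")).
public class SeedCartesianSketch {
    public static Stream<String> seedAndUseCompression(long... fixedSeeds) {
        return LongStream.of(fixedSeeds).boxed()
                .flatMap(seed -> Stream.of(
                        seed + "/!useCompression",   // labeled false case
                        seed + "/useCompression"));  // labeled true case
    }

    public static void main(String[] args) {
        List<String> cases = seedAndUseCompression(42L, 1066L).collect(Collectors.toList());
        if (cases.size() != 4) {
            throw new AssertionError("expected 4 cases, got " + cases.size());
        }
        System.out.println(cases);
    }
}
```

The labeled strings mimic the useCompression/!useCompression display names the reviewer mentions, which read better in test logs than bare true/false.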
```java
@Nonnull
private TestDocument createHugeDocument() {
    String hugeString = "Hello ".repeat(100_000);
```
Is there any value in a different test that has a document that compresses well? I don't think there's any max size for SplitHelper, so I don't know that you could observe any change in behavior, other than trying to save something that is 10,000,000 bytes (https://apple.github.io/foundationdb/known-limitations.html) but very compressible. I'm not sure we want a test using 10MB of memory just for this.
I don't think there is a limit on the total record size, so there is no discernible difference between N splits and 500*N splits, AFAICT
That limit is the limit on the amount of data you can have in a single transaction. So if you tried to save an 11,000,000-byte string of "repeat me", it would save successfully with compression, and fail if it just had the splits.
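The size argument can be sketched with java.util.zip: a highly repetitive payload deflates to a tiny fraction of its raw size, which is why a compressed 11,000,000-byte string could fit under the 10,000,000-byte transaction limit while its raw splits could not (JDK-only illustration; the record layer's actual compression codec may differ):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Sketch: deflate a very repetitive payload and compare the sizes against
// the 10,000,000-byte FDB transaction limit discussed above.
public class CompressionSketch {
    public static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.size();
    }

    public static void main(String[] args) {
        // "repeat me " is 10 bytes, repeated to 11,000,000 bytes total
        byte[] raw = "repeat me ".repeat(1_100_000).getBytes(StandardCharsets.US_ASCII);
        int compressed = deflatedSize(raw);
        if (raw.length <= 10_000_000) {
            throw new AssertionError("raw payload should exceed the transaction limit");
        }
        if (compressed >= 10_000_000) {
            throw new AssertionError("compressed payload should fit under the limit");
        }
        System.out.println("raw=" + raw.length + " compressed=" + compressed);
    }
}
```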
alecgrieser left a comment:
LGTM. There are a few things in the tests, I suppose, but we could do that later. I do also think that there's an approach here that would let us remove any keys that were previously written within the same transaction, but I'm okay with punting that for a later improvement
```java
// an incomplete version in the key means we shouldn't delete previous values for the record (since they all have
// completed versions by now)
```
I don't think this is quite right. The keys only get associated with a complete version once they've been committed. Saying that we don't currently support updating records with incomplete versions in the key may be more accurate
```java
private List<FDBRawRecord> writeDummyRecordsMultipleTransactions(boolean useVersionInKey) {
    final byte[] valueVersion = "_cushions_".getBytes(StandardCharsets.US_ASCII);
    // Generate primary keys using a generalization of the Fibonacci formula: https://oeis.org/A247698
```
I believe I wrote the original version of this, and there's no reason besides it being a little more whimsical. That's also why it's _cushions_ (both are in oblique references to the same person, who is also referenced in the OEIS entry)
```java
 */
@MethodSource("splitKeyEquivalenceCases")
@ParameterizedTest(name = "defaultHelperSplitKeyEquivalence[key={0}, index={1}]")
void defaultHelperSplitKeyEquivalence(Tuple key, long index) {
```
The name here is probably a bit confusing given that DefaultSplitKeyValueHelper no longer exists. Perhaps this test is no longer necessary?
```java
private Tuple toIncompleteKey(Tuple key, int localVersion, boolean versionInKey) {
    if (versionInKey) {
        return Tuple.from(Versionstamp.incomplete(localVersion)).addAll(key);
```
I don't have much concern, but it would probably be good to followup with some tests where the version is elsewhere in the key
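The versionstamps being prepended here have a fixed 12-byte shape: a 10-byte transaction version (all \xff bytes while incomplete, filled in at commit) followed by a 2-byte user/local version that distinguishes writes within one transaction. A JDK-only sketch of that layout (illustrative; the real class is com.apple.foundationdb.tuple.Versionstamp):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of the 12-byte versionstamp layout: 10-byte transaction version
// (all \xff while incomplete) + 2-byte local (user) version.
public class VersionstampSketch {
    public static byte[] incomplete(int localVersion) {
        ByteBuffer buf = ByteBuffer.allocate(12);
        byte[] txVersion = new byte[10];
        Arrays.fill(txVersion, (byte) 0xFF); // placeholder until the commit fills it in
        buf.put(txVersion);
        buf.putShort((short) localVersion);  // distinguishes writes within one transaction
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] vs0 = incomplete(0);
        byte[] vs1 = incomplete(1);
        // Same incomplete prefix, distinct local versions
        if (!Arrays.equals(Arrays.copyOf(vs0, 10), Arrays.copyOf(vs1, 10))) {
            throw new AssertionError("prefixes should match");
        }
        if (Arrays.equals(vs0, vs1)) {
            throw new AssertionError("local versions should differ");
        }
        System.out.println("versionstamp length = " + vs0.length);
    }
}
```

Testing versions elsewhere in the key, as the reviewer suggests, would exercise whether the helper's key construction handles the versionstamp offset correctly when it is not the leading element.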
```java
if (current != null) {
    // This should never happen. It means that the same key (and suffix) and local version were used for subsequent
    // write. It is most likely not an intended flow and this check will protect against that.
    // It is an incomplete check since the same record can have different suffixes to the primary key that would not collide.
```
Suggested change:

```diff
 // It is an incomplete check since the same record can have different suffixes to the primary key that would not collide.
+// Namely, if one is split, and one is not split, they will not overlap. Anything else and they would.
```
```java
 * @param sizeInfo optional size information to populate
 */
@SuppressWarnings("PMD.CloseResource")
public static void saveWithSplit(@Nonnull final FDBRecordContext context, @Nonnull final Subspace subspace,
```
Does it make sense with this new API to support dryRun in the same way?
I don't think it should be done in this PR, and it's not really a priority, but there is some inconsistency that might be worth resolving eventually.
This PR allows Pending Writes Queue entries to be split using the SplitHelper so that we can save large entries in the queue. It contains both the SplitHelper work as well as the PendingWriteQueue work.

On the SplitHelper side, the work introduces an abstraction of the key construction and value writing in order to support VersionInKey: the concept of having an incomplete version in the keys of the split records. There are two implementations: DefaultSplitKeyValueHelper (the current behavior) and VersioningSplitKeyValueHelper (the new one, in support of the queue).

Care was taken to ensure no changes occur to the existing behavior (for backwards compatibility). The original test, SplitHelperTest, remains almost intact, while a new test, SplitHelperMultipleTransactionsTest, was added to test the new functionality. This new test borrows heavily from the other test, with the intent of testing the behavior across multiple transactions (since we have to commit the first transaction in order to complete the version in the key so that we can locate the entry).

Two test cases, saveWithSplitAndIncompleteVersions and deleteWithSplitAndVersion, were not copied over as they had no relevance for the new VersionInKey.

This is considered a breaking change, as it changes the data structure of the Lucene index's pending writes queue. The index should be disabled before the deployment and re-enabled afterwards.

Resolves #3981