Skip to content

fix: recover from corrupt persistence files instead of crashing hydration#38

Merged
danReynolds merged 10 commits into
mainfrom
claude/persistence-fault-recovery
May 31, 2026
Merged

fix: recover from corrupt persistence files instead of crashing hydration#38
danReynolds merged 10 commits into
mainfrom
claude/persistence-fault-recovery

Conversation

@danReynolds

Copy link
Copy Markdown
Owner

Summary

Makes file-persistence hydration resilient to corrupt or undecryptable files, and adds fault-injection tests.

FileDataStoreConfig and FileDataStoreResolverConfig caught only PathNotFoundException on hydrate. A corrupt JSON file (or one that fails to decrypt) threw a FormatException that propagated out of the worker isolate's hydrate handler — which crashes the worker and fails hydration of the entire store. So one unreadable partition bricked startup for all persisted data.

The revert check made the severity concrete: against the unfixed code the test output is

Exception: Request error: FormatException: Unexpected character (at character 1)
Exception: Worker isolate exited unexpectedly

Fix

Catch the decode/decrypt failure on hydrate, move the unreadable file aside to <path>.corrupt, and recover with an empty store for that partition:

  • Other partitions hydrate normally — a single bad file no longer takes down the whole store.
  • The corrupt data is preserved for inspection (.corrupt), and the .corrupt suffix isn't matched by the data store file listing, so it won't be re-loaded.
  • The next persist for that partition writes a fresh file (atomic write lands separately in fix: write persistence files atomically #34).

Behaviour choice

This favours startup resilience over surfacing the error — the right default for a local cache (better to start with one partition empty than not start at all). The failure is reported via the persistor's logger. If louder signalling is wanted (e.g. an onHydrationError callback), that's a easy follow-up — flagging it for your call.

Test plan

  • Fault-injection test: hydrate with a valid store + a corrupt sibling store → hydration recovers, valid data loads, corrupt file quarantined
  • Fault-injection test: hydrate with a corrupt resolver file → recovers (full hydration doesn't depend on the resolver)
  • Confirmed both tests fail against the unfixed code (worker isolate crashes)
  • flutter test test/native — 86 tests green
  • flutter analyze clean on changed files

Generated by Claude Code

…tion

FileDataStoreConfig and the resolver config caught only
PathNotFoundException when hydrating. A corrupt JSON file or a file that
fails to decrypt threw a FormatException that propagated out of the
worker isolate's hydrate handler — crashing the worker and failing
hydration of the entire store, so one unreadable partition bricked
startup for all data.

Catch the decode/decrypt failure, move the unreadable file aside to
<path>.corrupt (preserved for inspection, ignored by the data store file
listing), and recover with an empty store for that partition. Other
partitions hydrate normally and the next persist writes a fresh file.
This favours startup resilience for a local cache; the failure is
surfaced via the logger.

Adds fault-injection tests that hydrate alongside a corrupt data store
file and a corrupt resolver file, asserting hydration recovers (valid
data still loads, the bad file is quarantined). Confirmed both tests
fail against the unfixed code (the worker isolate crashes).
Copilot AI review requested due to automatic review settings May 26, 2026 18:18

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes file-based persistence hydration resilient to corrupt or undecryptable persistence files by quarantining unreadable files as *.corrupt and continuing hydration, backed by new fault-injection tests to prevent worker-isolate crashes during startup.

Changes:

  • Add corrupt-file recovery during hydration for both data stores and the resolver (rename to <path>.corrupt, then recover with an empty store).
  • Add native tests that inject corrupt JSON for a sibling store and for the resolver, asserting hydration still succeeds and corrupt files are quarantined.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
lib/persistor/file_persistor/file_data_store_config.dart Catch hydration failures, quarantine unreadable files, and continue hydration instead of crashing the worker isolate.
test/native/file_persistor_test.dart Add fault-injection tests covering corrupt store files and corrupt resolver file recovery behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 52 to 57
} on PathNotFoundException {
return null;
} catch (error) {
await _recoverCorruptFile(file, logger, error);
return null;
}
Comment on lines +87 to +89
} catch (error) {
await _recoverCorruptFile(file, logger, error);
return null;
Comment on lines +12 to +26
Future<void> _recoverCorruptFile(
File file,
Logger logger,
Object error,
) async {
logger.log('Failed to hydrate ${file.path}, quarantining as corrupt: $error');
try {
final quarantine = File('${file.path}.corrupt');
if (await quarantine.exists()) {
await quarantine.delete();
}
await file.rename(quarantine.path);
} catch (e) {
logger.log('Failed to quarantine ${file.path}: $e');
}
Comment thread test/native/file_persistor_test.dart Outdated
group('FilePersistor fault recovery', () {
final loonDir = Directory('${testDirectory.path}/loon');

// FlutterSecureStorage does not work in tests, so supply a fixed encrypter.
);
} on PathNotFoundException {
return null;
} on FileSystemException catch (error, stackTrace) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we want these for the resolver. if the resolver fails we're in big trouble and show probably be blocking?

import '../../models/test_user_model.dart';
import '../../utils.dart';

void _flushPersistManagerOperation(FakeAsync async) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed and we use the shared flush util?

Comment thread test/core/loon_test.dart Outdated
@@ -15,8 +16,9 @@ void main() {
});

tearDown(() async {
await Future<void>.delayed(Duration.zero);

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this hack still needed?

} on FormatException catch (error, stackTrace) {
await _recoverCorruptFile(file, logger, error, stackTrace);
return null;
} on InvalidCipherTextException catch (error, stackTrace) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we including a new package just for this type?

@danReynolds danReynolds merged commit 18f8902 into main May 31, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants