
Feature metadata serialization #305

Open
biqar wants to merge 16 commits into hpc-io:develop from biqar:feature_metadata-serialization


biqar commented Mar 17, 2026

Related Issues / Pull Requests

Related Issue: 282

Description

Integrate the BULKI serializer for PDC server checkpointing.

What changes are proposed in this pull request?

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality not to work as expected; for instance, examples in this repository must be updated too)
  • This change requires a documentation update

Checklist:

  • My code modifies existing public API, or introduces new public API, and I updated or wrote docstrings
  • I have commented my code
  • My code requires documentation updates, and I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Added code to add kv-tags at scale. Also added code to get and verify tags at scale. The latter test code will be used to verify metadata checkpointing after a server restart.
…nd from file instead of using a buffer

Also added version control for the bulki-based checkpointing.
@biqar biqar requested a review from a team as a code owner March 17, 2026 19:01
Author

biqar commented Mar 26, 2026

Benchmark Summary (10000 objects, 100 tags)

Serial

| Metric | Previous | New (BULKI) |
|---|---|---|
| Close Time (s) | 1.78 | 5.34 |
| Restart Time (s) | 2.09 | 0.21 |

Parallel (-N 1 -n 5 -c 2)

| Metric | Previous | New (BULKI) |
|---|---|---|
| Close Time (s) | 0.85 | 0.45 |
| Restart Time (s) | 0.77 | 0.05 |
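
To make the comparison easier to read, the timings from the two tables above can be turned into previous/new ratios. This is only a small sketch over the numbers reported in this comment; the names `TIMINGS`, `speedup`, and `speedups` are made up for illustration and are not part of PDC.

```python
# Timings in seconds, copied from the benchmark summary: (previous, new).
TIMINGS = {
    ("serial", "close"): (1.78, 5.34),
    ("serial", "restart"): (2.09, 0.21),
    ("parallel", "close"): (0.85, 0.45),
    ("parallel", "restart"): (0.77, 0.05),
}


def speedup(previous: float, new: float) -> float:
    """Ratio previous/new: a value above 1 means the new code is faster."""
    return previous / new


def speedups(timings: dict) -> dict:
    """Map each (mode, metric) pair to its previous/new ratio."""
    return {key: speedup(prev, new) for key, (prev, new) in timings.items()}


for (mode, metric), ratio in speedups(TIMINGS).items():
    print(f"{mode:8s} {metric:7s} {ratio:5.2f}x")
```

On these single-run numbers, only the serial close path comes out slower with BULKI (ratio below 1); the other three cases are faster.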

Author

biqar commented Mar 26, 2026

@houjun and @jeanbez, please let me know the next steps for merging this PR.

@biqar biqar mentioned this pull request Mar 26, 2026
Member

houjun commented Mar 30, 2026

@biqar a couple of things:

  • Add your new checkpoint/restart tests to the CMake test run; the change in src/tests/CMakeLists.txt only compiles them.
  • For the benchmarking results, can you run more variations in the number of objects and kv-tags, and for each variation run it at least 5 times and report the min, avg, and max?
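
One possible way to collect the requested min/avg/max is sketched below. It is a generic timing harness, not PDC code: `time_runs` and `summarize` are hypothetical helper names, and `run_once` stands in for whatever launches one close or restart measurement.

```python
import statistics
import time


def time_runs(run_once, repeats=5):
    """Call run_once() `repeats` times and return each wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce a list of timings to the min, average, and max."""
    return {
        "min": min(samples),
        "avg": statistics.mean(samples),
        "max": max(samples),
    }


# Placeholder workload; a real run would start the server, close it, restart it, etc.
report = summarize(time_runs(lambda: sum(range(100_000)), repeats=5))
print(report)
```

Running the same harness once per object/kv-tag variation would give one min/avg/max row per configuration.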

Author

biqar commented Apr 2, 2026

Todos:

  1. More rigorous testing
     a. Strong scaling (fix the total amount of work and vary the number of workers).
        Fix the work at 100K/1000 and vary the number of parallel servers:
        -n 2 -c 2
        -n 4 -c 2
        -n 8 -c 2
        -n 16 -c 2

     b. Weak scaling (fix the workload per worker).
        10K/100 with -n 2 -c 2 implies, for checkpointing, (10K + 10K*100) = 1,010,000
        20K/100 with -n 4 -c 2 implies, for checkpointing, (20K + 20K*100) = 2,020,000
        40K/100 with -n 8 -c 2 implies, for checkpointing, (40K + 40K*100) = 4,040,000
        80K/100 with -n 16 -c 2 implies, for checkpointing, (80K + 80K*100) = 8,080,000
  2. Add the new checkpoint/restart test to the CMake tests. Create a new version of run_checkpoint_restart_test.sh that accepts the program parameters, then add another version for MPI testing.
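
The weak-scaling arithmetic above can be checked with a short sketch. It assumes that "10K/100" means 10K objects with 100 tags each, that the checkpointed total is objects plus tags, and that -n is the number of server processes; the name `checkpoint_entries` is hypothetical.

```python
def checkpoint_entries(num_objects: int, tags_per_object: int) -> int:
    """Objects plus their tags: the count the checkpoint has to cover (assumed model)."""
    return num_objects + num_objects * tags_per_object


# (objects, tags per object, servers passed to -n), taken from the weak-scaling plan.
WEAK_SCALING = [
    (10_000, 100, 2),
    (20_000, 100, 4),
    (40_000, 100, 8),
    (80_000, 100, 16),
]

for objects, tags, servers in WEAK_SCALING:
    total = checkpoint_entries(objects, tags)
    per_server = total // servers
    print(f"-n {servers:2d} -c 2: total={total:,} per_server={per_server:,}")
```

Under those assumptions the per-server count is the same in every row, which is what a weak-scaling run needs.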


3 participants