@ryuwd
Contributor

@ryuwd ryuwd commented Jan 21, 2026

Summary

Adds ReplicaCatalog, a Pydantic model for mapping Logical File Names (LFNs) to their physical replicas across distributed storage elements.

Depends on #746

Description

ReplicaCatalog provides a structured, validated representation of file replica information, intended to be stored as JSON. It serves as a more user-friendly replacement for the Pool XML Catalog.

Key features:

  • LFN/PFN validation: Automatic stripping of LFN: and PFN: prefixes, with validation
  • Checksum support: Adler-32 (8 hex chars) and GUID (UUID format) checksums with format validation
  • Storage element tracking: Each replica is associated with a storage element identifier
  • Optional metadata: File size in bytes and checksum information
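
As a rough illustration of the checks listed above, the sketch below uses only the standard library. It is not the code in this PR: the helper names (`strip_prefix`, `is_adler32`, `is_guid`, `adler32_hex`) are hypothetical, and the actual Pydantic validators may behave differently, for example in how strictly they treat GUID formatting.

```python
import re
import uuid
import zlib

# Adler-32 checksums are written here as exactly 8 hexadecimal characters.
ADLER32_RE = re.compile(r"[0-9a-fA-F]{8}")


def strip_prefix(value: str, prefix: str) -> str:
    """Drop a leading prefix such as 'LFN:' or 'PFN:' and reject empty results."""
    stripped = value.removeprefix(prefix)
    if not stripped:
        raise ValueError(f"nothing left after removing {prefix!r}")
    return stripped


def is_adler32(value: str) -> bool:
    """Return True if value looks like an 8-hex-character Adler-32 checksum."""
    return ADLER32_RE.fullmatch(value) is not None


def is_guid(value: str) -> bool:
    """Return True if value parses as a UUID.

    uuid.UUID is permissive (it also accepts braces and un-hyphenated hex),
    so a real validator may want a stricter pattern.
    """
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def adler32_hex(data: bytes) -> str:
    """Compute the Adler-32 of data as a zero-padded, lowercase hex string."""
    return f"{zlib.adler32(data) & 0xFFFFFFFF:08x}"
```

For instance, `strip_prefix("LFN:/lhcb/MC/2024/file.dst", "LFN:")` yields the bare LFN, and `is_adler32("788c5caa")` accepts the checksum used in the example.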

Example usage:

from diracx.core.replica_catalog import ReplicaCatalog

catalog = ReplicaCatalog(root={
    "/lhcb/MC/2024/file.dst": {
        "replicas": [
            {"url": "https://storage1.cern.ch/file.dst", "se": "CERN-DST"},
            {"url": "https://storage2.in2p3.fr/file.dst", "se": "IN2P3-DST"},
        ],
        "size_bytes": 1048576,
        "checksum": {"adler32": "788c5caa"},
    }
})
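
Since the catalog is intended to be stored as JSON, here is a small sketch of round-tripping the same mapping through the standard `json` module. It assumes the on-disk layout mirrors the `root` mapping above, which this description does not show explicitly. The `replica_urls` helper is hypothetical and only illustrates reading the structure.

```python
import json

# Same data as the example above, as a plain dictionary.
catalog_data = {
    "/lhcb/MC/2024/file.dst": {
        "replicas": [
            {"url": "https://storage1.cern.ch/file.dst", "se": "CERN-DST"},
            {"url": "https://storage2.in2p3.fr/file.dst", "se": "IN2P3-DST"},
        ],
        "size_bytes": 1048576,
        "checksum": {"adler32": "788c5caa"},
    }
}

# Serialize to JSON text and load it back.
text = json.dumps(catalog_data, indent=2)
loaded = json.loads(text)


def replica_urls(catalog: dict, lfn: str) -> list[str]:
    """Return the replica URLs recorded for one LFN (hypothetical helper)."""
    return [replica["url"] for replica in catalog[lfn]["replicas"]]


print(replica_urls(loaded, "/lhcb/MC/2024/file.dst"))
```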

@ryuwd ryuwd changed the title feat (core): added replica catalog to core feat (core): added replica catalog Jan 21, 2026
@read-the-docs-community

read-the-docs-community bot commented Jan 21, 2026

Documentation build overview

📚 diracx | 🛠️ Build #31116593 | 📁 Comparing 9506a40 against latest (ccb1f48)


🔍 Preview build

No files changed.

@chrisburr
Member

I think we need to decide on the layout of core, as replica_catalogue feels too specific for the top level

@aldbr
Contributor

aldbr commented Jan 22, 2026

> I think we need to decide on the layout of core, as replica_catalogue feels too specific for the top level

I guess that, at some point (maybe now?), it would make sense to have a core/models directory containing one module per type of model (instead of having all our models in the same models.py module). Example:

  • core/models/:
    • auth.py
    • jobs.py
    • sandbox.py
    • metadata.py
    • search.py
    • and replica_catalog.py

Any opinion?

@fstagni
Contributor

fstagni commented Jan 22, 2026

> I think we need to decide on the layout of core, as replica_catalogue feels too specific for the top level
>
> I guess that, at some point (maybe now?), it would make sense to have a core/models directory containing one module per type of model (instead of having all our models in the same models.py module). Example:
>
>   • core/models/:
>     • auth.py
>     • jobs.py
>     • sandbox.py
>     • metadata.py
>     • search.py
>     • and replica_catalog.py
>
> Any opinion?

It makes sense to me

@ryuwd ryuwd changed the title feat (core): added replica catalog feat (core): added replica catalog / refactor: diracx.core.models into a package Jan 22, 2026
@ryuwd ryuwd force-pushed the roneil-replica-catalog-json branch 3 times, most recently from 6a595bb to 4af54f4 Compare January 22, 2026 13:07
@fstagni
Contributor

fstagni commented Jan 22, 2026

I would split the refactoring into a separate PR

@ryuwd ryuwd force-pushed the roneil-replica-catalog-json branch from 4af54f4 to fd28f0f Compare January 22, 2026 13:21
@ryuwd ryuwd changed the title feat (core): added replica catalog / refactor: diracx.core.models into a package feat (core): added replica catalog Jan 22, 2026
@ryuwd ryuwd force-pushed the roneil-replica-catalog-json branch from 04dd4d1 to 1ea1fb2 Compare January 22, 2026 13:26
@ryuwd
Contributor Author

ryuwd commented Jan 22, 2026

Will rebase after #746

@aldbr aldbr linked an issue Jan 22, 2026 that may be closed by this pull request
1 task
ryuwd added 5 commits January 23, 2026 08:59
test: add unit tests for replica catalog model

fix: forrmatting

docs: replica catalog explanations

fix: more linting messages, refactoring needed
@chrisburr chrisburr force-pushed the roneil-replica-catalog-json branch from 1ea1fb2 to 9506a40 Compare January 23, 2026 07:59
@chaen
Contributor

chaen commented Jan 23, 2026

Historically, the DFC (and the LFC before it) has always been referred to as the Replica Catalog. Do you think you could come up with an alternative name (reflecting, for example, that it's a local file)? Otherwise it's not dramatic :-)

Development

Successfully merging this pull request may close these issues.

feat: Implement a "JSON catalog" input data resolution format

5 participants