feat: add --component-map path-prefix backend for package-database-free rootfs #111
hanthor wants to merge 1 commit into coreos:main
Add a new `pathmap` component repo that assigns files to components
based on path-prefix rules loaded from a JSON file. This is useful for
build systems like BuildStream (used by GNOME OS) that produce rootfs
images without a package database, making it impossible to use the RPM
or deb backends.
The file format is a JSON array of objects with three fields:
- `prefix` (string, required): absolute path prefix to match
- `component` (string, required): component name to assign
- `interval` (string, optional): "daily", "weekly", or "monthly"
  (default); controls the packing stability weight
Rules are evaluated in order; the first matching prefix wins. Claims
from this repo are weak, so xattr and package-database repos take
precedence when present.
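
To illustrate the ordering rule, here is a minimal, dependency-free sketch. `Rule`, `first_match`, and `example_rules` are hypothetical names, and `std::path::Path` stands in for the `Utf8Path` type used in this PR; both compare whole path components rather than raw string prefixes.

```rust
use std::path::Path;

/// Hypothetical in-memory form of one rule from the JSON file.
struct Rule {
    prefix: &'static str,
    component: &'static str,
}

/// Return the component of the first rule whose prefix matches `path`.
/// `Path::starts_with` compares whole path components, so the prefix
/// `/usr/share/fonts` does not match `/usr/share/fonts-extra`.
fn first_match(rules: &[Rule], path: &str) -> Option<&'static str> {
    rules
        .iter()
        .find(|rule| Path::new(path).starts_with(rule.prefix))
        .map(|rule| rule.component)
}

/// Example rule list: the specific rule must precede the catch-all one.
fn example_rules() -> Vec<Rule> {
    vec![
        Rule { prefix: "/usr/share/fonts", component: "fonts" },
        Rule { prefix: "/usr/share", component: "share-misc" },
    ]
}
```

With these rules, `/usr/share/fonts/noto/a.ttf` resolves to `fonts`; swapping the two rules would send every path under `/usr/share` to `share-misc`.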
The feature is exposed via a new `--component-map PATH` flag on the
`build` subcommand. The error message shown when no repo is found is
updated to mention this flag as a fallback option.
Stability is modelled as a Poisson process: P(no change in T days) =
e^(-T/period_days). This keeps all intervals strictly ordered and
non-zero regardless of the stability window size.
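
The formula can be sketched as follows. The `Interval` enum and the period values (1, 7, and 30 days) are assumptions made for illustration; the actual type and constants in this PR may differ.

```rust
/// Update intervals accepted by the `interval` field.
#[derive(Clone, Copy)]
enum Interval {
    Daily,
    Weekly,
    Monthly,
}

impl Interval {
    /// Mean number of days between changes. These values are assumptions
    /// for illustration, not taken from the PR.
    fn period_days(self) -> f64 {
        match self {
            Interval::Daily => 1.0,
            Interval::Weekly => 7.0,
            Interval::Monthly => 30.0,
        }
    }

    /// P(no change in `window_days`) = e^(-window_days / period_days).
    fn stability(self, window_days: f64) -> f64 {
        (-window_days / self.period_days()).exp()
    }
}
```

Under these assumed periods, a 7-day window gives roughly 0.0009 for daily, 0.37 for weekly, and 0.79 for monthly, so a longer interval always yields a higher stability.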
Assisted-by: Claude Sonnet 4.6
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Code Review
This pull request introduces a path-prefix component mapping feature, enabling file assignment to components via a JSON configuration when package databases are absent. It adds a --component-map CLI argument and implements PathmapRepo to handle prefix-based claims with stability weights. Reviewer feedback suggests optimizing memory and performance by replacing the pre-computed path-to-component HashMap with lazy rule evaluation in the claim pass, thereby avoiding redundant file iterations and reducing memory consumption for large filesystems.

    pub struct PathmapRepo {
        /// Component names, indexed by ComponentId.
        components: IndexSet<String>,
        /// Per-component metadata (stability, etc.).
        component_meta: Vec<ComponentMeta>,
        /// Pre-computed path → ComponentId map.
        path_to_component: HashMap<Utf8PathBuf, ComponentId>,
        default_mtime_clamp: u64,
    }

The path_to_component map can lead to significant memory overhead for large rootfs images, as it clones and stores every matching path. It is more efficient to store the rules and evaluate them lazily during the claim pass. This avoids duplicating path strings that are already present in the FileMap.

     pub struct PathmapRepo {
         /// Component names, indexed by ComponentId.
         components: IndexSet<String>,
         /// Per-component metadata (stability, etc.).
         component_meta: Vec<ComponentMeta>,
    -    /// Pre-computed path → ComponentId map.
    -    path_to_component: HashMap<Utf8PathBuf, ComponentId>,
    +    /// Path-prefix rules.
    +    rules: Vec<(Utf8PathBuf, ComponentId)>,
         default_mtime_clamp: u64,
     }


    pub fn load(map_path: &Utf8Path, files: &FileMap, default_mtime_clamp: u64) -> Result<Self> {
        let content = std::fs::read_to_string(map_path)
            .with_context(|| format!("reading path-map file {map_path}"))?;
        let entries: Vec<PathMapEntry> = serde_json::from_str(&content)
            .with_context(|| format!("parsing path-map file {map_path}"))?;

        let mut components: IndexSet<String> = IndexSet::new();
        let mut component_meta: Vec<ComponentMeta> = Vec::new();
        let mut path_to_component: HashMap<Utf8PathBuf, ComponentId> = HashMap::new();

        // Pre-intern all component names and their metadata so that we can look
        // them up by ComponentId during the main loop without re-computing the
        // interval → stability mapping for every file.
        let mut entry_ids: Vec<ComponentId> = Vec::with_capacity(entries.len());
        for entry in &entries {
            let stability = entry.interval.to_stability();
            let (idx, inserted) = components.insert_full(entry.component.clone());
            if inserted {
                component_meta.push(ComponentMeta { stability });
            }
            entry_ids.push(ComponentId(idx));
        }

        for file_path in files.keys() {
            for (entry, &comp_id) in entries.iter().zip(entry_ids.iter()) {
                let prefix = Utf8Path::new(&entry.prefix);
                if file_path.starts_with(prefix) {
                    path_to_component.insert(file_path.clone(), comp_id);
                    break; // first matching rule wins
                }
            }
        }

        tracing::debug!(
            path = %map_path,
            rules = entries.len(),
            components = components.len(),
            paths = path_to_component.len(),
            "loaded pathmap components"
        );

        Ok(Self {
            components,
            component_meta,
            path_to_component,
            default_mtime_clamp,
        })
    }

The load function currently iterates over all files in the rootfs to pre-compute claims. This is redundant because claim_pass already iterates over files and calls weak_claims_for_path. By moving the matching logic to weak_claims_for_path, we avoid processing files that are already claimed by higher-priority repositories (like xattr or rpm).
Additionally, this implementation merges stability values for duplicate component names by taking the minimum (most conservative) value, and ensures prefixes are treated as absolute paths for consistent matching.

    pub fn load(map_path: &Utf8Path, _files: &FileMap, default_mtime_clamp: u64) -> Result<Self> {
        let content = std::fs::read_to_string(map_path)
            .with_context(|| format!("reading path-map file {map_path}"))?;
        let entries: Vec<PathMapEntry> = serde_json::from_str(&content)
            .with_context(|| format!("parsing path-map file {map_path}"))?;

        let mut components: IndexSet<String> = IndexSet::new();
        let mut component_meta: Vec<ComponentMeta> = Vec::new();
        let mut rules: Vec<(Utf8PathBuf, ComponentId)> = Vec::with_capacity(entries.len());

        for entry in entries {
            let stability = entry.interval.to_stability();
            let (idx, inserted) = components.insert_full(entry.component);
            if inserted {
                component_meta.push(ComponentMeta { stability });
            } else {
                // Merge stability: use the minimum (most conservative) value
                component_meta[idx].stability = component_meta[idx].stability.min(stability);
            }

            let mut prefix = entry.prefix;
            if !prefix.starts_with('/') {
                prefix.insert(0, '/');
            }
            rules.push((Utf8PathBuf::from(prefix), ComponentId(idx)));
        }

        tracing::debug!(
            path = %map_path,
            rules = rules.len(),
            components = components.len(),
            "loaded pathmap components"
        );

        Ok(Self {
            components,
            component_meta,
            rules,
            default_mtime_clamp,
        })
    }

    fn weak_claims_for_path(
        &self,
        _rootfs: &Dir,
        path: &Utf8Path,
        _file_info: &super::FileInfo,
    ) -> Result<Vec<ComponentId>> {
        Ok(self
            .path_to_component
            .get(path)
            .map(|id| vec![*id])
            .unwrap_or_default())
    }

Evaluate rules lazily to avoid the memory overhead of a pre-computed path map. This also ensures that we only perform prefix matching for files that haven't been claimed by higher-priority repositories.

     fn weak_claims_for_path(
         &self,
         _rootfs: &Dir,
         path: &Utf8Path,
         _file_info: &super::FileInfo,
     ) -> Result<Vec<ComponentId>> {
    -    Ok(self
    -        .path_to_component
    -        .get(path)
    -        .map(|id| vec![*id])
    -        .unwrap_or_default())
    +    for (prefix, id) in &self.rules {
    +        if path.starts_with(prefix) {
    +            return Ok(vec![*id]);
    +        }
    +    }
    +    Ok(vec![])
     }

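
The combined behaviour of the review suggestions (prefix normalization, minimum-stability merging for repeated component names, and lazy first-match lookup) can be checked in isolation with a dependency-free sketch. `Rules`, `add`, and `claim` are hypothetical stand-ins for the suggested `PathmapRepo`, and `std::path` replaces the `Utf8Path` types used in the PR.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, dependency-free stand-in for the suggested `PathmapRepo`.
struct Rules {
    names: Vec<String>,           // component names; index = component id
    stability: Vec<f64>,          // per-component stability
    rules: Vec<(PathBuf, usize)>, // (prefix, component id), in file order
}

impl Rules {
    fn new() -> Self {
        Rules { names: Vec::new(), stability: Vec::new(), rules: Vec::new() }
    }

    /// Register one rule, interning the component name.
    fn add(&mut self, prefix: &str, component: &str, stability: f64) {
        let id = match self.names.iter().position(|n| n == component) {
            Some(id) => {
                // Repeated component: keep the lower (more conservative) value.
                self.stability[id] = self.stability[id].min(stability);
                id
            }
            None => {
                self.names.push(component.to_string());
                self.stability.push(stability);
                self.names.len() - 1
            }
        };
        // Treat every prefix as absolute so matching is consistent.
        let prefix = if prefix.starts_with('/') {
            prefix.to_string()
        } else {
            format!("/{prefix}")
        };
        self.rules.push((PathBuf::from(prefix), id));
    }

    /// Lazily evaluate the rules for one path; the first match wins.
    fn claim(&self, path: &Path) -> Option<usize> {
        self.rules
            .iter()
            .find(|(prefix, _)| path.starts_with(prefix))
            .map(|(_, id)| *id)
    }
}
```

Calling `add` with the relative prefix `usr/lib/firmware` stores it as `/usr/lib/firmware`, and adding a second rule for the same component with a lower stability lowers that component's stored value.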

dakota component-map example (source): 127 rules covering locale-data, firmware, kernel, fonts, ghostty, wallpapers, docs, python3, and more.

Benchmark: Dakota (7.4 GiB, 197k files, 120 layers)
Note: BST xattrs are stripped on OCI export, so the xattr repo returns 0 matches on exported images.

Closing in favour of #113. The path-prefix map approach works but covers only 81% of files and requires manually maintaining rules. Once #113 is resolved (libc fallback for xattr reads), an LD_PRELOAD sidecar can serve the full file→element map generated by
Closes #110 (partial — see also #112).
Adds `--component-map PATH` and a `pathmap` backend: a JSON file of path-prefix → component rules, for rootfs images with no package DB and no xattrs.

Tested on Dakota (7.4 GiB, 197k files, 120 layers): 81% file coverage, 54 components, 22.1 s ± 1.29 s.

All 58 tests pass, `just clippy` clean.

Assisted-by: Claude Sonnet 4.6