
Add configurable date extraction from filenames #42

@michaelbeutler

Description

Allow users to configure a date extraction pattern that parses a date from the original filename. The extracted date overrides the built-in variables $y, $m, $d (and $l). If the pattern does not match, the current date is used as before.

Example Config

settings:
  date_extraction:
    source: filename
    pattern: "(?P<year>[0-9]{4})(?P<month>[0-9]{2})(?P<day>[0-9]{2})"
    # or: "(?P<day>[0-9]{2})\\.(?P<month>[0-9]{2})\\.(?P<year>[0-9]{4})"

Implementation

Schema change:

  • File: crates/paporg/src/config/schema.rs
  • Add a DateExtractionConfig struct with source and pattern fields
  • Add date_extraction: Option<DateExtractionConfig> to the settings/config struct
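
A minimal sketch of the new schema types, kept std-only so it compiles on its own. DateSource, Settings, and has_required_groups are hypothetical names; the real schema.rs would most likely also derive serde's Deserialize, with #[serde(rename_all = "lowercase")] on the enum so that source: filename maps to DateSource::Filename. That derive is left out here to avoid external crates.

```rust
/// Where the date is read from. Only `filename` is proposed in this issue;
/// using an enum leaves room for other sources later (an assumption).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DateSource {
    Filename,
}

/// Mirrors the `settings.date_extraction` block of the example config.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DateExtractionConfig {
    pub source: DateSource,
    /// Regex with the named groups `year`, `month` and `day`.
    pub pattern: String,
}

impl DateExtractionConfig {
    /// Hypothetical sanity check: the pattern must declare all three named
    /// groups, otherwise no date can ever be built from a match.
    /// This is a plain substring check, not a regex parse, and it only
    /// recognises the `(?P<name>` spelling used in the example config.
    pub fn has_required_groups(&self) -> bool {
        ["year", "month", "day"]
            .iter()
            .all(|name| self.pattern.contains(&format!("(?P<{name}>")))
    }
}

/// Hypothetical stand-in for the existing settings struct; only the new
/// field is shown. `None` keeps today's behaviour (always the current date).
#[derive(Debug, Clone, Default)]
pub struct Settings {
    pub date_extraction: Option<DateExtractionConfig>,
}
```

Making the field an Option means existing configs without a date_extraction block keep working unchanged.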

Variable engine change:

  • File: crates/paporg/src/config/variables.rs
  • In get_builtin_variables(): accept an optional override date
  • New function extract_date_from_filename(filename: &str, pattern: &str) -> Option<NaiveDate> that applies the pattern and builds a date from the named groups year, month, and day
  • If date is extracted, use it for $y, $m, $d, $l instead of Utc::now()
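
A std-only sketch of the validation and override logic. In the real function the three strings would presumably come from the regex crate (Regex::new(pattern), then captures(filename) and Captures::name("year") and so on), and the date check would likely be chrono's NaiveDate::from_ymd_opt, which returns None for impossible dates. SimpleDate, date_from_parts, and the extra today parameter are assumptions made so the sketch runs without external crates; $l is left out because the issue does not say what it contains, and the zero-padding of $m and $d is also assumed.

```rust
use std::collections::HashMap;

/// Hypothetical std-only stand-in for chrono's `NaiveDate`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SimpleDate {
    pub year: i32,
    pub month: u32,
    pub day: u32,
}

fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: i32, month: u32) -> Option<u32> {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if is_leap_year(year) { 29 } else { 28 }),
        _ => None, // month 0 or 13+ is rejected here
    }
}

/// Builds a date from the text captured by the `year`, `month` and `day`
/// groups. Returns `None` if a group is missing, is not a number, or the
/// combination is not a real calendar date (e.g. month 13, 30 February).
pub fn date_from_parts(year: Option<&str>, month: Option<&str>, day: Option<&str>) -> Option<SimpleDate> {
    let year: i32 = year?.parse().ok()?;
    let month: u32 = month?.parse().ok()?;
    let day: u32 = day?.parse().ok()?;
    let max_day = days_in_month(year, month)?;
    if day == 0 || day > max_day {
        return None;
    }
    Some(SimpleDate { year, month, day })
}

/// Sketch of the proposed signature change: an override date wins,
/// otherwise `today` (in the real code, derived from `Utc::now()`) is used.
/// Only `y`, `m` and `d` are filled in here.
pub fn get_builtin_variables(override_date: Option<SimpleDate>, today: SimpleDate) -> HashMap<String, String> {
    let date = override_date.unwrap_or(today);
    let mut vars = HashMap::new();
    vars.insert("y".to_string(), format!("{:04}", date.year));
    vars.insert("m".to_string(), format!("{:02}", date.month));
    vars.insert("d".to_string(), format!("{:02}", date.day));
    vars
}
```

Passing today in explicitly, rather than reading the clock inside the function, keeps the fallback path deterministic in unit tests.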

Pipeline change:

  • File: crates/paporg/src/pipeline/runner.rs
  • Before step_extract_variables(), attempt date extraction from the original filename
  • Pass extracted date (or None) to get_builtin_variables()
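
The pipeline hook could look roughly like this. date_override_for is a hypothetical helper, and its extract parameter stands in for extract_date_from_filename so the sketch runs without the regex crate; the types are repeated as std-only stand-ins.

```rust
/// Hypothetical std-only stand-ins for the schema and date types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SimpleDate {
    pub year: i32,
    pub month: u32,
    pub day: u32,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DateSource {
    Filename,
}

#[derive(Debug, Clone)]
pub struct DateExtractionConfig {
    pub source: DateSource,
    pub pattern: String,
}

/// Runs before `step_extract_variables()`: returns the date to pass to
/// `get_builtin_variables()`, or `None` to keep the current-date fallback.
/// `extract` stands in for `extract_date_from_filename(filename, pattern)`.
pub fn date_override_for<F>(
    original_filename: &str,
    config: Option<&DateExtractionConfig>,
    extract: F,
) -> Option<SimpleDate>
where
    F: Fn(&str, &str) -> Option<SimpleDate>,
{
    // No `date_extraction` block configured: no override.
    let config = config?;
    match config.source {
        DateSource::Filename => extract(original_filename, config.pattern.as_str()),
    }
}
```

In runner.rs the result would be passed straight to get_builtin_variables(); a disabled feature and a failed match both yield None and therefore the same current-date fallback.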

Acceptance Criteria

  • Configurable regex with named groups year, month, day
  • Extracted date overrides $y, $m, $d, $l
  • Falls back to current date when pattern doesn't match
  • Invalid dates (e.g., month 13) are rejected gracefully: no override is applied and the current date is used
  • Covered by unit tests with various date formats
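
The acceptance criteria translate naturally into a table-driven test. The cases and the failing_cases helper below are hypothetical; the expected values assume the implementation uses the first (leftmost) regex match and returns None when that match is not a valid date.

```rust
/// Expected (year, month, day), or `None` when the current-date fallback applies.
pub type Expected = Option<(i32, u32, u32)>;

const COMPACT: &str = "(?P<year>[0-9]{4})(?P<month>[0-9]{2})(?P<day>[0-9]{2})";
const DOTTED: &str = r"(?P<day>[0-9]{2})\.(?P<month>[0-9]{2})\.(?P<year>[0-9]{4})";

/// Hypothetical test table covering the acceptance criteria.
pub const CASES: &[(&str, &str, Expected)] = &[
    ("scan_20240315.pdf", COMPACT, Some((2024, 3, 15))),
    ("IMG_20240315_123456.jpg", COMPACT, Some((2024, 3, 15))), // first match wins
    ("Rechnung_15.03.2024.pdf", DOTTED, Some((2024, 3, 15))),
    ("scan_20241315.pdf", COMPACT, None), // month 13: rejected
    ("scan_20230229.pdf", COMPACT, None), // 29 February in a non-leap year
    ("notes.pdf", COMPACT, None),         // no match: fall back
];

/// Runs every case through `extract` and returns the filenames whose result
/// differed from the expectation (an empty vector means all cases passed).
pub fn failing_cases<F>(extract: F) -> Vec<&'static str>
where
    F: Fn(&str, &str) -> Expected,
{
    CASES
        .iter()
        .filter(|(file, pattern, expected)| extract(*file, *pattern) != *expected)
        .map(|(file, _, _)| *file)
        .collect()
}
```

In the actual test module, failing_cases would be given a closure that calls extract_date_from_filename and converts the returned NaiveDate into a (year, month, day) tuple, for example via chrono's Datelike methods.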

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
