Add CLI commands for schema validation #17

@MALathon

Summary

Add CLI commands to validate schemas and check their health status.

New CLI Arguments

--validate-schemas       Validate all schemas and show health status
--validate-schema NAME   Validate a specific schema
--validation-output FMT  Output format: 'text' (default), 'json'

Implementation

# cli.py
parser.add_argument(
    '--validate-schemas',
    action='store_true',
    help='Validate all schemas and show health status'
)

parser.add_argument(
    '--validate-schema',
    type=str,
    metavar='NAME',
    help='Validate a specific schema by name'
)

parser.add_argument(
    '--validation-output',
    type=str,
    choices=['text', 'json'],
    default='text',
    help='Output format for validation results (default: text)'
)
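To see how these three arguments behave together, here is a minimal self-contained sketch. The `build_parser` helper and the `prog="fetcharoo"` value are assumptions for illustration; the issue only shows the `add_argument` calls, not how the parser is constructed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: wraps the add_argument calls from the issue
    # so they can be exercised in isolation.
    parser = argparse.ArgumentParser(prog="fetcharoo")
    parser.add_argument(
        "--validate-schemas",
        action="store_true",
        help="Validate all schemas and show health status",
    )
    parser.add_argument(
        "--validate-schema",
        type=str,
        metavar="NAME",
        help="Validate a specific schema by name",
    )
    parser.add_argument(
        "--validation-output",
        type=str,
        choices=["text", "json"],
        default="text",
        help="Output format for validation results (default: text)",
    )
    return parser


# argparse converts dashes to underscores in attribute names.
args = build_parser().parse_args(
    ["--validate-schema", "arxiv", "--validation-output", "json"]
)
print(args.validate_schemas, args.validate_schema, args.validation_output)
# prints: False arxiv json
```

Because `choices` is set, a value such as `--validation-output yaml` is rejected by argparse itself, so the command functions never see an unsupported format.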

Main Function

def main(argv=None):
    args = parser.parse_args(argv)
    
    # Handle validation commands
    if args.validate_schemas:
        return validate_schemas_command(args.validation_output)
    
    if args.validate_schema:
        return validate_schema_command(args.validate_schema, args.validation_output)
    
    # ... rest of main ...

def validate_schemas_command(output_format: str) -> int:
    from fetcharoo.schemas import validate_all_schemas
    
    results = validate_all_schemas()
    
    if output_format == 'json':
        import json
        output = {name: {
            'status': h.status,
            'found_pdfs': h.found_pdfs,
            'expected_pdfs': h.expected_pdfs,
            'error': h.error
        } for name, h in results.items()}
        print(json.dumps(output, indent=2))
    else:
        # Text output with status icons
        status_icons = {'healthy': '✓', 'degraded': '⚠', 'broken': '✗'}
        for name, health in sorted(results.items()):
            icon = status_icons.get(health.status, '?')  # avoid KeyError on an unexpected status
            if health.error:
                print(f"{icon} {name}: {health.status} - {health.error}")
            else:
                print(f"{icon} {name}: {health.status} ({health.found_pdfs}/{health.expected_pdfs} PDFs)")
    
    # Return non-zero if any broken
    broken = [h for h in results.values() if h.status == 'broken']
    return 1 if broken else 0

def validate_schema_command(name: str, output_format: str) -> int:
    import sys  # needed for sys.stderr below
    from fetcharoo.schemas import get_schema
    
    schema = get_schema(name)
    if not schema:
        print(f"Error: Unknown schema '{name}'", file=sys.stderr)
        return 1
    
    health = schema.validate()
    # ... similar output logic ...
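One way to avoid duplicating the output logic between the two commands is a small shared formatter. The sketch below is an assumption, not the project's code: `SchemaHealth`, `format_health`, and `exit_code_for` are hypothetical names, and the health object is assumed to expose the same four attributes used in `validate_schemas_command` (`status`, `found_pdfs`, `expected_pdfs`, `error`). The single-schema JSON shape, keyed by schema name, is also an assumption chosen to mirror the all-schemas output.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaHealth:
    # Hypothetical stand-in for whatever schema.validate() returns.
    status: str
    found_pdfs: int
    expected_pdfs: int
    error: Optional[str] = None


STATUS_ICONS = {"healthy": "✓", "degraded": "⚠", "broken": "✗"}


def format_health(name: str, health: SchemaHealth, output_format: str = "text") -> str:
    """Render one schema's health as text or JSON."""
    if output_format == "json":
        payload = {
            name: {
                "status": health.status,
                "found_pdfs": health.found_pdfs,
                "expected_pdfs": health.expected_pdfs,
                "error": health.error,
            }
        }
        return json.dumps(payload, indent=2)
    icon = STATUS_ICONS.get(health.status, "?")
    if health.error:
        return f"{icon} {name}: {health.status} - {health.error}"
    return f"{icon} {name}: {health.status} ({health.found_pdfs}/{health.expected_pdfs} PDFs)"


def exit_code_for(health: SchemaHealth) -> int:
    """Only a broken schema produces a non-zero exit code."""
    return 1 if health.status == "broken" else 0


print(format_health("arxiv", SchemaHealth("healthy", 1, 1)))
print(format_health("broken_site", SchemaHealth("broken", 0, 0, "Connection refused")))
```

With a helper like this, `validate_schema_command` would print `format_health(name, health, output_format)` and return `exit_code_for(health)`.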

Usage Examples

# Validate all schemas
$ fetcharoo --validate-schemas
✓ springer_book: healthy (12/5 PDFs)
✓ arxiv: healthy (1/1 PDFs)
⚠ some_site: degraded (2/5 PDFs)
✗ broken_site: broken - Connection refused

# Validate specific schema
$ fetcharoo --validate-schema springer_book
✓ springer_book: healthy (12/5 PDFs)

# JSON output (for CI/scripts)
$ fetcharoo --validate-schemas --validation-output json
{
  "springer_book": {
    "status": "healthy",
    "found_pdfs": 12,
    "expected_pdfs": 5,
    "error": null
  },
  ...
}

# Use in CI to fail on broken schemas (exit 1 keeps the failure visible to the CI runner)
$ fetcharoo --validate-schemas || { echo "Some schemas are broken!" >&2; exit 1; }
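For scripts that consume the JSON output, a minimal sketch of grouping schemas by status might look like the following. The `summarize` function is hypothetical, and the sample report is illustrative data in the shape shown above, not real validation results.

```python
import json


def summarize(report_text: str) -> dict:
    """Group schema names by their reported status."""
    report = json.loads(report_text)
    by_status: dict = {}
    for name, info in sorted(report.items()):
        by_status.setdefault(info["status"], []).append(name)
    return by_status


# Illustrative sample only; field names follow the JSON example above.
sample = """
{
  "springer_book": {"status": "healthy", "found_pdfs": 12, "expected_pdfs": 5, "error": null},
  "broken_site": {"status": "broken", "found_pdfs": 0, "expected_pdfs": 0, "error": "Connection refused"}
}
"""

summary = summarize(sample)
for status, names in summary.items():
    print(f"{status}: {', '.join(names)}")
```

In a pipeline, the same function could read the report from `sys.stdin` instead of a literal string, leaving the process exit code of `fetcharoo` itself as the pass/fail signal.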

Exit Codes

  • 0: Every validated schema is healthy or degraded
  • 1: At least one schema is broken, or the name passed to --validate-schema is unknown

Tasks

  • Add --validate-schemas argument
  • Add --validate-schema NAME argument
  • Add --validation-output argument
  • Implement text output with status icons
  • Implement JSON output
  • Return appropriate exit codes
  • Add CLI tests
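The CLI tests could capture stdout and assert on the exit code without touching the network. The sketch below uses a simplified stand-in for `validate_schemas_command` that takes the results as a parameter; the real function in this issue calls `validate_all_schemas()` itself, so actual tests would likely patch that call instead. `fake_health` and `run` are hypothetical helpers.

```python
import io
import json
from contextlib import redirect_stdout
from types import SimpleNamespace


def fake_health(status, found=0, expected=0, error=None):
    # Hypothetical test double with the attributes the command reads.
    return SimpleNamespace(
        status=status, found_pdfs=found, expected_pdfs=expected, error=error
    )


def validate_schemas_command(output_format, results):
    # Simplified stand-in: results are injected rather than fetched.
    if output_format == "json":
        print(json.dumps({name: vars(h) for name, h in results.items()}, indent=2))
    else:
        for name, h in sorted(results.items()):
            print(f"{name}: {h.status}")
    return 1 if any(h.status == "broken" for h in results.values()) else 0


def run(output_format, results):
    """Run the command and return (exit_code, captured_stdout)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        code = validate_schemas_command(output_format, results)
    return code, buffer.getvalue()


def test_degraded_still_exits_zero():
    results = {"a": fake_health("healthy", 1, 1), "b": fake_health("degraded", 2, 5)}
    code, _ = run("text", results)
    assert code == 0


def test_broken_exits_one():
    results = {"a": fake_health("broken", error="Connection refused")}
    code, out = run("text", results)
    assert code == 1
    assert "a: broken" in out


def test_json_output_is_parseable():
    _, out = run("json", {"a": fake_health("healthy", 1, 1)})
    assert json.loads(out)["a"]["status"] == "healthy"
```

The three tests map onto the exit-code rule and the "JSON output is parseable" criterion; they run under pytest or can be called directly.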

Acceptance Criteria

  • --validate-schemas shows all schema health
  • --validate-schema NAME validates single schema
  • Text output uses clear status icons (✓ ⚠ ✗)
  • JSON output is parseable for CI
  • Returns exit code 1 if any schema is broken

Dependencies

Part of

Parent issue: #10
