Hardcoded Extraction Marker Detection (MVP Tech Debt) #32

@WesleyMFrederick

Problem

MarkdownParser hardcodes detection of extraction markers (text wrapped in %% ... %% or <!-- ... --> delimiters) that follow links, solely to support the content extraction feature. This couples the parser, which is meant for generic markdown processing, to one specific feature.

Coupling Issues

  • Parser knows about extraction-specific marker syntax
  • Cannot reuse parser in contexts where extraction markers are irrelevant
  • Adding new marker types requires modifying parser internals
  • Violates the Single Responsibility Principle (parser handles both link detection AND feature-specific annotation)

Current Implementation (MVP)

// Parser hardcodes extraction marker detection
const extractionMarkerMatch = remainingLine.match(/\s*(%%(.+?)%%|<!--\s*(.+?)\s*-->)/);
if (extractionMarkerMatch) {
  linkObject.extractionMarker = {
    fullMatch: extractionMarkerMatch[1],
    innerText: extractionMarkerMatch[2] || extractionMarkerMatch[3]
  };
}
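
To make the current behavior concrete, here is a minimal sketch that wraps the same pattern in a standalone function and runs it on a few sample inputs. The helper name `detectExtractionMarker` and the sample strings are assumptions made for illustration; they are not taken from the parser source.

```javascript
// Same pattern as the MVP parser snippet above.
const EXTRACTION_MARKER = /\s*(%%(.+?)%%|<!--\s*(.+?)\s*-->)/;

// Hypothetical helper (not in the codebase): `remainingLine` is the text
// that follows a link on the same line.
function detectExtractionMarker(remainingLine) {
  const match = remainingLine.match(EXTRACTION_MARKER);
  if (!match) return null;
  return {
    fullMatch: match[1],               // marker including its delimiters
    innerText: match[2] || match[3],   // group 2 for %%...%%, group 3 for <!-- ... -->
  };
}

// Made-up sample inputs.
const percentMarker = detectExtractionMarker(' %%force-extract%%');
const commentMarker = detectExtractionMarker(' <!-- stop-extract -->');
const laterMarker = detectExtractionMarker(' trailing words %%force-extract%%');
const noMarker = detectExtractionMarker(' plain trailing text');

console.log(percentMarker, commentMarker, laterMarker, noMarker);
```

One thing this makes visible: the pattern is not anchored to the start of the remaining text, so a marker that appears later on the line (the `laterMarker` case) is also picked up. Whether that is intended is not stated in this issue.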

Future Enhancement

Make the parser extensible by accepting custom parsing rules/patterns as configuration:

// Future: Parser accepts custom annotation detectors
const parser = new MarkdownParser({
  annotationDetectors: [
    { name: 'extractionMarker', pattern: /\s*(%%(.+?)%%|<!--\s*(.+?)\s*-->)/, scope: 'after-link' },
    { name: 'customTag', pattern: /<tag>(.+?)<\/tag>/, scope: 'wrapper' }
  ]
});
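
One way the parser might consume such a configuration is sketched below. Everything here is hypothetical: the class name `SketchMarkdownParser`, the `annotateLink` method, and the convention that capture group 1 is the full marker and the first defined later group is the inner text are assumptions, not an existing API. Only the `after-link` scope is handled; how a `wrapper` scope would behave is left open.

```javascript
// Hypothetical sketch; none of these names exist in the current MarkdownParser.
class SketchMarkdownParser {
  constructor({ annotationDetectors = [] } = {}) {
    this.annotationDetectors = annotationDetectors;
  }

  // Runs every 'after-link' detector against the text following a link and
  // stores each hit on the link object under the detector's name.
  annotateLink(linkObject, remainingLine) {
    for (const { name, pattern, scope } of this.annotationDetectors) {
      if (scope !== 'after-link') continue; // other scopes are not covered by this sketch
      const match = remainingLine.match(pattern);
      if (!match) continue;
      // Assumed convention: group 1 is the full marker, and the first defined
      // group after it is the inner text.
      const innerText = match.slice(2).find((group) => group !== undefined);
      linkObject[name] = { fullMatch: match[1], innerText };
    }
    return linkObject;
  }
}

// The content extraction feature registers its own pattern.
const extractionParser = new SketchMarkdownParser({
  annotationDetectors: [
    { name: 'extractionMarker', pattern: /\s*(%%(.+?)%%|<!--\s*(.+?)\s*-->)/, scope: 'after-link' },
  ],
});
const annotatedLink = extractionParser.annotateLink({ target: 'guide.md' }, ' %%force-extract%%');

// A parser created without detectors stays generic and adds no annotations.
const plainLink = new SketchMarkdownParser().annotateLink({ target: 'guide.md' }, ' %%force-extract%%');

console.log(annotatedLink, plainLink);
```

With this shape, the extraction-specific regex lives in the feature's configuration rather than in the parser, and a parser built with no detectors leaves link objects untouched.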

Benefits of Future Approach

  • Parser stays generic and reusable
  • Features register their own annotation patterns
  • No parser modifications needed for new features
  • Clear separation: parser provides extensibility hooks, features provide patterns

Recommendation

Priority: Low (MVP trade-off accepted; address when multiple features need custom markdown annotations or when parser reuse is required)


Discovery Date: 2025-10-20
Discovered During: ContentExtractor implementation guide development
Source: Markdown Parser Implementation Guide - Issue 5
