Windows support — OCR pipeline, packaging, and foreground metadata #24

@JTLR

Hey — I've been using Familiar daily on Windows and built out a full Windows port on a fork. Wanted to gauge interest before opening PRs.

What's included:

  • Windows OCR pipeline using the built-in Windows.Media.Ocr API (WinRT via PowerShell 5.1) — mirrors the Apple Vision OCR pattern with layout region inference (HEADER/SIDEBAR/CONTENT from word bounding boxes)
  • Foreground window metadata capture — captures the active window title, app name, PID, and (for browsers) the URL at screenshot time. This addresses #11 ("Capture ground-truth app/window/URL metadata per frame") and complements the recently added on-screen app detector, which is macOS-only
  • NSIS installer packaging: npm run dist:win builds a Windows .exe installer with PowerShell scripts properly unpacked via extraResources
  • Cross-platform Launch at Login — replaces the macOS-only openAtLogin with a setting that works on both macOS and Windows
  • Platform-aware defaults — capture interval, UI text ("System Tray" vs "Dock"), extractor type auto-detection, settings normalisation for windows_ocr type
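
To give a rough idea of what layout region inference from word bounding boxes can look like, here is a minimal TypeScript sketch. The type names, the `classifyRegion` and `groupByRegion` helpers, and the 10% / 20% thresholds are all assumptions made for illustration; they are not taken from the fork, which may use different heuristics.

```typescript
// Hypothetical sketch: names and thresholds are illustrative assumptions,
// not the fork's actual implementation.
type LayoutRegion = "HEADER" | "SIDEBAR" | "CONTENT";

interface WordBox {
  text: string;
  x: number; // left edge, in pixels
  y: number; // top edge, in pixels
  width: number;
  height: number;
}

interface ScreenSize {
  width: number;
  height: number;
}

// Assumed cut-offs: the top 10% of the screen counts as header,
// the left 20% (below the header) counts as sidebar.
const HEADER_FRACTION = 0.1;
const SIDEBAR_FRACTION = 0.2;

// Classify a single OCR word by the position of its bounding-box centre.
function classifyRegion(word: WordBox, screen: ScreenSize): LayoutRegion {
  const centerX = word.x + word.width / 2;
  const centerY = word.y + word.height / 2;
  if (centerY < screen.height * HEADER_FRACTION) return "HEADER";
  if (centerX < screen.width * SIDEBAR_FRACTION) return "SIDEBAR";
  return "CONTENT";
}

// Bucket the text of all words by inferred region, preserving input order.
function groupByRegion(
  words: WordBox[],
  screen: ScreenSize
): Record<LayoutRegion, string[]> {
  const groups: Record<LayoutRegion, string[]> = {
    HEADER: [],
    SIDEBAR: [],
    CONTENT: [],
  };
  for (const word of words) {
    groups[classifyRegion(word, screen)].push(word.text);
  }
  return groups;
}
```

A position-only rule like this is cheap and needs no model, but it will misclassify apps with right-hand sidebars or tall toolbars, so real thresholds would likely need tuning per layout.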

The OCR quality is solid for screen text — I've been running it daily for a few weeks. All output follows the familiar-layout-v0 markdown format with real frontmatter metadata (app, window title, URL).
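
As an illustration of how captured foreground metadata could be turned into frontmatter, here is a small TypeScript sketch. The key names (`app`, `window_title`, `pid`, `url`), the `ForegroundMetadata` type, and the `renderFrontmatter` helper are assumptions for the example; the actual familiar-layout-v0 keys may differ.

```typescript
// Hypothetical sketch: field and key names are assumptions, not the
// actual familiar-layout-v0 schema.
interface ForegroundMetadata {
  app: string;
  windowTitle: string;
  pid: number;
  url?: string; // only present when the foreground app is a browser
}

// Render a YAML frontmatter block. JSON.stringify is used for string
// values because JSON double-quoted strings are also valid YAML scalars,
// which keeps titles containing colons or quotes safe.
function renderFrontmatter(meta: ForegroundMetadata): string {
  const lines: string[] = [
    "---",
    `app: ${JSON.stringify(meta.app)}`,
    `window_title: ${JSON.stringify(meta.windowTitle)}`,
    `pid: ${meta.pid}`,
  ];
  if (meta.url !== undefined) {
    lines.push(`url: ${JSON.stringify(meta.url)}`);
  }
  lines.push("---");
  return lines.join("\n");
}
```

Omitting `url` entirely for non-browser windows, rather than writing an empty value, keeps the frontmatter unambiguous for downstream readers.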

Happy to break this into focused PRs if you're interested. Thinking something like:

  1. Core Windows OCR + packaging
  2. Cross-platform Launch at Login + capture interval
  3. README updates

Fork: https://github.com/JTLR/familiar/tree/windows-support
