
UI automation to get the Direct Text #630

Merged
TheJoeFin merged 14 commits into dev from ui-automation on Mar 9, 2026

@TheJoeFin (Owner)

This introduces a new feature that uses UI Automation to grab text directly from a UI element when possible, and falls back to OCR when it is not.
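Here is a minimal C# sketch of this UIA-first routing. It assumes two hypothetical delegates, one for the UI Automation read and one for OCR, plus a `fallbackToOcr` flag that stands in for the "fallback to OCR" setting. It is an illustration, not Text Grab's actual `OcrUtilities` code.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical routing helper: try Direct Text (UI Automation) first and fall
// back to OCR only when no text came back and the fallback setting is on.
// The delegate names and the fallbackToOcr flag are illustrative assumptions,
// not Text Grab's actual OcrUtilities signatures.
static async Task<string> GetTextAsync(
    Func<Task<string?>> tryDirectText,
    Func<Task<string>> runOcr,
    bool fallbackToOcr)
{
    string? direct = await tryDirectText();
    if (!string.IsNullOrWhiteSpace(direct))
        return direct;

    // No accessible text was exposed (e.g. a custom-drawn control), so either
    // run OCR or return nothing, depending on the user's setting.
    return fallbackToOcr ? await runOcr() : string.Empty;
}

string fromUia = await GetTextAsync(
    () => Task.FromResult<string?>("Save"),
    () => Task.FromResult("OCR text"),
    fallbackToOcr: true);
Console.WriteLine(fromUia); // Save
```

Keeping the OCR path behind a delegate means the expensive screen capture only runs when UI Automation returns nothing.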

Added a new LanguageKind.UiAutomation to HistoryInfo and implemented the UiAutomationLang class for "UI Automation Text" language support. Introduced UiAutomationOptions record to configure UI Automation traversal and filtering behavior.
Added support for UI Automation as a selectable OCR language. Integrated UiAutomationLang into language selection, caching, and kind/type checks. Introduced UIAutomationUtilities for extracting text from screen regions, points, and windows using Windows UI Automation APIs. Updated OcrUtilities to route requests to UIAutomationUtilities when appropriate, with fallback logic to traditional OCR. Added CaptureLanguageUtilities for language enumeration and compatibility checks. Improved settings import/export robustness to handle property-based settings. These changes enable text extraction from UI elements as an alternative to image-based OCR.
Added user-configurable settings and UI controls for UI Automation text extraction, including toggles for enabling UI Automation, fallback to OCR, traversal mode, offscreen element inclusion, and focus preference. Updated language picker to use OCR language by default and persist selection. Improved language selection experience and settings persistence.
- Introduce UI Automation as a new OCR language mode, including traversal options.
- Centralize language loading and selection logic using CaptureLanguageUtilities.
- Unify language dropdown population for all OCR modes (Tesseract, Windows AI, UI Automation).
- Update UI to reflect table output support based on selected language.
- Invalidate OCR language cache on language reset for accurate UI updates.
- Track static vs. live image sources in GrabFrame; notify user if UI Automation is selected with a static image.
- Update OCR logic to use UI Automation APIs when appropriate; skip image-based corrections for UI Automation.
- Refactor and simplify code for better maintainability and clarity.
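The static-versus-live check can be sketched as a small decision helper. `ResolveCaptureMode`, its parameters, and the choice to run OCR on a static image are hypothetical. The PR's `RequiresLiveUiAutomationSource` and GrabFrame messaging may be structured differently.

```csharp
using System;

// Hypothetical helper: Direct Text reads live UI elements, so a static image
// (an opened file or pasted picture) cannot be read with it.
static (bool UseOcr, string? Message) ResolveCaptureMode(bool directTextSelected, bool isStaticImage)
{
    if (!directTextSelected)
        return (true, null); // A regular OCR language: nothing to resolve.

    if (isStaticImage)
        // Assumption: tell the user and run OCR on the frozen image instead.
        return (true, "Direct Text needs a live window; using OCR for this image.");

    return (false, null); // Live source: read text through UI Automation.
}

var (useOcr, message) = ResolveCaptureMode(directTextSelected: true, isStaticImage: true);
Console.WriteLine($"{useOcr}: {message}");
```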
Expanded test coverage for CaptureLanguageUtilities and UIAutomationUtilities, including language matching, selection, table output support, text normalization, deduplication, window selection logic, control type handling, and point sampling. Also added tests for UiAutomationLang handling in LanguageService and HistoryInfo.
- Introduce UiAutomationOverlayItem/Snapshot models and enum for overlay representation and metadata.
- Add overlay extraction methods to UIAutomationUtilities, including deduplication, sorting, and metadata helpers.
- Support overlay snapshot extraction for regions, with optional window exclusion.
- Refine region/point text extraction to handle excluded windows and improve accuracy with overlays.
- Improve element text extraction: restrict Name fallback to specific control types and skip if visible text descendants exist.
- Add ImageSource-to-Bitmap conversion and a utility that reports whether a language requires a live UI Automation source.
- Refactor history service to better handle image paths and deduplication.
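Overlay deduplication and reading-order sorting could look roughly like the sketch below. Items are modeled as `(Text, Left, Top)` tuples. `TryAddUnique`, `SortReadingOrder`, the position-based dedup key, and the 5-unit row tolerance are all assumptions that only approximate what `TryAddUniqueOverlayItem` and `SortOverlayItems` might do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers modeling overlay items as (Text, Left, Top) tuples.
// Names, the dedup key, and the row tolerance are illustrative assumptions.

// Collapse runs of whitespace (including newlines) into single spaces.
static string Normalize(string text) =>
    string.Join(' ', text.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries));

// Skip empty text and items already seen at (roughly) the same position,
// e.g. a parent and child element that both expose the same string.
static bool TryAddUnique(
    List<(string Text, double Left, double Top)> items,
    HashSet<string> seen,
    string text, double left, double top)
{
    string normalized = Normalize(text);
    if (normalized.Length == 0)
        return false;

    string key = $"{normalized}|{Math.Round(left)}|{Math.Round(top)}";
    if (!seen.Add(key))
        return false;

    items.Add((normalized, left, top));
    return true;
}

// Reading order: items whose tops fall within rowTolerance of a row's first
// item share that row; rows go top to bottom, items in a row left to right.
static List<(string Text, double Left, double Top)> SortReadingOrder(
    IEnumerable<(string Text, double Left, double Top)> items, double rowTolerance = 5)
{
    var byTop = items.OrderBy(i => i.Top).ToList();
    var result = new List<(string Text, double Left, double Top)>();
    int start = 0;
    while (start < byTop.Count)
    {
        double rowTop = byTop[start].Top;
        int end = start;
        while (end < byTop.Count && byTop[end].Top - rowTop <= rowTolerance)
            end++;
        result.AddRange(byTop.GetRange(start, end - start).OrderBy(i => i.Left));
        start = end;
    }
    return result;
}

var items = new List<(string Text, double Left, double Top)>();
var seen = new HashSet<string>();
TryAddUnique(items, seen, "World", 80, 12);
TryAddUnique(items, seen, "Hello", 10, 10);
TryAddUnique(items, seen, "  Hello\n", 10.2, 10.1); // same text and spot: skipped
TryAddUnique(items, seen, "Next line", 10, 40);
Console.WriteLine(string.Join(" / ", SortReadingOrder(items).Select(i => i.Text)));
// prints: Hello / World / Next line
```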
Expanded test coverage for CaptureLanguageUtilities and UIAutomationUtilities, including new tests for RequiresLiveUiAutomationSource, TryClipBounds, TryAddUniqueOverlayItem, and SortOverlayItems. Added ImageMethodsTests to verify ImageSourceToBitmap behavior. Updated ShouldUseNameFallback tests and improved using directives.
Enables rendering of UI Automation overlays in the GrabFrame window, allowing users to view and interact with detected UI elements when a UI Automation language is selected. Adds logic to capture overlays, render them as word borders, and fall back to OCR when overlays are unavailable. Introduces user feedback messaging for unsupported scenarios, improves language selection synchronization, and refactors word border management. Updates XAML to include a message border for user notifications. Also fixes bitmap handling and ensures robust state management when switching between live and static image modes.
Update all references from "UI Automation" to "Direct Text" in both code and UI. This includes changing the abbreviated name to "DT" and updating display, native, and culture names in UiAutomationLang. Adjust UI labels, descriptions, and toggle switches in LanguageSettings.xaml to reflect the new terminology. No functional changes, only terminology updates for clarity.
TheJoeFin added the enhancement (New feature or request) and General Processing (Relating to the processing of images to some type of text output) labels on Mar 8, 2026
TheJoeFin changed the base branch from main to dev on March 8, 2026 at 01:07
TheJoeFin requested a review from Copilot on March 8, 2026 at 01:09

Copilot AI left a comment


Pull request overview

This PR adds a new “Direct Text” capture mode backed by UI Automation, allowing Text Grab to read accessible UI text from live controls when available, with configurable fallback to OCR when it isn’t.

Changes:

  • Introduces UI Automation-based text extraction + overlay snapshot rendering (with settings for traversal/offscreen/focus preference and OCR fallback).
  • Updates Grab Frame and Fullscreen Grab to route capture and overlays through Direct Text when that language is selected.
  • Centralizes language list/selection persistence logic (including legacy persisted language matching) and adds unit tests for the new utilities.
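Legacy persisted-language matching might work along these lines. This is a hypothetical C# sketch. It assumes the saved value is either a language tag or, from an older build, a display name. `FindPersistedLanguageIndex` and that legacy format are assumptions, not the actual `CaptureLanguageUtilities` rules.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: restore a saved language choice. Assumes the saved
// value is either a language tag or, from an older version, a display name.
static int FindPersistedLanguageIndex(
    IReadOnlyList<(string Tag, string DisplayName)> languages, string? persisted)
{
    if (string.IsNullOrWhiteSpace(persisted))
        return 0;

    // Prefer a tag match, e.g. "de-DE" (case-insensitive).
    for (int i = 0; i < languages.Count; i++)
        if (string.Equals(languages[i].Tag, persisted, StringComparison.OrdinalIgnoreCase))
            return i;

    // Legacy fallback: match on a display name that older builds may have stored.
    for (int i = 0; i < languages.Count; i++)
        if (string.Equals(languages[i].DisplayName, persisted, StringComparison.OrdinalIgnoreCase))
            return i;

    return 0; // No match: fall back to the first (default) language.
}

var langs = new List<(string Tag, string DisplayName)>
{
    ("en-US", "English (United States)"),
    ("de-DE", "German (Germany)"),
    ("fr-FR", "French (France)"),
};
Console.WriteLine(FindPersistedLanguageIndex(langs, "de-de"));           // 1
Console.WriteLine(FindPersistedLanguageIndex(langs, "French (France)")); // 2
```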

Reviewed changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| Text-Grab/Views/GrabFrame.xaml.cs | Adds Direct Text rendering paths, live/static-source handling, and message UI logic. |
| Text-Grab/Views/GrabFrame.xaml | Adds an in-frame message border for non-blocking status/errors. |
| Text-Grab/Views/FullscreenGrab.xaml.cs | Moves language persistence/state (table support) to shared utilities. |
| Text-Grab/Views/FullscreenGrab.SelectionStyles.cs | Captures UIA snapshot for GrabFrame and supports Direct Text for region grabs. |
| Text-Grab/Views/EditTextWindow.xaml.cs | Persists selected language and avoids culture-setting for non-Global languages. |
| Text-Grab/Utilities/UIAutomationUtilities.cs | New UIA extraction and overlay snapshot implementation. |
| Text-Grab/Utilities/SettingsImportExportUtilities.cs | Adds reflection fallback for settings import/export and adjusts type conversion flow. |
| Text-Grab/Utilities/OcrUtilities.cs | Adds UIA-first capture paths with OCR fallback and excluded-handle support. |
| Text-Grab/Utilities/ImageMethods.cs | Adds ImageSource→Bitmap helper used by history/save flows. |
| Text-Grab/Utilities/CaptureLanguageUtilities.cs | New shared language list/persistence helpers and UIA compatibility checks. |
| Text-Grab/Services/LanguageService.cs | Adds UiAutomation language kind/tag handling and cache invalidation behavior. |
| Text-Grab/Services/HistoryService.cs | Adjusts history overwrite/save flow to preserve/assign image paths more robustly. |
| Text-Grab/Properties/Settings.settings | Adds user settings for enabling/configuring Direct Text behavior. |
| Text-Grab/Properties/Settings.Designer.cs | Generated settings accessors for new Direct Text settings. |
| Text-Grab/Pages/LanguageSettings.xaml.cs | Adds UI for Direct Text settings and persists them to user settings. |
| Text-Grab/Pages/LanguageSettings.xaml | Adds Direct Text configuration section (toggles + traversal mode). |
| Text-Grab/Models/UiAutomationOverlaySnapshot.cs | New model for storing a captured UIA overlay snapshot. |
| Text-Grab/Models/UiAutomationOverlayItem.cs | New model for individual UIA overlay items + source classification. |
| Text-Grab/Models/UiAutomationOptions.cs | New model for UIA traversal/filter options. |
| Text-Grab/Models/UiAutomationLang.cs | New ILanguage implementation representing “Direct Text”. |
| Text-Grab/Models/HistoryInfo.cs | Rehydrates UiAutomation language kind when reading history. |
| Text-Grab/Enums.cs | Adds LanguageKind.UiAutomation and UiAutomationTraversalMode. |
| Text-Grab/Controls/LanguagePicker.xaml.cs | Persists selected language and uses persisted OCR language as initial selection. |
| Text-Grab/Controls/LanguagePicker.xaml | Generalizes item template for ILanguage-backed items. |
| Text-Grab/App.config | Adds default values for Direct Text settings (and includes FsgSelectionStyle). |
| Tests/UiAutomationUtilitiesTests.cs | Unit tests for UIA helper logic (normalize/dedup/sorting/window selection). |
| Tests/LanguageServiceTests.cs | Adds coverage for UiAutomation language kind/tag and history rehydration. |
| Tests/ImageMethodsTests.cs | Tests new ImageSource→Bitmap conversion helper. |
| Tests/CaptureLanguageUtilitiesTests.cs | Tests persisted-language matching and UIA live-source requirements. |
Files not reviewed (1)
  • Text-Grab/Properties/Settings.Designer.cs: Language not supported
Comments suppressed due to low confidence (1)

Text-Grab/Utilities/OcrUtilities.cs:113

  • GetTextFromAbsoluteRectAsync allocates a System.Drawing.Bitmap via GetRegionOfScreenAsBitmap but never disposes it. Since Bitmap is IDisposable, this can leak GDI handles/memory during repeated captures; wrap it in a using (or using var) so it is disposed after OCR completes.
        Rectangle selectedRegion = rect.AsRectangle();
        Bitmap bmp = ImageMethods.GetRegionOfScreenAsBitmap(selectedRegion);

        return GetStringFromOcrOutputs(await GetTextFromImageAsync(bmp, language));


RectanglesCanvas.Children.Clear();
wordBorders.Clear();
ClearRenderedWordBorders();

Copilot AI Mar 8, 2026


ClearRenderedWordBorders(); is indented as if it were inside the if (!hasLoadedImageSource) block, but without braces it will always execute. This is easy to misread and could lead to accidental logic changes later—either fix indentation or add braces to make the intended scope explicit.

Suggested change
ClearRenderedWordBorders();
ClearRenderedWordBorders();

Copilot uses AI. Check for mistakes.
int thisCorrectedTop = (int)absPosPoint.Y + selectedRegion.Top;

Rectangle correctedRegion = new(thisCorrectedLeft, thisCorrectedTop, selectedRegion.Width, selectedRegion.Height);
Bitmap bmp = ImageMethods.GetRegionOfScreenAsBitmap(correctedRegion);

Copilot AI Mar 8, 2026


GetRegionsTextAsTableAsync creates Bitmap bmp = ImageMethods.GetRegionOfScreenAsBitmap(...) and then creates a scaled bitmap from it, but the original bmp is never disposed. This can leak GDI handles; consider wrapping bmp in a using once scaledBitmap has been created.

Suggested change
Bitmap bmp = ImageMethods.GetRegionOfScreenAsBitmap(correctedRegion);
using Bitmap bmp = ImageMethods.GetRegionOfScreenAsBitmap(correctedRegion);

Comment on lines 212 to 216
public static async Task<(IOcrLinesWords?, double)> GetOcrResultFromRegionAsync(Rectangle region, ILanguage language)
{
language = GetCompatibleOcrLanguage(language);
Bitmap bmp = ImageMethods.GetRegionOfScreenAsBitmap(region);


Copilot AI Mar 8, 2026


GetOcrResultFromRegionAsync allocates a Bitmap (bmp = GetRegionOfScreenAsBitmap(region)) and never disposes it (only the scaled bitmap is disposed). This can leak GDI resources when called repeatedly; wrap bmp in a using and dispose it after scaling / OCR is complete.

Comment on lines 1387 to 1391
}

if (ocrResultOfWindow is null)
return;


Copilot AI Mar 8, 2026


In DrawOcrRectanglesAsync, if ocrResultOfWindow ends up null the method returns without resetting isDrawing back to false. That will leave the GrabFrame stuck in a permanent "drawing" state and prevent future redraw/search updates. Consider wrapping the body in a try/finally (or ensuring all early returns set isDrawing = false and restart the appropriate timers).
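The try/finally approach suggested here can be reduced to a small C# sketch. `isDrawing` mirrors the flag named in the comment. `DrawAsync`, `getResult`, and the re-entrancy guard are hypothetical stand-ins for the GrabFrame code, and returning a line count stands in for drawing word borders.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical reduction of the fix: try/finally clears the busy flag on
// every exit path, including the early return when no OCR result comes back.
bool isDrawing = false;

async Task<int> DrawAsync(Func<Task<string[]?>> getResult)
{
    if (isDrawing)
        return -1; // A redraw is already running.

    isDrawing = true;
    try
    {
        string[]? lines = await getResult();
        if (lines is null)
            return 0; // Early return: finally still resets isDrawing.

        return lines.Length; // Stand-in for drawing one border per line.
    }
    finally
    {
        isDrawing = false;
    }
}

int drawn = await DrawAsync(() => Task.FromResult<string[]?>(null));
Console.WriteLine($"{drawn} {isDrawing}"); // 0 False
```

Timers that the early-return paths would restart could go in the same `finally` block, so no exit path can skip them.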

@TheJoeFin (Owner, Author)

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot AI (Contributor) commented Mar 8, 2026

@TheJoeFin I've opened a new pull request, #631, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 4 commits March 8, 2026 01:20
…ents

Co-authored-by: TheJoeFin <7809853+TheJoeFin@users.noreply.github.com>
Disposing selectionBitmap with a using statement caused app crashes.
Now, the bitmap is not disposed immediately, and a comment was
added to highlight the issue and the need for further investigation.
Fix bitmap disposal leaks and isDrawing stuck state from PR review
Refined LanguagePicker to filter out internal OCR engine languages
(UiAutomationLang, WindowsAiLang) and instead use the current
keyboard input language for selection when needed. Updated
imports and clarified parameter naming in GlobalLang. Changed
UiAutomationLang tag and display values for clarity. This ensures
the picker only shows real, user-facing languages and improves
user experience.
TheJoeFin merged commit 76ff856 into dev on Mar 9, 2026
1 check passed
TheJoeFin deleted the ui-automation branch on March 9, 2026 at 04:08
3 participants