Feat/ocr implementation#4
Conversation
WalkthroughAdds a macOS-only OCR feature using an embedded Swift Apple Vision script, a TypeScript OCRService invoking the CLI, Obsidian commands to run OCR, settings/types/defaults and manifest fields, PageProcessor integration to store/inject OCR output, docs and helper scripts, logger.info, and tests. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Plugin as Obsidian Plugin
participant Command as OCR Command
participant FS as File System
participant Service as OCRService
participant Swift as Swift CLI
participant Vision as Apple Vision
User->>Plugin: Trigger "Run OCR on current note"
Plugin->>Command: runOCROnCurrentNote(app, settings)
Command->>FS: Read note & resolve transcluded image path
FS-->>Command: Note content / Image path
Command->>Service: performOCR(imagePath, options)
Service->>Service: ensure embedded Swift script is written
Service->>Swift: Execute script with args (image,languages,threshold)
Swift->>Swift: Load/convert/preprocess image
Swift->>Vision: VNRecognizeTextRequest (handwriting)
Vision-->>Swift: Recognized text blocks
Swift->>Service: JSON OCRResult (stdout)
Service-->>Command: Parsed OCRResult
Command->>FS: Write updated note (insert OCR text) and update manifest
FS-->>Command: Write complete
Command->>Plugin: Return summary
Plugin->>User: Show notice
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
3aa7355 to
f0372f6
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/services/page-processor.ts (1)
128-176: Don’t mark OCR as “not processed” when OCR ran but found no text.
ocrProcessedis set from!!ocrText. If OCR runs and returns empty/low-confidence text, the page will be reprocessed every import. Track “attempted” separately.🐛 Suggested fix
- let ocrText: string | undefined; - if (this.settings.enableOcr && (isNew || isModified || !manifest.importedPages[pageNum]?.ocrProcessed)) { - ocrText = await this.performOCR(page); - } else if (manifest.importedPages[pageNum]?.ocrText) { - ocrText = manifest.importedPages[pageNum].ocrText; - } + let ocrText: string | undefined; + let ocrAttempted = manifest.importedPages[pageNum]?.ocrProcessed ?? false; + if (this.settings.enableOcr && (isNew || isModified || !ocrAttempted)) { + ocrAttempted = true; + ocrText = await this.performOCR(page); + } else if (manifest.importedPages[pageNum]?.ocrText) { + ocrText = manifest.importedPages[pageNum].ocrText; + } ... - ocrProcessed: !!ocrText, + ocrProcessed: ocrAttempted,
🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 89-101: Add blank lines immediately before and after each Markdown
table to satisfy markdownlint MD058: insert an empty line before the "OCR (macOS
Only)" table and another empty line after it, and do the same for the "Debug"
table; ensure the tables remain directly under their respective headings but
separated by a blank line above and below so the headings and surrounding text
are not adjacent to the table markdown.
In `@src/commands/registry.ts`:
- Around line 85-93: The current call to runOCROnCurrentNote(app, settings)
handles only fulfilled results and can leave rejected promises unhandled; update
the invocation in registry.ts to attach a .catch handler (or convert to
async/await with try/catch) to handle rejections, showing a Notice with the
error message and optionally logging via console.error or the existing logger;
specifically modify the block that calls runOCROnCurrentNote to handle both
rejected promises and fulfilled results (the existing .then branch) so all
failure paths produce a Notice (e.g., "OCR failed: <error>") and prevent
unhandled rejections.
In `@src/commands/run-ocr-command.ts`:
- Around line 120-129: The inserted OCR block (constructed as ocrSection before
creating newContent) lacks the same heading used earlier for re-run detection,
so add the exact heading string "## Extracted Text (OCR)" to ocrSection (e.g.,
prepend that heading plus a newline before result.text) so the code that checks
for the heading (the earlier match for "## Extracted Text (OCR)") will find and
prevent duplicate inserts; update the ocrSection variable (referenced alongside
pageHeadingMatch, insertPosition, result.text, content, newContent) to include
that heading.
In `@tests/utils.test.ts`:
- Around line 201-205: The platform check in the test is wrong because the
expression "!process.platform === 'darwin'" is always false; update the
conditional in the test (the if guarding the non-macOS assertions around
variables available and status in tests/utils.test.ts) to use a direct
inequality like process.platform !== 'darwin' so the non-macOS branch executes
and then assert expect(available).toBe(false) and
expect(status).toContain('macOS') there.
🧹 Nitpick comments (2)
scripts/ocr/ocr.swift (2)
13-17: Resolve SwiftLint duplicate import warning.SwiftLint flags duplicate imports here. If
CoreImage.CIFilterBuiltinsalready re-exports CoreImage types, you can dropimport CoreImage.♻️ Possible cleanup
-import CoreImage import CoreImage.CIFilterBuiltins
154-164: Avoid hardcoding supported OCR languages.The fixed allow-list will silently ignore valid Vision languages on some macOS versions. Prefer querying
supportedRecognitionLanguagesand filtering against that set.♻️ Suggested approach
- let validLanguages = languages.filter { lang in - // Common supported languages - ["en-US", "en-GB", "en", "zh-Hans", "zh-Hant", "zh", "fr-FR", "de-DE", "es-ES", "it-IT", "ja", "ko"].contains(lang) - } + let supported = (try? VNRecognizeTextRequest.supportedRecognitionLanguages( + for: .accurate, + revision: VNRecognizeTextRequest.currentRevision + )) ?? [] + let supportedSet = Set(supported) + let validLanguages = languages.filter { supportedSet.contains($0) }
| ### OCR (macOS Only) | ||
| | Setting | Description | | ||
| |---------|-------------| | ||
| | **Enable OCR** | Extract handwritten text using Apple Vision (macOS desktop; requires Swift/Xcode Command Line Tools) | | ||
| | **OCR Languages** | Comma-separated language codes (e.g., en-US, zh-Hans) | | ||
| | **OCR Confidence Threshold** | Minimum confidence for OCR results (0.0–1.0) | | ||
|
|
||
| ### Debug | ||
| | Setting | Description | | ||
| |---------|-------------| | ||
| | **Enable debug logging** | Enable debug logging to console (for troubleshooting) | | ||
|
|
||
| > **Note:** Auto-sync requires desktop Obsidian. Imported notes can be viewed on any platform (desktop/mobile). | ||
| > **Note:** Auto-sync requires desktop Obsidian. OCR is macOS-only and requires Swift/Xcode Command Line Tools. Imported notes can be viewed on any platform (desktop/mobile). |
There was a problem hiding this comment.
Add blank lines around tables to satisfy markdownlint (MD058).
This avoids lint failures and improves readability.
💚 Suggested fix
### OCR (macOS Only)
+
| Setting | Description |
|---------|-------------|
| **Enable OCR** | Extract handwritten text using Apple Vision (macOS desktop; requires Swift/Xcode Command Line Tools) |
| **OCR Languages** | Comma-separated language codes (e.g., en-US, zh-Hans) |
| **OCR Confidence Threshold** | Minimum confidence for OCR results (0.0–1.0) |
+
### Debug
+
| Setting | Description |
|---------|-------------|
| **Enable debug logging** | Enable debug logging to console (for troubleshooting) |
+
> **Note:** Auto-sync requires desktop Obsidian. OCR is macOS-only and requires Swift/Xcode Command Line Tools. Imported notes can be viewed on any platform (desktop/mobile).🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
90-90: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
97-97: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
🤖 Prompt for AI Agents
In `@README.md` around lines 89 - 101, Add blank lines immediately before and
after each Markdown table to satisfy markdownlint MD058: insert an empty line
before the "OCR (macOS Only)" table and another empty line after it, and do the
same for the "Debug" table; ensure the tables remain directly under their
respective headings but separated by a blank line above and below so the
headings and surrounding text are not adjacent to the table markdown.
| runOCROnCurrentNote(app, settings).then(result => { | ||
| if (result.error) { | ||
| new Notice(result.error); | ||
| } else if (result.textExtracted) { | ||
| new Notice('OCR text extracted and added to note'); | ||
| } else { | ||
| new Notice('No text found or already processed'); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Handle OCR command promise rejections.
If runOCROnCurrentNote rejects, this becomes an unhandled rejection. Add a catch with a Notice (and optional logging).
🔧 Proposed fix
- runOCROnCurrentNote(app, settings).then(result => {
+ runOCROnCurrentNote(app, settings).then(result => {
if (result.error) {
new Notice(result.error);
} else if (result.textExtracted) {
new Notice('OCR text extracted and added to note');
} else {
new Notice('No text found or already processed');
}
- });
+ }).catch(err => {
+ console.error('OCR failed:', err);
+ new Notice(`OCR failed: ${err?.message ?? err}`);
+ });🤖 Prompt for AI Agents
In `@src/commands/registry.ts` around lines 85 - 93, The current call to
runOCROnCurrentNote(app, settings) handles only fulfilled results and can leave
rejected promises unhandled; update the invocation in registry.ts to attach a
.catch handler (or convert to async/await with try/catch) to handle rejections,
showing a Notice with the error message and optionally logging via console.error
or the existing logger; specifically modify the block that calls
runOCROnCurrentNote to handle both rejected promises and fulfilled results (the
existing .then branch) so all failure paths produce a Notice (e.g., "OCR failed:
<error>") and prevent unhandled rejections.
| // Insert OCR text into the note - insert after the page heading | ||
| const pageHeadingMatch = content.match(/\n## Page \d+\n/); | ||
| if (!pageHeadingMatch) { | ||
| return { success: true, textExtracted: false, error: 'Could not find page heading to insert OCR text' }; | ||
| } | ||
|
|
||
| const insertPosition = pageHeadingMatch.index! + pageHeadingMatch[0]!.length; | ||
| const ocrSection = `\n${result.text}\n\n---\n\n`; | ||
|
|
||
| const newContent = content.slice(0, insertPosition) + ocrSection + content.slice(insertPosition); |
There was a problem hiding this comment.
Insert the same OCR heading you use for re-run detection.
You check for ## Extracted Text (OCR) earlier, but the inserted section doesn’t include that heading. Re-running will duplicate OCR text indefinitely.
🐛 Proposed fix
- const ocrSection = `\n${result.text}\n\n---\n\n`;
+ const ocrSection = `\n## Extracted Text (OCR)\n\n${result.text}\n\n---\n\n`;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Insert OCR text into the note - insert after the page heading | |
| const pageHeadingMatch = content.match(/\n## Page \d+\n/); | |
| if (!pageHeadingMatch) { | |
| return { success: true, textExtracted: false, error: 'Could not find page heading to insert OCR text' }; | |
| } | |
| const insertPosition = pageHeadingMatch.index! + pageHeadingMatch[0]!.length; | |
| const ocrSection = `\n${result.text}\n\n---\n\n`; | |
| const newContent = content.slice(0, insertPosition) + ocrSection + content.slice(insertPosition); | |
| // Insert OCR text into the note - insert after the page heading | |
| const pageHeadingMatch = content.match(/\n## Page \d+\n/); | |
| if (!pageHeadingMatch) { | |
| return { success: true, textExtracted: false, error: 'Could not find page heading to insert OCR text' }; | |
| } | |
| const insertPosition = pageHeadingMatch.index! + pageHeadingMatch[0]!.length; | |
| const ocrSection = `\n## Extracted Text (OCR)\n\n${result.text}\n\n---\n\n`; | |
| const newContent = content.slice(0, insertPosition) + ocrSection + content.slice(insertPosition); |
🤖 Prompt for AI Agents
In `@src/commands/run-ocr-command.ts` around lines 120 - 129, The inserted OCR
block (constructed as ocrSection before creating newContent) lacks the same
heading used earlier for re-run detection, so add the exact heading string "##
Extracted Text (OCR)" to ocrSection (e.g., prepend that heading plus a newline
before result.text) so the code that checks for the heading (the earlier match
for "## Extracted Text (OCR)") will find and prevent duplicate inserts; update
the ocrSection variable (referenced alongside pageHeadingMatch, insertPosition,
result.text, content, newContent) to include that heading.
| // In non-macOS environments, should not be available | ||
| if (!process.platform === 'darwin') { | ||
| expect(available).toBe(false); | ||
| expect(status).toContain('macOS'); | ||
| } |
There was a problem hiding this comment.
Fix platform check so the non‑macOS assertion runs.
!process.platform === 'darwin' is always false; use a direct inequality instead.
🔧 Proposed fix
- if (!process.platform === 'darwin') {
+ if (process.platform !== 'darwin') {
expect(available).toBe(false);
expect(status).toContain('macOS');
}🤖 Prompt for AI Agents
In `@tests/utils.test.ts` around lines 201 - 205, The platform check in the test
is wrong because the expression "!process.platform === 'darwin'" is always
false; update the conditional in the test (the if guarding the non-macOS
assertions around variables available and status in tests/utils.test.ts) to use
a direct inequality like process.platform !== 'darwin' so the non-macOS branch
executes and then assert expect(available).toBe(false) and
expect(status).toContain('macOS') there.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/commands/registry.ts`:
- Around line 94-105: Wrap the async callback passed to plugin.addCommand (the
'run-ocr-all-notes' callback) in a try-catch: call await runOCROnAllNotes(app,
settings) inside try, build and show the Notice only if runOCROnAllNotes does
not already display it (remove the duplicate Notice if runOCROnAllNotes already
notifies), and in catch log the unexpected error (console.error) and show a user
Notice about the failure; also ensure any references to app.vault operations are
treated as possibly throwing so they are covered by this new try-catch.
In `@src/services/ocr-service.ts`:
- Around line 151-161: The current code builds a shell command string named
command by interpolating imagePath, languagesStr, this.scriptPath and
confidenceThreshold and calls execAsync, which is vulnerable to command
injection; replace this with a safe execFile/spawn call that passes
this.scriptPath as the executable and an explicit args array [imagePath,
languagesStr, String(confidenceThreshold)] to avoid shell interpretation, keep
the same timeout and maxBuffer options, and update log.debug to log the
executable and args rather than the interpolated command; modify the call site
that uses execAsync (and the command variable) to use the execFile/spawn API
instead while preserving behavior and error handling.
In `@src/services/page-processor.ts`:
- Around line 324-329: The Table of Contents currently appends links like
[[`#Page` 2]] which point to headings that don't exist in the current file; update
the ToC generation around pageContent/bookResult.pages so links target the
separate page files and their anchors (e.g., use the page file name + fragment:
[[Page ${String(p.pageNum).padStart(3,'0')}#Page ${p.pageNum}]] or the actual
filename property if pages expose one), ensuring the anchor matches the heading
text "## Page {p.pageNum}"; alternatively, if you intended the ToC to only
contain the current page, change the loop to iterate only over the current page
instead of bookResult.pages.
- Around line 175-176: The current assignment ocrProcessed: !!ocrText treats
pages with empty OCR results as not processed, causing repeated re-processing;
change the logic in src/services/page-processor.ts so ocrProcessed reflects that
OCR was attempted rather than whether text was returned — e.g., introduce or use
an "ocrAttempted" / "ocrRan" boolean set when OCR runs and set ocrProcessed:
ocrAttempted (instead of !!ocrText), leaving ocrText as-is; this ensures the
manifest.importedPages[pageNum]?.ocrProcessed check only skips pages that truly
haven't had OCR attempted.
♻️ Duplicate comments (2)
tests/utils.test.ts (1)
201-205: Fix platform check so the non-macOS assertion runs.The expression
!process.platform === 'darwin'is incorrect—!process.platformevaluates to a boolean, which is then compared to the string'darwin', always resulting infalse. Use!==for the intended comparison.🔧 Proposed fix
- if (!process.platform === 'darwin') { + if (process.platform !== 'darwin') { expect(available).toBe(false); expect(status).toContain('macOS'); }src/commands/registry.ts (1)
78-86: Handle OCR command promise rejections.If
runOCROnCurrentNoterejects, this becomes an unhandled rejection. Add a.catch()handler with a Notice and optional logging.🔧 Proposed fix
runOCROnCurrentNote(app, settings).then(result => { if (result.error) { new Notice(result.error); } else if (result.textExtracted) { new Notice('OCR text extracted and added to note'); } else { new Notice('No text found or already processed'); } - }); + }).catch(err => { + console.error('OCR failed:', err); + new Notice(`OCR failed: ${err?.message ?? err}`); + });
🧹 Nitpick comments (4)
tests/utils.test.ts (1)
3-3: Remove unusedbeforeEachimport.The
beforeEachimport is not used anywhere in this test file.🧹 Proposed fix
-import { describe, it, expect, vi, beforeEach } from 'vitest'; +import { describe, it, expect, vi } from 'vitest';scripts/ocr/ocr.swift (1)
1-260: Consider automating synchronization between standalone and embedded scripts.This script duplicates the content embedded in
src/services/ocr-swift-embed.ts. While having a standalone version is useful for development and testing, manual synchronization creates a maintenance risk where the two versions can diverge.Consider either:
- Adding a build script that generates
ocr-swift-embed.tsfrom this file- Adding a test that verifies the embedded content matches this file
#!/bin/bash # Verify scripts are in sync (excluding minor formatting differences) # Compare core logic between standalone and embedded versions # Extract the embedded script content (strip the TypeScript wrapper) rg -A 1000 'export const OCR_SWIFT_SCRIPT' src/services/ocr-swift-embed.ts | head -n -1 | tail -n +2 > /tmp/embedded.swift # Compare function signatures present in both echo "=== Functions in standalone script ===" rg -n '^func ' scripts/ocr/ocr.swift echo -e "\n=== Functions in embedded script (raw check) ===" rg 'func ' src/services/ocr-swift-embed.ts | head -10src/services/ocr-service.ts (2)
68-92: Script temp directory is never cleaned up.The temp directory created at line 74 persists across plugin sessions. While each session creates a new temp directory, old ones are never removed, causing gradual disk usage accumulation.
Consider:
- Adding a
cleanup()method to remove the temp directory on plugin unload- Using a deterministic path (e.g.,
join(tmpdir(), 'viwoods-ocr')) and overwriting if it existsAdd cleanup method
/** * Clean up temporary OCR script */ public async cleanup(): Promise<void> { if (this.scriptPath) { try { const { rm } = require('fs').promises; const { dirname } = require('path'); await rm(dirname(this.scriptPath), { recursive: true, force: true }); this.scriptPath = null; this.scriptInitialized = false; log.debug('OCR script cleaned up'); } catch (error) { log.warn('Failed to cleanup OCR script:', error); } } }Call this from the plugin's
onunload()method.
230-247: Consider usingwriteFileinstead of stream for simplicity.The blob is already fully buffered in memory, so using a write stream adds complexity without benefit. The promises-based
writeFileis simpler:Simplified blob write
- const { createWriteStream } = require('fs'); - const { unlink } = require('fs').promises; + const { writeFile, unlink } = require('fs').promises; const { tmpdir } = require('os'); const { join } = require('path'); const tempPath = join(tmpdir(), `viwoods-ocr-${Date.now()}.png`); try { // Write blob to temp file const buffer = Buffer.from(await blob.arrayBuffer()); - await new Promise<void>((resolve, reject) => { - const stream = createWriteStream(tempPath); - stream.write(buffer); - stream.end(); - stream.on('finish', resolve); - stream.on('error', reject); - }); + await writeFile(tempPath, buffer); // Perform OCR
| plugin.addCommand({ | ||
| id: 'run-ocr-all-notes', | ||
| name: 'Run OCR on all Viwoods notes', | ||
| callback: async () => { | ||
| const summary = await runOCROnAllNotes(app, settings); | ||
| const message = `OCR completed: ${summary.processedNotes}/${summary.totalNotes} notes processed`; | ||
| new Notice(message); | ||
| if (summary.errors.length > 0) { | ||
| console.log('OCR errors:', summary.errors); | ||
| } | ||
| } | ||
| }); |
There was a problem hiding this comment.
Wrap async callback in try-catch for robustness.
While runOCROnAllNotes handles errors internally via summary.errors, an unexpected rejection (e.g., from app.vault operations) would be unhandled. Also, the summary message duplicates the Notice already shown inside runOCROnAllNotes (see relevant snippet lines 207-211).
🔧 Proposed fix
plugin.addCommand({
id: 'run-ocr-all-notes',
name: 'Run OCR on all Viwoods notes',
callback: async () => {
- const summary = await runOCROnAllNotes(app, settings);
- const message = `OCR completed: ${summary.processedNotes}/${summary.totalNotes} notes processed`;
- new Notice(message);
- if (summary.errors.length > 0) {
- console.log('OCR errors:', summary.errors);
+ try {
+ await runOCROnAllNotes(app, settings);
+ // Notice already shown inside runOCROnAllNotes
+ } catch (err) {
+ console.error('OCR batch failed:', err);
+ new Notice(`OCR failed: ${err instanceof Error ? err.message : String(err)}`);
}
}
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| plugin.addCommand({ | |
| id: 'run-ocr-all-notes', | |
| name: 'Run OCR on all Viwoods notes', | |
| callback: async () => { | |
| const summary = await runOCROnAllNotes(app, settings); | |
| const message = `OCR completed: ${summary.processedNotes}/${summary.totalNotes} notes processed`; | |
| new Notice(message); | |
| if (summary.errors.length > 0) { | |
| console.log('OCR errors:', summary.errors); | |
| } | |
| } | |
| }); | |
| plugin.addCommand({ | |
| id: 'run-ocr-all-notes', | |
| name: 'Run OCR on all Viwoods notes', | |
| callback: async () => { | |
| try { | |
| await runOCROnAllNotes(app, settings); | |
| // Notice already shown inside runOCROnAllNotes | |
| } catch (err) { | |
| console.error('OCR batch failed:', err); | |
| new Notice(`OCR failed: ${err instanceof Error ? err.message : String(err)}`); | |
| } | |
| } | |
| }); |
🤖 Prompt for AI Agents
In `@src/commands/registry.ts` around lines 94 - 105, Wrap the async callback
passed to plugin.addCommand (the 'run-ocr-all-notes' callback) in a try-catch:
call await runOCROnAllNotes(app, settings) inside try, build and show the Notice
only if runOCROnAllNotes does not already display it (remove the duplicate
Notice if runOCROnAllNotes already notifies), and in catch log the unexpected
error (console.error) and show a user Notice about the failure; also ensure any
references to app.vault operations are treated as possibly throwing so they are
covered by this new try-catch.
| // Build command - use swift directly to run the script | ||
| const languagesStr = languages.join(','); | ||
| const command = `swift "${this.scriptPath}" "${imagePath}" "${languagesStr}" ${confidenceThreshold}`; | ||
|
|
||
| log.debug('Running OCR command:', command); | ||
|
|
||
| try { | ||
| const { stdout, stderr } = await execAsync(command, { | ||
| timeout, | ||
| maxBuffer: 10 * 1024 * 1024 // 10MB buffer for large OCR results | ||
| }); |
There was a problem hiding this comment.
Potential command injection via unescaped shell arguments.
The command string interpolates imagePath and languagesStr directly. If these contain shell metacharacters (e.g., "; rm -rf /;), they could execute arbitrary commands. While imagePath is typically a temp file path controlled by the service, languagesStr comes from user settings.
Consider using execFile (or spawn) with an arguments array instead of exec with a shell command string:
Proposed fix using execFile
-import { exec } from 'child_process';
+import { execFile } from 'child_process';
import { promisify } from 'util';
-const execAsync = promisify(exec);
+const execFileAsync = promisify(execFile);
// In performOCR method:
- const command = `swift "${this.scriptPath}" "${imagePath}" "${languagesStr}" ${confidenceThreshold}`;
-
- log.debug('Running OCR command:', command);
+ const args = [this.scriptPath, imagePath, languagesStr, String(confidenceThreshold)];
+ log.debug('Running OCR command: swift', args);
try {
- const { stdout, stderr } = await execAsync(command, {
+ const { stdout, stderr } = await execFileAsync('swift', args, {
timeout,
maxBuffer: 10 * 1024 * 1024 // 10MB buffer for large OCR results
});🤖 Prompt for AI Agents
In `@src/services/ocr-service.ts` around lines 151 - 161, The current code builds
a shell command string named command by interpolating imagePath, languagesStr,
this.scriptPath and confidenceThreshold and calls execAsync, which is vulnerable
to command injection; replace this with a safe execFile/spawn call that passes
this.scriptPath as the executable and an explicit args array [imagePath,
languagesStr, String(confidenceThreshold)] to avoid shell interpretation, keep
the same timeout and maxBuffer options, and update log.debug to log the
executable and args rather than the interpolated command; modify the call site
that uses execAsync (and the command variable) to use the execFile/spawn API
instead while preserving behavior and error handling.
| ocrProcessed: !!ocrText, | ||
| ocrText: ocrText, |
There was a problem hiding this comment.
OCR state tracking may cause repeated processing on text-free pages.
Setting ocrProcessed: !!ocrText means pages where OCR ran successfully but found no text will have ocrProcessed: false. This triggers re-processing on every import since the condition at line 130 checks !manifest.importedPages[pageNum]?.ocrProcessed.
Consider tracking that OCR was attempted, regardless of whether text was found:
Proposed fix
manifest.importedPages[pageNum] = {
fileName: pageFileName,
importDate: new Date().toISOString(),
imageHash: page.image.hash,
geminiProcessed: manifest.importedPages[pageNum]?.geminiProcessed || false,
- ocrProcessed: !!ocrText,
+ ocrProcessed: this.settings.enableOcr && (isNew || isModified || !manifest.importedPages[pageNum]?.ocrProcessed),
ocrText: ocrText,Or more simply, track that OCR was attempted this import cycle when OCR is enabled:
- ocrProcessed: !!ocrText,
+ ocrProcessed: this.settings.enableOcr ? true : (manifest.importedPages[pageNum]?.ocrProcessed || false),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ocrProcessed: !!ocrText, | |
| ocrText: ocrText, | |
| manifest.importedPages[pageNum] = { | |
| fileName: pageFileName, | |
| importDate: new Date().toISOString(), | |
| imageHash: page.image.hash, | |
| geminiProcessed: manifest.importedPages[pageNum]?.geminiProcessed || false, | |
| ocrProcessed: this.settings.enableOcr ? true : (manifest.importedPages[pageNum]?.ocrProcessed || false), | |
| ocrText: ocrText, |
🤖 Prompt for AI Agents
In `@src/services/page-processor.ts` around lines 175 - 176, The current
assignment ocrProcessed: !!ocrText treats pages with empty OCR results as not
processed, causing repeated re-processing; change the logic in
src/services/page-processor.ts so ocrProcessed reflects that OCR was attempted
rather than whether text was returned — e.g., introduce or use an "ocrAttempted"
/ "ocrRan" boolean set when OCR runs and set ocrProcessed: ocrAttempted (instead
of !!ocrText), leaving ocrText as-is; this ensures the
manifest.importedPages[pageNum]?.ocrProcessed check only skips pages that truly
haven't had OCR attempted.
| // 2. Table of Contents | ||
| pageContent += '# Table of Contents\n\n'; | ||
| for (const p of bookResult.pages) { | ||
| pageContent += `- [[#Page ${p.pageNum}]]\n`; | ||
| } | ||
| pageContent += '\n---\n\n'; |
There was a problem hiding this comment.
Table of Contents links to non-existent headings.
Each page file generates a ToC with links to all pages ([[#Page 1]], [[#Page 2]], etc.), but the file only contains one heading: ## Page {pageNum} for the current page. The other [[#Page N]] links will be broken.
If the intent is to link between page files, use [[Page 001#Page 1]] format. If the ToC should only show the current page, iterate only over [page] instead of bookResult.pages.
Proposed fix - link to separate page files
// 2. Table of Contents
pageContent += '# Table of Contents\n\n';
for (const p of bookResult.pages) {
- pageContent += `- [[`#Page` ${p.pageNum}]]\n`;
+ const pageFileName = `Page ${String(p.pageNum).padStart(3, '0')}`;
+ pageContent += `- [[${pageFileName}|Page ${p.pageNum}]]\n`;
}
pageContent += '\n---\n\n';📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // 2. Table of Contents | |
| pageContent += '# Table of Contents\n\n'; | |
| for (const p of bookResult.pages) { | |
| pageContent += `- [[#Page ${p.pageNum}]]\n`; | |
| } | |
| pageContent += '\n---\n\n'; | |
| // 2. Table of Contents | |
| pageContent += '# Table of Contents\n\n'; | |
| for (const p of bookResult.pages) { | |
| const pageFileName = `Page ${String(p.pageNum).padStart(3, '0')}`; | |
| pageContent += `- [[${pageFileName}|Page ${p.pageNum}]]\n`; | |
| } | |
| pageContent += '\n---\n\n'; |
🤖 Prompt for AI Agents
In `@src/services/page-processor.ts` around lines 324 - 329, The Table of Contents
currently appends links like [[`#Page` 2]] which point to headings that don't
exist in the current file; update the ToC generation around
pageContent/bookResult.pages so links target the separate page files and their
anchors (e.g., use the page file name + fragment: [[Page
${String(p.pageNum).padStart(3,'0')}#Page ${p.pageNum}]] or the actual filename
property if pages expose one), ensuring the anchor matches the heading text "##
Page {p.pageNum}"; alternatively, if you intended the ToC to only contain the
current page, change the loop to iterate only over the current page instead of
bookResult.pages.
Signed-off-by: Tommy Nguyen <tuannvm@hotmail.com>
Signed-off-by: Tommy Nguyen <tuannvm@hotmail.com>
f0372f6 to
e404660
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/development.md`:
- Around line 181-184: Add a blank line after the "OCR (macOS Only)" section
content so the "## Performance Guidelines" heading is separated by an empty
line; locate the "OCR (macOS Only)" heading and the following "## Performance
Guidelines" heading in docs/development.md and insert a single empty line
between them to ensure proper Markdown rendering.
In `@scripts/ocr/ocr.swift`:
- Around line 16-17: Remove the redundant import by deleting the plain import
CoreImage line and keep only import CoreImage.CIFilterBuiltins (since
CIFilterBuiltins already exposes CoreImage); update ocr.swift to reference just
CoreImage.CIFilterBuiltins to resolve the SwiftLint duplicate-import warning.
In `@src/commands/run-ocr-command.ts`:
- Around line 174-178: The second part of the condition is redundant because
notesFolder is assigned from
app.vault.getAbstractFileByPath(settings.notesFolder); replace the equality
check with a type check to ensure the path resolves to a folder (e.g., verify
notesFolder is an instance of TFolder), so in the run-ocr flow where notesFolder
is used (variable notesFolder, settings.notesFolder, and summary.errors.push),
change the condition to check that notesFolder exists and is a TFolder and push
the error if not.
In `@src/settings.ts`:
- Around line 181-220: UI labels are using title case; update them to sentence
case (first word capitalized, keep acronyms like OCR uppercase) to satisfy lint
rules: change containerEl.createEl('h3', { text: 'OCR Settings' }) to 'OCR
settings', and update each Setting().setName(...) from 'Enable OCR', 'OCR
Languages', 'OCR Confidence Threshold' to sentence case variants ('Enable OCR'
is already fine as first word capitalized but confirm others become 'OCR
languages' and 'OCR confidence threshold'), and ensure .setDesc(...) strings
follow sentence case where applicable; locate these texts in the block that uses
containerEl.createEl('h3', ...) and the Setting(...) chains to apply the edits
and then run the linter.
♻️ Duplicate comments (9)
README.md (1)
89-101: Add blank lines around tables to satisfy markdownlint (MD058).This issue was previously flagged. Tables at lines 90 and 97 need blank lines before and after them to comply with markdownlint MD058.
tests/utils.test.ts (1)
201-206: Fix operator precedence bug in platform check.The expression
!process.platform === 'darwin'is incorrect due to operator precedence.!process.platformevaluates tofalse(sinceprocess.platformis a truthy string), thenfalse === 'darwin'is alwaysfalse. This means the non-macOS assertions never run.🔧 Proposed fix
// In non-macOS environments, should not be available - if (!process.platform === 'darwin') { + if (process.platform !== 'darwin') { expect(available).toBe(false); expect(status).toContain('macOS'); }src/commands/registry.ts (3)
95-106: Wrap async callback in try-catch and remove duplicate Notice.Two issues:
- The async callback should have a try-catch for unexpected errors
runOCROnAllNotesalready shows a completion Notice internally (seesrc/commands/run-ocr-command.tslines 207-211), so lines 100-101 create a duplicate notification🔧 Proposed fix
plugin.addCommand({ id: 'run-ocr-all-notes', name: 'Run OCR on all Viwoods notes', callback: async () => { - const summary = await runOCROnAllNotes(app, settings); - const message = `OCR completed: ${summary.processedNotes}/${summary.totalNotes} notes processed`; - new Notice(message); - if (summary.errors.length > 0) { - console.log('OCR errors:', summary.errors); + try { + await runOCROnAllNotes(app, settings); + // Notice already shown inside runOCROnAllNotes + } catch (err) { + console.error('OCR batch failed:', err); + new Notice(`OCR failed: ${err instanceof Error ? err.message : String(err)}`); } } });
79-87: Handle promise rejection to prevent unhandled rejection.The
runOCROnCurrentNotecall uses.then()but lacks a.catch()handler. If the promise rejects unexpectedly, it will cause an unhandled rejection.🔧 Proposed fix
runOCROnCurrentNote(app, settings).then(result => { if (result.error) { new Notice(result.error); } else if (result.textExtracted) { new Notice('OCR text extracted and added to note'); } else { new Notice('No text found or already processed'); } - }); + }).catch(err => { + console.error('OCR failed:', err); + new Notice(`OCR failed: ${err instanceof Error ? err.message : String(err)}`); + });
74-74: Add "OCR" to the ESLint brands list and add promise error handling.The sentence-case lint failure is because "OCR" is missing from the
brandslist ineslint.config.mjs. Add it there to resolve the CI failure. Additionally, the promise in the "Run OCR on current note" command (lines 79-87) lacks a.catch()handler for rejection, which can cause silent failures.📝 Suggested fixes
1. Update eslint.config.mjs to include "OCR" in brands:
"brands": ["Viwoods", "JSZip", "jsPDF", "Windows", "Linux", "Obsidian", "macOS", "Gemini"], + "OCR"],2. Add
.catch()handler to the promise chain in registry.ts:runOCROnCurrentNote(app, settings).then(result => { if (result.error) { new Notice(result.error); } else if (result.textExtracted) { new Notice('OCR text extracted and added to note'); } else { new Notice('No text found or already processed'); } - }); + }).catch(error => { + new Notice(`OCR failed: ${error instanceof Error ? error.message : String(error)}`); + });3. Optional: Remove duplicate Notice from registry.ts lines 99-101 since
runOCROnAllNotesalready creates appropriate Notices internally (lines 195, 210-211 in run-ocr-command.ts).src/commands/run-ocr-command.ts (1)
120-129: Insert the same OCR heading you use for re-run detection.Line 49 checks for
## Extracted Text (OCR)but line 127 doesn't include that heading in the inserted section. Re-running OCR will duplicate text indefinitely.🐛 Proposed fix
- const ocrSection = `\n${result.text}\n\n---\n\n`; + const ocrSection = `\n## Extracted Text (OCR)\n\n${result.text}\n\n---\n\n`;src/services/page-processor.ts (2)
174-175: OCR state tracking may cause repeated processing on text-free pages.Setting
ocrProcessed: !!ocrTextmeans pages where OCR ran successfully but found no text will haveocrProcessed: false, triggering re-processing on every import.Proposed fix
manifest.importedPages[pageNum] = { fileName: pageFileName, importDate: new Date().toISOString(), imageHash: page.image.hash, geminiProcessed: manifest.importedPages[pageNum]?.geminiProcessed || false, - ocrProcessed: !!ocrText, + ocrProcessed: this.settings.enableOcr ? true : (manifest.importedPages[pageNum]?.ocrProcessed || false), ocrText: ocrText,
323-328: Table of Contents links to non-existent headings.Each page file generates a ToC with links to all pages (
[[#Page1]],[[#Page2]], etc.), but the file only contains one heading:## Page {pageNum}for the current page. The other[[#PageN]]links will be broken.Proposed fix - link to separate page files
// 2. Table of Contents pageContent += '# Table of Contents\n\n'; for (const p of bookResult.pages) { - pageContent += `- [[`#Page` ${p.pageNum}]]\n`; + const pageFileName = `Page ${String(p.pageNum).padStart(3, '0')}`; + pageContent += `- [[${pageFileName}|Page ${p.pageNum}]]\n`; } pageContent += '\n---\n\n';src/services/ocr-service.ts (1)
151-161: Potential command injection via unescaped shell arguments.The command string interpolates
imagePathandlanguagesStrdirectly. If these contain shell metacharacters (e.g.,"; rm -rf /;), arbitrary commands could execute. WhileimagePathis typically a temp file,languagesStrcomes from user settings.Proposed fix using execFile
-import { exec } from 'child_process'; +import { execFile } from 'child_process'; import { promisify } from 'util'; -const execAsync = promisify(exec); +const execFileAsync = promisify(execFile); // In performOCR method: - const languagesStr = languages.join(','); - const command = `swift "${this.scriptPath}" "${imagePath}" "${languagesStr}" ${confidenceThreshold}`; - - log.debug('Running OCR command:', command); + const languagesStr = languages.join(','); + const args = [this.scriptPath, imagePath, languagesStr, String(confidenceThreshold)]; + log.debug('Running OCR command: swift', args); try { - const { stdout, stderr } = await execAsync(command, { + const { stdout, stderr } = await execFileAsync('swift', args, { timeout, maxBuffer: 10 * 1024 * 1024 // 10MB buffer for large OCR results });
🧹 Nitpick comments (4)
scripts/ocr/ocr.swift (1)
155-164: Consider expanding or removing the hardcoded language filter.The hardcoded list of valid languages may reject other Apple Vision-supported languages (e.g.,
pt-BR,ru-RU,nl-NL). Consider either:
- Removing the filter and letting Vision handle invalid languages
- Expanding the list based on Vision's supported languages
- Adding a warning when languages are filtered out
💡 Alternative approach
// Set recognition languages - let validLanguages = languages.filter { lang in - // Common supported languages - ["en-US", "en-GB", "en", "zh-Hans", "zh-Hant", "zh", "fr-FR", "de-DE", "es-ES", "it-IT", "ja", "ko"].contains(lang) - } - - if validLanguages.isEmpty { - request.recognitionLanguages = ["en-US"] - } else { - request.recognitionLanguages = validLanguages - } + // Let Vision handle invalid language codes; fall back to en-US if empty + if languages.isEmpty { + request.recognitionLanguages = ["en-US"] + } else { + request.recognitionLanguages = languages + }tests/utils.test.ts (1)
3-3: Remove unused importbeforeEach.
beforeEachis imported but never used in the test file.📝 Suggested fix
-import { describe, it, expect, vi, beforeEach } from 'vitest'; +import { describe, it, expect, vi } from 'vitest';src/commands/run-ocr-command.ts (1)
197-209: Counter semantics are misleading.
totalImagesis incremented for every note processed, but the name suggests it tracks images. Since each note may contain zero or one image, consider renaming tototalNotesProcessedor tracking actual image count separately.src/services/ocr-service.ts (1)
236-247: Consider using crypto.randomUUID for temp file uniqueness.
Date.now()may collide under rapid concurrent calls. This is low risk but could be improved:Optional improvement
- const tempPath = join(tmpdir(), `viwoods-ocr-${Date.now()}.png`); + const tempPath = join(tmpdir(), `viwoods-ocr-${Date.now()}-${Math.random().toString(36).slice(2)}.png`);
| ## OCR (macOS Only) | ||
|
|
||
| OCR runs locally using Apple Vision via an embedded Swift script. It requires macOS desktop and the Swift toolchain (Xcode Command Line Tools). | ||
| ## Performance Guidelines |
There was a problem hiding this comment.
Add a blank line before the next section heading.
There's a missing blank line between the OCR section content (line 183) and the "## Performance Guidelines" heading (line 184). This could cause rendering issues in some markdown parsers.
📝 Suggested fix
## OCR (macOS Only)
OCR runs locally using Apple Vision via an embedded Swift script. It requires macOS desktop and the Swift toolchain (Xcode Command Line Tools).
+
## Performance Guidelines📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## OCR (macOS Only) | |
| OCR runs locally using Apple Vision via an embedded Swift script. It requires macOS desktop and the Swift toolchain (Xcode Command Line Tools). | |
| ## Performance Guidelines | |
| ## OCR (macOS Only) | |
| OCR runs locally using Apple Vision via an embedded Swift script. It requires macOS desktop and the Swift toolchain (Xcode Command Line Tools). | |
| ## Performance Guidelines |
🤖 Prompt for AI Agents
In `@docs/development.md` around lines 181 - 184, Add a blank line after the "OCR
(macOS Only)" section content so the "## Performance Guidelines" heading is
separated by an empty line; locate the "OCR (macOS Only)" heading and the
following "## Performance Guidelines" heading in docs/development.md and insert
a single empty line between them to ensure proper Markdown rendering.
| import CoreImage | ||
| import CoreImage.CIFilterBuiltins |
There was a problem hiding this comment.
Remove duplicate import.
CoreImage.CIFilterBuiltins already includes CoreImage, making line 16 redundant. This triggers a SwiftLint warning.
📝 Suggested fix
import AppKit
-import CoreImage
import CoreImage.CIFilterBuiltins📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import CoreImage | |
| import CoreImage.CIFilterBuiltins | |
| import CoreImage.CIFilterBuiltins |
🧰 Tools
🪛 SwiftLint (0.57.0)
[Warning] 17-17: Imports should be unique
(duplicate_imports)
🤖 Prompt for AI Agents
In `@scripts/ocr/ocr.swift` around lines 16 - 17, Remove the redundant import by
deleting the plain import CoreImage line and keep only import
CoreImage.CIFilterBuiltins (since CIFilterBuiltins already exposes CoreImage);
update ocr.swift to reference just CoreImage.CIFilterBuiltins to resolve the
SwiftLint duplicate-import warning.
| const notesFolder = app.vault.getAbstractFileByPath(settings.notesFolder); | ||
| if (!notesFolder || notesFolder !== app.vault.getAbstractFileByPath(settings.notesFolder)) { | ||
| summary.errors.push(`Notes folder not found: ${settings.notesFolder}`); | ||
| return summary; | ||
| } |
There was a problem hiding this comment.
Redundant condition will never evaluate to true.
notesFolder is assigned from app.vault.getAbstractFileByPath(settings.notesFolder) on line 174, so the !== comparison on line 175 is always false. The intended check is likely whether notesFolder is a folder:
🐛 Proposed fix
const notesFolder = app.vault.getAbstractFileByPath(settings.notesFolder);
- if (!notesFolder || notesFolder !== app.vault.getAbstractFileByPath(settings.notesFolder)) {
+ if (!notesFolder || !(notesFolder instanceof TFolder)) {
summary.errors.push(`Notes folder not found: ${settings.notesFolder}`);
return summary;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const notesFolder = app.vault.getAbstractFileByPath(settings.notesFolder); | |
| if (!notesFolder || notesFolder !== app.vault.getAbstractFileByPath(settings.notesFolder)) { | |
| summary.errors.push(`Notes folder not found: ${settings.notesFolder}`); | |
| return summary; | |
| } | |
| const notesFolder = app.vault.getAbstractFileByPath(settings.notesFolder); | |
| if (!notesFolder || !(notesFolder instanceof TFolder)) { | |
| summary.errors.push(`Notes folder not found: ${settings.notesFolder}`); | |
| return summary; | |
| } |
🤖 Prompt for AI Agents
In `@src/commands/run-ocr-command.ts` around lines 174 - 178, The second part of
the condition is redundant because notesFolder is assigned from
app.vault.getAbstractFileByPath(settings.notesFolder); replace the equality
check with a type check to ensure the path resolves to a folder (e.g., verify
notesFolder is an instance of TFolder), so in the run-ocr flow where notesFolder
is used (variable notesFolder, settings.notesFolder, and summary.errors.push),
change the condition to check that notesFolder exists and is a TFolder and push
the error if not.
…ed import Signed-off-by: Tommy Nguyen <tuannvm@hotmail.com>
Summary
Changes
Testing
Related Issues
Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.