22 changes: 22 additions & 0 deletions .gitignore
@@ -0,0 +1,22 @@
# Wolfram Mathematica temporary files
*.nb.bak
*.nb~
*.tmp

# Image captures and outputs
hoj*.pdf
*.png
*.jpg
*.jpeg
*.tiff

# Analysis reports
*_analysis_report.txt
comprehensive_book_analysis.txt

# Test files
test_page.*

# System files
.DS_Store
Thumbs.db
28 changes: 28 additions & 0 deletions README.md
@@ -1,2 +1,30 @@
# wolfram-mathematica-codigo
Web Scraping Internet Archive Books Using Mathematica

## New Features: Image Description and Analysis

The notebook now includes image description and analysis functions for the scraped book pages:

### Image Description Functions

- **`describeImage[imagePath]`** - Analyzes a single image and provides:
  - OCR text extraction
  - Image dimensions and properties
  - Word and line count statistics
  - Comprehensive description report

- **`batchDescribeImages[directory]`** - Processes all images in a directory and generates a summary report

- **`funcWithDescription[lista]`** - Enhanced version of the original capture function that automatically analyzes each captured page
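
As a rough illustration of what a single-image description could look like, here is a minimal sketch built only on standard Wolfram Language functions (`Import`, `TextRecognize`, `ImageDimensions`, `ImageColorSpace`, `ImageChannels`). The name `describeImageSketch` and the keys of the returned association are assumptions made for this example; the notebook's actual `describeImage` may be implemented differently.

```wolfram
(* Hypothetical sketch, not the notebook's actual describeImage. *)
(* The function name and the association keys are illustrative assumptions. *)
describeImageSketch[imagePath_String] :=
 Module[{img, text, lines, words},
  img = Import[imagePath];
  If[! ImageQ[img],
   $Failed, (* the file did not import as an image *)
   text = TextRecognize[img]; (* OCR text extraction *)
   (* keep only lines that contain non-whitespace characters *)
   lines = Select[StringSplit[text, "\n"], StringLength[StringTrim[#]] > 0 &];
   words = StringSplit[text]; (* split on runs of whitespace *)
   <|
    "File" -> FileNameTake[imagePath],
    "Dimensions" -> ImageDimensions[img], (* {width, height} in pixels *)
    "ColorSpace" -> ImageColorSpace[img],
    "Channels" -> ImageChannels[img],
    "LineCount" -> Length[lines],
    "WordCount" -> Length[words],
    "Text" -> text
    |>
   ]
  ]
```

Returning an association rather than a formatted string keeps the individual measurements available for later steps, such as a batch summary. OCR quality depends on the scan, so the counts are estimates.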

### Features Added

1. **Optical Character Recognition (OCR)** - Extracts text content from book pages
2. **Image Analysis** - Provides technical details about images (dimensions, color space, etc.)
3. **Content Statistics** - Counts words, lines, and other text metrics
4. **Batch Processing** - Analyzes multiple images automatically
5. **Report Generation** - Creates comprehensive analysis reports in text format
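
The batch processing and report generation features can be pictured with the following self-contained sketch. It assumes the captured pages are PNG, JPEG, or TIFF files in one directory, and it borrows the report name `comprehensive_book_analysis.txt` from the `.gitignore` entries as a default; the function name `batchReportSketch` and the tab-separated layout are assumptions, not the notebook's actual format.

```wolfram
(* Hypothetical sketch of batch analysis with a plain-text report. *)
(* Function name, default report name, and column layout are assumptions. *)
batchReportSketch[directory_String,
  reportName_String : "comprehensive_book_analysis.txt"] :=
 Module[{files, rows, reportPath},
  files = FileNames[{"*.png", "*.jpg", "*.jpeg", "*.tiff"}, directory];
  rows = Table[
    With[{img = Import[f]},
     If[ImageQ[img],
      With[{text = TextRecognize[img]},
       StringRiffle[
        {FileNameTake[f],
         StringRiffle[ImageDimensions[img], "x"], (* width x height *)
         Length[StringSplit[text]]}, (* whitespace-separated word count *)
        "\t"]],
      Nothing]], (* skip files that do not import as images *)
    {f, files}];
  reportPath = FileNameJoin[{directory, reportName}];
  (* one header row, then one tab-separated row per analyzed image *)
  Export[reportPath,
   StringRiffle[Prepend[rows, "file\tdimensions\twords"], "\n"], "Text"]
  ]
```

Calling `batchReportSketch["pages"]`, where `pages` is a placeholder directory name, writes the report inside that directory and returns the path of the exported file.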

### Usage

The enhanced workflow captures book pages and automatically describes their content, which makes the extracted information easier to search and catalog.