Multi-13: Word Search Path Data Generator

Generates synthetic datasets for word-search letter-grid tracing. The agent must locate a hidden word inside a grid of letters and trace its path, which tests pattern matching across multiple consecutive letters in straight or non-straight directions.

Each sample pairs a task (first frame + prompt describing what needs to happen) with its ground truth solution (final frame showing the result + video demonstrating how to achieve it). This structure enables both model evaluation and training.

📌 Basic Information

| Property | Value |
|---|---|
| Task ID | Multi-13 |
| Task | Word Search Path Trace |
| Category | Constraint Satisfaction Puzzles |
| Resolution | 1024×1024 px |
| FPS | 16 fps |
| Duration | Varies |
| Output | PNG images + MP4 video |

🚀 Usage

Installation

```bash
# 1. Clone the repository
git clone https://github.com/VBVR-DataFactory/Multi-13_wordsearch_path_data-generator.git
cd Multi-13_wordsearch_path_data-generator

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
```

Generate Data

```bash
# Generate 50 samples
python examples/generate.py --num-samples 50

# Reproducible generation with seed
python examples/generate.py --num-samples 50 --seed 42

# Custom output directory
python examples/generate.py --num-samples 100 --output data/my_dataset

# Without videos (faster, images only)
python examples/generate.py --num-samples 50 --no-videos
```

Command-Line Options

| Argument | Description |
|---|---|
| `--num-samples` | Number of tasks to generate (required) |
| `--output` | Output directory (default: `data/questions`) |
| `--seed` | Random seed for reproducibility |
| `--no-videos` | Skip video generation (images only) |
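The flags above map naturally onto Python's standard `argparse` module. The following is a hypothetical sketch of how such a parser could be declared; it is not copied from `examples/generate.py`, and treating `--seed` as an integer is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the repo's actual code)."""
    parser = argparse.ArgumentParser(description="Generate Multi-13 word-search samples")
    parser.add_argument("--num-samples", type=int, required=True,
                        help="Number of tasks to generate")
    parser.add_argument("--output", default="data/questions",
                        help="Output directory")
    # Assumption: the seed is an integer; omitted means non-reproducible generation.
    parser.add_argument("--seed", type=int, default=None,
                        help="Random seed for reproducibility")
    parser.add_argument("--no-videos", action="store_true",
                        help="Skip video generation (images only)")
    return parser


# argparse turns dashes into underscores: --num-samples -> args.num_samples
args = build_parser().parse_args(["--num-samples", "50", "--seed", "42", "--no-videos"])
print(args.num_samples, args.output, args.seed, args.no_videos)
```

Because `--output` has a default, only `--num-samples` is mandatory, matching the table.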

📖 Task Example

Prompt

[Scenario] The image shows a grid of letters containing a hidden target word.
[Rules]
1. The target word can be hidden horizontally, vertically, or diagonally.
2. The letters of the word must be adjacent to each other in a straight line.
[Task] Generate a video showing the process of finding the hidden word. Animate a continuous highlight over the correct sequence of letters that spells the target word.
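The two rules in the prompt can be expressed as a checker for a proposed trace. This is a minimal sketch assuming the grid is a list of equal-length uppercase strings and a trace is a list of `(row, col)` cells; `is_valid_trace` is a hypothetical name, not part of the repository.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col)


def is_valid_trace(grid: List[str], word: str, path: List[Cell]) -> bool:
    """Check a trace against the prompt's rules: spells the word, adjacent, straight."""
    if not grid or not path or len(path) != len(word):
        return False
    n_rows, n_cols = len(grid), len(grid[0])
    # Every cell must lie inside the grid and hold the matching letter.
    for (r, c), ch in zip(path, word):
        if not (0 <= r < n_rows and 0 <= c < n_cols) or grid[r][c] != ch:
            return False
    if len(path) == 1:
        return True
    # The first step fixes the direction; it must reach one of the 8 neighbours.
    dr, dc = path[1][0] - path[0][0], path[1][1] - path[0][1]
    if (dr, dc) == (0, 0) or abs(dr) > 1 or abs(dc) > 1:
        return False
    # Straight line: every later step repeats the same direction.
    return all((b[0] - a[0], b[1] - a[1]) == (dr, dc) for a, b in zip(path, path[1:]))


grid = ["CATX", "XOXX", "XXWX", "XXXX"]
print(is_valid_trace(grid, "COW", [(0, 0), (1, 1), (2, 2)]))  # diagonal trace
```

Because it enforces the prompt's straight-line rule, this checker rejects bent paths; accepting them would require replacing the constant-direction check with a per-step adjacency check.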

Visual


| Initial Frame | Animation | Final Frame |
|---|---|---|
| Letter grid with target word listed | Path traced cell by cell to reveal the word | Hidden word fully highlighted |
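The animation reveals the path one cell at a time, which is why clip duration varies with word length. Below is a minimal sketch of such a frame schedule. Only the 16 fps rate comes from the spec; `frames_per_cell` and `hold_frames` are hypothetical parameters, since the actual timing is not documented.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col)
FPS = 16  # documented frame rate


def trace_frames(path: List[Cell], frames_per_cell: int = 8,
                 hold_frames: int = 16) -> List[Set[Cell]]:
    """Return the set of highlighted cells for each frame (hypothetical timing)."""
    frames: List[Set[Cell]] = [set()]  # opening frame: nothing highlighted yet
    for i in range(1, len(path) + 1):
        highlighted = set(path[:i])  # the first i cells of the word are lit
        frames.extend(set(highlighted) for _ in range(frames_per_cell))
    # Hold the fully highlighted word at the end of the clip.
    frames.extend(set(path) for _ in range(hold_frames))
    return frames


path = [(0, 0), (1, 1), (2, 2)]
frames = trace_frames(path)
print(len(frames), len(frames) / FPS)  # frame count and duration in seconds
```

Under these assumed parameters a 3-letter word yields 1 + 3 × 8 + 16 = 41 frames, about 2.6 s at 16 fps; longer words produce proportionally longer clips.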

📖 Task Description

Objective

Locate and trace a hidden target word inside a procedurally generated grid of letters, where the word may be embedded in a straight line (horizontal, vertical, or diagonal) or possibly along a bent path.
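A minimal sketch of how such a grid could be produced. It assumes straight placements only (the eight horizontal, vertical, and diagonal directions), uppercase A to Z filler, and a rejection step that keeps only grids where the word can be read on exactly one set of cells. `generate_grid` and its parameters are hypothetical; the repository's generator may work differently.

```python
import random
import string
from typing import FrozenSet, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, col)
# Horizontal, vertical and diagonal steps, forwards and backwards.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (-1, 1), (0, -1), (-1, 0), (-1, -1), (1, -1)]


def _placements(n: int, length: int) -> List[List[Cell]]:
    """All straight paths of `length` cells that fit inside an n x n grid."""
    paths = []
    for r in range(n):
        for c in range(n):
            for dr, dc in DIRECTIONS:
                path = [(r + dr * i, c + dc * i) for i in range(length)]
                if all(0 <= pr < n and 0 <= pc < n for pr, pc in path):
                    paths.append(path)
    return paths


def generate_grid(word: str, n: int, seed: Optional[int] = None,
                  max_attempts: int = 1000) -> Tuple[List[str], List[Cell]]:
    """Hide `word` on one straight path and fill the other cells randomly."""
    rng = random.Random(seed)  # seeded RNG makes the output reproducible
    word = word.upper()
    placements = _placements(n, len(word))
    if not placements:
        raise ValueError("word does not fit in the grid")
    for _ in range(max_attempts):
        path = rng.choice(placements)
        cells = [[rng.choice(string.ascii_uppercase) for _ in range(n)] for _ in range(n)]
        for (r, c), ch in zip(path, word):
            cells[r][c] = ch
        # Reject grids where random filler spells the word a second time,
        # so the recorded ground-truth path is unambiguous.
        hits: Set[FrozenSet[Cell]] = {
            frozenset(p) for p in placements
            if all(cells[pr][pc] == ch for (pr, pc), ch in zip(p, word))
        }
        if len(hits) == 1:
            return ["".join(row) for row in cells], path
    raise RuntimeError("could not generate a grid with a unique occurrence")


grid, path = generate_grid("cat", 6, seed=0)
print("\n".join(grid))
print(path)  # ground-truth path recorded at generation time
```

Comparing cell sets rather than ordered paths keeps palindromes (which read the same in both directions) from being counted twice.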

Task Setup

Grid: N×N cells filled with random letters; one path of cells spells the target word.
Target word: Drawn from an English wordlist and declared in the prompt.
Directions: Horizontal, vertical, or diagonal; any straight or path-traced sequence.
Distractors: All other cells contain randomly sampled letters, which may form partial matches that share a prefix with the target.
Solver: Deterministic search over all valid letter placements; the ground-truth path is recorded at generation time.
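The deterministic search can be illustrated with a brute-force scan: try every start cell and each of the eight straight directions, and keep the paths whose letters spell the word. This sketch assumes straight placements only; `find_word` is a hypothetical helper, not the repository's solver.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col)
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (-1, 1), (0, -1), (-1, 0), (-1, -1), (1, -1)]


def find_word(grid: List[str], word: str) -> List[List[Cell]]:
    """Return every straight path, in row-major scan order, that spells `word`."""
    word = word.upper()
    if not word:
        return []
    matches: List[List[Cell]] = []
    for r in range(len(grid)):
        for c in range(len(grid[r])):
            if grid[r][c] != word[0]:
                continue  # prune: the first letter must match
            for dr, dc in DIRECTIONS:
                path: List[Cell] = []
                for i, ch in enumerate(word):
                    rr, cc = r + dr * i, c + dc * i
                    in_bounds = 0 <= rr < len(grid) and 0 <= cc < len(grid[rr])
                    if not in_bounds or grid[rr][cc] != ch:
                        break  # this direction is a false lead
                    path.append((rr, cc))
                else:  # loop finished without break: full word matched
                    matches.append(path)
    return matches


grid = ["CATX", "XOXX", "XXWX", "TACX"]
print(find_word(grid, "COW"))
print(find_word(grid, "CAT"))
```

A generator would expect exactly one match for the target word. Bent paths, if used, would instead need a depth-first search over neighbouring cells.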

Key Features

Multi-cell sequential pattern matching: Recognizing the word requires aligning multiple consecutive cell observations.
Direction ambiguity: Many false leads share initial letters; the agent must validate the full sequence.
Visual trace verification: Each step of the trace is rendered, providing intermediate correctness signal.
Long-horizon symbolic reasoning: Identification spans the full word length (typically 5–10 cells).
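Direction ambiguity can be quantified by counting false leads, that is, start-and-direction pairs that match the first few letters of the target but fail before the end. A minimal sketch assuming straight directions; `count_false_leads` and its `min_prefix` threshold are hypothetical.

```python
from typing import List

DIRECTIONS = [(0, 1), (1, 0), (1, 1), (-1, 1), (0, -1), (-1, 0), (-1, -1), (1, -1)]


def count_false_leads(grid: List[str], word: str, min_prefix: int = 2) -> int:
    """Count (start, direction) pairs matching >= min_prefix letters but not the whole word."""
    word = word.upper()
    leads = 0
    for r in range(len(grid)):
        for c in range(len(grid[r])):
            for dr, dc in DIRECTIONS:
                matched = 0  # length of the matched prefix in this direction
                for i, ch in enumerate(word):
                    rr, cc = r + dr * i, c + dc * i
                    if not (0 <= rr < len(grid) and 0 <= cc < len(grid[rr])):
                        break
                    if grid[rr][cc] != ch:
                        break
                    matched += 1
                if min_prefix <= matched < len(word):
                    leads += 1
    return leads


grid = ["CATX", "AXXX", "XXXX", "XXXX"]
print(count_false_leads(grid, "CAT"))  # "CA" downward is a false lead
```

A higher count means the agent must reject more partial matches before committing to the full trace.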

📦 Data Format

```text
data/questions/Multi-13_wordsearch_path_data-generator_task/Multi-13_wordsearch_path_data-generator_00000000/
├── first_frame.png            # Letter grid with target word listed
├── final_frame.png            # Hidden word path highlighted
├── prompt.txt                 # Task instruction
├── ground_truth.mp4           # Animation of cell-by-cell trace
└── question_metadata.json     # Standardized VBVR task metadata
```

File specifications:

Images: 1024×1024 PNG format
Video: MP4 format, 16 fps, H.264 + yuv420p
Metadata: VBVR canonical schema with task_id, vbvr_task_code, media, parameters
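A minimal sketch for loading and sanity-checking one sample directory with the standard library. It assumes the five files listed above, treats `ground_truth.mp4` as optional for runs made with `--no-videos` (an assumption), and checks only the four top-level metadata keys named in the spec. `load_sample` is hypothetical, and the demo writes placeholder files rather than real generator output.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union

EXPECTED_FILES = ["first_frame.png", "final_frame.png", "prompt.txt",
                  "ground_truth.mp4", "question_metadata.json"]
REQUIRED_KEYS = {"task_id", "vbvr_task_code", "media", "parameters"}


def load_sample(sample_dir: Union[str, Path], require_video: bool = True) -> Dict[str, Any]:
    """Check that a sample directory is complete and return its prompt and metadata."""
    sample_dir = Path(sample_dir)
    expected = [f for f in EXPECTED_FILES if require_video or f != "ground_truth.mp4"]
    missing = [f for f in expected if not (sample_dir / f).is_file()]
    if missing:
        raise FileNotFoundError(f"{sample_dir}: missing {missing}")
    metadata = json.loads((sample_dir / "question_metadata.json").read_text(encoding="utf-8"))
    absent = REQUIRED_KEYS - metadata.keys()
    if absent:
        raise KeyError(f"metadata missing keys: {sorted(absent)}")
    prompt = (sample_dir / "prompt.txt").read_text(encoding="utf-8")
    return {"prompt": prompt, "metadata": metadata}


# Demo on a throwaway directory with placeholder content (not real output).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo_sample"
    demo.mkdir()
    (demo / "first_frame.png").write_bytes(b"")  # placeholder images
    (demo / "final_frame.png").write_bytes(b"")
    (demo / "prompt.txt").write_text("[Scenario] placeholder prompt", encoding="utf-8")
    meta = {"task_id": "demo", "vbvr_task_code": "placeholder", "media": {}, "parameters": {}}
    (demo / "question_metadata.json").write_text(json.dumps(meta), encoding="utf-8")
    sample = load_sample(demo, require_video=False)

print(sample["metadata"]["task_id"])
```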

🏷️ Tags

word-search letter-grid pattern-matching path-tracing csp multi-step-reasoning symbolic

Part of the 36-Task Long-Horizon Multi-Step Reasoning Benchmark.
