PhotoAgent

Agentic Photo Editing with Exploratory Visual Aesthetic Planning

Mingde Yao^1,5 Zhiyuan You¹ King-Man Tam⁴ Menglu Wang³ Tianfan Xue^1,2,5

¹CUHK MMLab ²Shanghai AI Lab ³USTC ⁴Institute of Science Tokyo ⁵CPII InnoHK

One goal, one click, autonomous enhancement. PhotoAgent turns photos into professionally edited results through exploratory visual aesthetic planning — no step-by-step prompts required.

Paradigm: From Human-in-the-loop to Agent-in-the-loop

Traditional photo editing demands expertise, endless parameter tuning, and exhausting trial-and-error. PhotoAgent replaces this fragile human-in-the-loop pipeline with an autonomous agent-in-the-loop system — one that perceives, plans, executes, and evaluates like a seasoned professional.

Highlights

	Feature	Description
🔄	Closed-loop Planning	Perceive–plan–execute–evaluate cycle with action memory and visual feedback. No open-loop, single-shot edits.
🌳	MCTS-based Aesthetic Planner	Monte Carlo Tree Search explores editing trajectories, avoids short-sighted or irreversible decisions.
🧠	Action Memory & History	Full editing history prevents redundant operations, enables context-aware decisions and faster convergence.
🎯	Scene-Aware Classification	Fine-grained scene classification (portrait, landscape, urban, food, low-light, indoor) with scene-specific strategies.
🛠	Rich Toolset	Orchestrates GPT-Image-1, Flux.1 Kontext, Step1X-Edit, Nano Banana, ZImage, and more.
📐	UGC-Oriented Evaluation	UGC-Edit dataset (~7,000 photos) and reward model trained via GRPO on Qwen2.5-VL.

How PhotoAgent Works

PhotoAgent formulates autonomous image editing as a long-horizon decision-making problem with four core components in a closed-loop system:

┌──────────────────────────────────────────────────────────────┐
│                                                              │
│   ┌───────────┐    ┌──────────┐    ┌──────────┐    ┌─────────────┐
│   │ Perceiver │───▶│ Planner  │───▶│ Executor │───▶│  Evaluator  │
│   │  (VLM)    │    │  (MCTS)  │    │  (Tools) │    │  (Metrics)  │
│   └───────────┘    └──────────┘    └──────────┘    └──────┬──────┘
│         ▲                                                  │
│         └──────────── memory + feedback ───────────────────┘
│                                                              │
└──────────────────────────────────────────────────────────────┘

Step	Component	What it does
1	Perceiver	VLM (Qwen3-VL) interprets the image and proposes K diverse, atomic editing actions
2	Planner	MCTS explores candidate actions via selection, expansion, simulation, and backpropagation
3	Executor	Runs selected actions with traditional or generative tools; retains the highest-scoring result
4	Evaluator	Ensemble of no-reference metrics, CLIP scores, and UGC reward model; drives re-planning when needed

Results

State-of-the-art on instruction adherence and visual quality; preferred in user studies.

Editing Examples

Input	Output	Input	Output

More results and interactive comparisons on the project page.

Iterative Editing Process

PhotoAgent refines images over multiple iterations, with each round building on the previous result:

_Original

→

_{Iter 1: Color & tone}

→

_{Iter 2: Atmosphere}

→

_{Iter 3: Final polish}

UGC-Edit Dataset & Benchmark

Component	Details
UGC-Edit Dataset	~7,000 authentic user-generated photos from LAION Aesthetic & RealQA, with human aesthetic scores (1–5 scale)
UGC Reward Model	Qwen2.5-VL fine-tuned with GRPO; learns from relative rankings for robust aesthetic scoring
Editing Benchmark	1,017 real-world photos covering portraits, landscapes, urban scenes, food, low-light, and more

Code

Coming soon! Code, pretrained models, and demo will be released here. Star the repo to stay updated.

Citation

@article{yao2025photoagent,
  title   = {PhotoAgent: Agentic Photo Editing with Exploratory Visual Aesthetic Planning},
  author  = {Yao, Mingde and You, Zhiyuan and Tam, King-Man and Wang, Menglu and Xue, Tianfan},
  year    = {2025}
}

License

This project is released under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PhotoAgent

Agentic Photo Editing with Exploratory Visual Aesthetic Planning

Paradigm: From Human-in-the-loop to Agent-in-the-loop

Highlights

How PhotoAgent Works

Results

Editing Examples

Iterative Editing Process

UGC-Edit Dataset & Benchmark

Code

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

mdyao/PhotoAgent

Folders and files

Latest commit

History

Repository files navigation

PhotoAgent

Agentic Photo Editing with Exploratory Visual Aesthetic Planning

Paradigm: From Human-in-the-loop to Agent-in-the-loop

Highlights

How PhotoAgent Works

Results

Editing Examples

Iterative Editing Process

UGC-Edit Dataset & Benchmark

Code

Citation

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Packages