Smart Multi-Agent Robot Task Planning using Large Language Models
Note
Project Context: This repository is an enhanced implementation of the framework proposed in the research paper "SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models" (ICRA 2024).
This fork improves upon the original baselines by integrating Google's Gemini 2.0 Flash/Pro models, offering superior reasoning capabilities and cost-efficiency compared to the original GPT-based implementation.
This project builds strictly upon the foundational work by: Shyam Sundar Kannan, Vishnunandan L. N. Venkatesh, and Byung-Cheol Min
- Paper Submitted to: IEEE International Conference on Robotics and Automation (ICRA), 2024
- Original Source: Project Page | arXiv | Video
Abstract of Original Work: SMART-LLM utilizes Large Language Models (LLMs) to convert high-level instructions (e.g., "Prepare breakfast") into multi-robot task plans. It employs a multi-stage process involving Task Decomposition, Coalition Formation, and Task Allocation, validated within the AI2-THOR environment.
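To make the Coalition Formation and Task Allocation stages more concrete, here is a toy sketch of the underlying idea: match each sub-task's required skills against robot capabilities, and form a team when no single robot suffices. In SMART-LLM these stages are performed by prompting an LLM; the greedy rule below is only a stand-in for illustration. The robot names, skill names, sub-tasks, and the functions `form_coalition` and `allocate` are all hypothetical and are not taken from this repository.

```python
def form_coalition(required_skills, robots):
    """Return a list of robot names whose combined skills cover required_skills.

    Illustrative greedy stand-in for the LLM-driven coalition formation stage.
    """
    needed = set(required_skills)

    # Prefer a single robot when one robot has every required skill.
    for name, skills in robots.items():
        if needed <= set(skills):
            return [name]

    # Otherwise add robots greedily until every skill is covered.
    team = []
    uncovered = set(needed)
    while uncovered:
        best = max(robots, key=lambda n: len(uncovered & set(robots[n])))
        gained = uncovered & set(robots[best])
        if not gained:
            raise ValueError(f"No robot provides: {sorted(uncovered)}")
        team.append(best)
        uncovered -= gained
    return team


def allocate(subtasks, robots):
    """Map every sub-task description to the team assigned to it."""
    return {task: form_coalition(skills, robots) for task, skills in subtasks.items()}


# Hypothetical robots and sub-tasks, for illustration only.
ROBOTS = {
    "robot1": ["GoToObject", "PickupObject", "PutObject"],
    "robot2": ["GoToObject", "SwitchOn", "SwitchOff"],
}
SUBTASKS = {
    "move the bread to the counter": ["GoToObject", "PickupObject", "PutObject"],
    "turn on the toaster": ["GoToObject", "SwitchOn"],
    "carry the bread and switch on the toaster": ["PickupObject", "SwitchOn"],
}

if __name__ == "__main__":
    for task, team in allocate(SUBTASKS, ROBOTS).items():
        print(f"{task}: {team}")
```

The third sub-task needs skills that no single robot in the example has, so it is the only one assigned to a two-robot team.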
We have modernized and optimized the codebase to achieve higher benchmark scores:
- Gemini 2.0 Integration: Replaced OpenAI calls with Google's gemini-2.0-flash, reducing latency and improving plan accuracy.
- Code Quality: Refactored scripts for professional engineering standards, removing redundant code and improving maintainability.
Create a conda environment:
conda create -n smartllm python==3.9
conda activate smartllm
pip install -r requirments.txt
This project uses a secure .env file for configuration.
- Get a Key: Obtain a Google API Key from Google AI Studio.
- Create Config: Create a file named .env in the root directory.
- Add Key:
GOOGLE_API_KEY="your_api_key"
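As a quick way to confirm the configuration is readable before running anything, the following is a minimal, standard-library-only sketch that parses a .env file and checks that GOOGLE_API_KEY is set. The helper names `load_env_file` and `get_google_api_key` are hypothetical; the repository's scripts may load the key differently (for example through a dotenv package), so treat this as a sanity check rather than the project's own loader.

```python
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_google_api_key(path=".env"):
    """Return GOOGLE_API_KEY, or raise if it is missing or still the placeholder."""
    key = load_env_file(path).get("GOOGLE_API_KEY", "")
    if not key or key == "your_api_key":
        raise RuntimeError(f"GOOGLE_API_KEY is not set in {path}")
    return key
```

Calling `get_google_api_key()` from the repository root should return the key without printing it; avoid logging the value, and keep .env out of version control.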
Run the script to generate executable robot plans for a specific floor plan.
python3 scripts/run_llm.py --floor-plan {floor_plan_no} --model gemini-2.0-flash
- Result: Executable code will be generated in the logs/gemini_runs/ directory.
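To generate plans for several floor plans in one go, a small wrapper along these lines may help. It only assembles the command shown above with the `--floor-plan` and `--model` flags and, unless `dry_run` is turned off, prints each command instead of executing it. The function names and the floor plan numbers in the usage note are placeholders, not values taken from the repository.

```python
import subprocess
import sys


def build_run_command(floor_plan_no, model="gemini-2.0-flash"):
    """Build the argument list for one planning run."""
    return [
        sys.executable,  # interpreter of the active environment
        "scripts/run_llm.py",
        "--floor-plan", str(floor_plan_no),
        "--model", model,
    ]


def run_floor_plans(floor_plans, model="gemini-2.0-flash", dry_run=True):
    """Run (or, with dry_run=True, only print) the planner for each floor plan."""
    commands = [build_run_command(n, model) for n in floor_plans]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_floor_plans([1, 2])` prints two commands without calling the API; pass `dry_run=False` from the repository root to actually execute them.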
To visualize and verify the generated plan:
- Identify the generated folder in logs/gemini_runs/.
- Run the execution script:
python3 scripts/execute_plan.py --command {generated_folder_name}
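The two steps above can be sketched in Python as follows. The sketch assumes that the run you want is the most recently modified folder in logs/gemini_runs/, which may not hold if older runs have been edited since; `latest_run_folder` and `build_execute_command` are hypothetical helper names.

```python
import sys
from pathlib import Path


def latest_run_folder(log_dir="logs/gemini_runs"):
    """Return the name of the most recently modified sub-folder of log_dir."""
    folders = [p for p in Path(log_dir).iterdir() if p.is_dir()]
    if not folders:
        raise FileNotFoundError(f"No generated folders found in {log_dir}")
    return max(folders, key=lambda p: p.stat().st_mtime).name


def build_execute_command(folder_name):
    """Build the argument list for the execution script."""
    return [sys.executable, "scripts/execute_plan.py", "--command", folder_name]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(" ".join(build_execute_command(latest_run_folder())))
```

Run it from the repository root so that the relative paths resolve.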
- Task Data: Located in data/final_test/ (instructions, robot capabilities, ground truth).
- Robot Skills: Defined in resources/robots.py.
- AI2-THOR: Layouts can be previewed at AI2-THOR Demo.
If you use this work, please cite the original authors who established this framework:
@article{kannan2023smart,
title={SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models},
author={Kannan, Shyam Sundar and Venkatesh, Vishnunandan LN and Min, Byung-Cheol},
journal={arXiv preprint arXiv:2309.10062},
year={2023}
}