A generalized browser-automation agent that converts natural-language UI tasks into executable, DOM-grounded Playwright workflows, capturing screenshots of each meaningful UI state along the way.
Ask it “how do I create a list in a Trello board?”
…and the system will:
- Interpret the instruction via a Semantic Planner LLM
- Navigate a real browser using Playwright
- Inspect the live DOM between steps and extract the information relevant to the next action
- Refine the plan into exact click / type / wait actions
- Capture a screenshot of every meaningful UI state
- Write a fully structured dataset of the workflow
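
The "refine, act, capture" steps above can be sketched as a small dispatcher. This is a minimal illustration, not the project's actual implementation: `run_step`, `PageLike`, and the `target` and `text` fields are hypothetical names introduced here. The `page` argument is assumed to behave like a Playwright sync `Page`, and only `goto`, `click`, `fill`, `wait_for_selector`, and `screenshot` are used.

```python
from pathlib import Path
from typing import Any, Protocol


class PageLike(Protocol):
    """The subset of Playwright's sync Page API this sketch relies on."""

    def goto(self, url: str) -> Any: ...
    def click(self, selector: str) -> Any: ...
    def fill(self, selector: str, value: str) -> Any: ...
    def wait_for_selector(self, selector: str) -> Any: ...
    def screenshot(self, *, path: str) -> Any: ...


def run_step(page: PageLike, step: dict, out_dir: Path) -> dict:
    """Execute one refined action, then screenshot the resulting UI state."""
    action = step["action"]
    # Hypothetical field: a URL for "goto", a selector for everything else.
    target = step.get("target")

    if action == "goto":
        page.goto(target)
    elif action == "click":
        page.click(target)
    elif action == "type":
        # "text" is also a hypothetical field holding the value to enter.
        page.fill(target, step.get("text", ""))
    elif action == "wait":
        page.wait_for_selector(target)
    else:
        raise ValueError(f"unsupported action: {action!r}")

    shot = f"step_{step['step']}.png"
    page.screenshot(path=str(out_dir / shot))

    # Record only the four fields that appear in the example dataset.
    return {
        "step": step["step"],
        "description": step["description"],
        "action": action,
        "screenshot": shot,
    }
```

With Playwright installed, `page` would typically come from `sync_playwright()` via `browser.new_page()`. Typing against a small protocol rather than the concrete class keeps the dispatcher testable with a stub page.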
Example output for the Trello task above:

{
  "steps": [
    {
      "step": 1,
      "description": "Navigate to Trello",
      "action": "goto",
      "screenshot": "step_1.png"
    },
    {
      "step": 2,
      "description": "Click the 'Create' button to open the board creation menu",
      "action": "click",
      "screenshot": "step_2.png"
    },
    {
      "step": 3,
      "description": "Click the 'Create board' button to select the Create board option from the menu",
      "action": "click",
      "screenshot": "step_3.png"
    },
    {
      "step": 4,
      "description": "Wait for the 'Add a list' section to appear on the new board",
      "action": "wait",
      "screenshot": "step_4.png"
    },
    {
      "step": 5,
      "description": "Click on the 'Add a list' text box to enter a new list name.",
      "action": "click",
      "screenshot": "step_5.png"
    },
    {
      "step": 6,
      "description": "Enter the name for the new list",
      "action": "type",
      "screenshot": "step_6.png"
    }
  ]
}
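
A dataset in the shape shown above could be loaded and sanity-checked along these lines. This is a sketch under assumptions: the `Step` class, the `load_steps` helper, and the `ALLOWED_ACTIONS` set are hypothetical and derived only from the four fields and four action names visible in the example. The real schema may carry more fields, which this loader ignores.

```python
import json
from dataclasses import dataclass

# Assumption: only the actions that appear in the example are valid.
ALLOWED_ACTIONS = {"goto", "click", "type", "wait"}


@dataclass(frozen=True)
class Step:
    step: int
    description: str
    action: str
    screenshot: str


def load_steps(raw: str) -> list[Step]:
    """Parse a workflow dataset and check step numbering and action names."""
    data = json.loads(raw)
    steps = [
        Step(
            step=item["step"],
            description=item["description"],
            action=item["action"],
            screenshot=item["screenshot"],
        )
        for item in data["steps"]
    ]
    for position, s in enumerate(steps, start=1):
        # Steps in the example are numbered consecutively from 1.
        if s.step != position:
            raise ValueError(f"expected step {position}, got {s.step}")
        if s.action not in ALLOWED_ACTIONS:
            raise ValueError(f"step {s.step}: unknown action {s.action!r}")
    return steps
```

For instance, `load_steps(Path("workflow.json").read_text())` would return a list of `Step` objects, where `workflow.json` is a placeholder file name and `Path` comes from `pathlib`.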