
Conversation


@s-desh s-desh commented Dec 10, 2025

This PR is one of many for encoding maps for agents (#804).

Adds

  • Evals that use map images for point placement and map comprehension.
  • OccupancyGridImage, which encodes an OccupancyGrid as an RGB image and overlays the robot pose.
  • An interpret-map skill that pulls the map and places points based on the query, for navigation.
  • A vibe-coded annotator for adding queries to evals.

Evals and Results

A dataset of floorplans (only 2 right now, with variations) is used to generate grids and is evaluated on point placement and map comprehension. The dataset can be extended by adding new floorplans with expected answers for queries.

dimos/agents2/skills/interpret_map/eval/test_map_interpretability.yaml has queries of varying difficulty to evaluate spatial reasoning. For now, the minimum pass rates for point placement and map comprehension are set to 0.25 and 0.7 respectively.

Run

For evals, run pytest -s dimos/agents2/skills/interpret_map/eval/test_map_eval.py.

Examples of successful point placement queries (result images omitted here):

  • Go to the conference table in the office
  • Second room to the robot's left along the corridor
  • A point immediately behind the robot

Debug

  • Failed point placement tasks store an image for debugging, named debug_goal_placement_<map_id>_<query>.png; the placed goal is marked with +.
  • Failed map comprehension answers are logged in the terminal.

Adding new maps and queries for eval

  1. Get a map image, with black representing obstacles, white representing free space, and gray for unexplored regions.
  2. Add an entry with a new map_id, image_path, robot_pose.position in pixels, and orientation under map_comprehension_tests or point_placement_tests in dimos/agents2/skills/interpret_map/eval/test_map_interpretability.yaml (an example entry is sketched below this list).
  3. For point placement tests, use dimos/agents2/skills/interpret_map/eval/annotate.py <image.png> to create bounding-box and question pairs. These are saved into a questions.yaml; copy them into the main testing YAML mentioned above.
  4. For map comprehension tests, manually add questions and the expected regex patterns to check in the main YAML.
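
A sketch of the shape such an entry might take, built as a Python dict and dumped to YAML. The field names map_id, image_path, and robot_pose (position in pixels, orientation as a quaternion) come from the steps above and the class docstring; the questions/query/expected nesting is an assumption, so check test_map_interpretability.yaml for the exact schema.

import yaml  # pyyaml

entry = {
    "map_id": "floorplan_3",                    # hypothetical new map id
    "image_path": "eval/maps/floorplan_3.png",  # hypothetical path
    "robot_pose": {
        "position": [240, 180, 0.0],            # pixels in the map image
        "orientation": [0.0, 0.0, 0.0, 1.0],    # quaternion x, y, z, w
    },
    "questions": [                              # assumed nesting
        {"query": "go to the conference table",
         "expected": "conference|table"},       # regex for comprehension answers
    ],
}
print(yaml.safe_dump({"map_comprehension_tests": [entry]}, sort_keys=False))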

Running with agents

The interpret_map_skill can pull the map, place a goal based on the query, and return the world coordinates to navigate to.

Run dimos --replay run unitree-go2-agentic --extra-module interpret_map_skill

In the CLI, queries like "get a goal right in front of the robot" or "get a goal to the northeast side of the map" can be asked.

A few observations

  • Orienting the map so the robot always points up improves scores on robot-centric queries like "second room to the robot's left".
  • Point placement is highly sensitive to the prompt. For example, the identified points end up far from the description if the prompt repeatedly says "place the point only in free (white) space"; moving the goal to the nearest free space in post-processing is feasible instead. Another example: using "white pixels" instead of "white area" in the prompt gives better results.
  • The VLM does better at queries that do not need the robot's orientation to answer.
  • Noise influences how well the robot's orientation is understood.
  • Qwen pixel identification works well when the max dimension is limited to 1024 px while keeping the aspect ratio (see the resize sketch below).
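
A minimal sketch of that resize policy, assuming PIL and that map images are only ever downscaled; this is illustrative, not the PR's actual code.

from PIL import Image

def resize_for_vlm(img: Image.Image, max_dim: int = 1024) -> Image.Image:
    # Cap the longest side at max_dim while preserving the aspect ratio.
    scale = min(1.0, max_dim / max(img.size))
    new_size = (round(img.width * scale), round(img.height * scale))
    # NEAREST avoids blending free/occupied/unknown colors at cell borders.
    return img.resize(new_size, Image.NEAREST)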

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@s-desh s-desh closed this Dec 10, 2025
@s-desh s-desh reopened this Dec 10, 2025

greptile-apps bot commented Dec 10, 2025

Greptile Overview

Greptile Summary

This PR introduces OccupancyGridImage, a new class that converts occupancy grid maps into RGB images for vision-language model (VLM) interpretation. The implementation enables natural language-based goal identification on maps through the new InterpretMapSkill, which uses a Qwen VLM to identify goal positions from user descriptions and converts pixel coordinates back to world coordinates for navigation.

Key Changes:

  • Added OccupancyGridImage class with coordinate transformation methods (pixel_to_grid, grid_to_pixel, pixel_to_world); a hedged sketch of the chain follows this list
  • Color mapping: blue for free space, yellow for unknown, red shades for obstacles, green marker for robot pose
  • New InterpretMapSkill that queries VLM with occupancy grid images to identify goal locations
  • Added navigate_with_position skill to navigate to specific world coordinates
  • Configuration support for memory limits via GlobalConfig
  • Comprehensive test suites for coordinate conversions and map interpretation
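
A hedged sketch of how that conversion chain can compose, assuming standard nav_msgs conventions (origin and resolution from OccupancyGrid.info) and scale factors as in the review excerpt further down; the actual class also flips the image vertically, which this sketch omits.

def pixel_to_grid(px: int, py: int, width_scale: float, height_scale: float) -> tuple[int, int]:
    # Image pixel -> grid cell index (scale = grid size / image size).
    return int(px * width_scale), int(py * height_scale)

def grid_to_world(gx: int, gy: int, origin_x: float, origin_y: float, resolution: float) -> tuple[float, float]:
    # Grid cell index -> meters in the map frame.
    return origin_x + gx * resolution, origin_y + gy * resolution

def pixel_to_world(px: int, py: int, width_scale: float, height_scale: float,
                   origin_x: float, origin_y: float, resolution: float) -> tuple[float, float]:
    gx, gy = pixel_to_grid(px, py, width_scale, height_scale)
    return grid_to_world(gx, gy, origin_x, origin_y, resolution)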

Issues Found:

  • Missing bounds checking in is_free_space and _overlay_robot_pose could cause IndexError or rendering issues with edge coordinates (a hedged sketch of the missing check follows)
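
A minimal sketch of the kind of check being flagged, using assumed names and a plain numpy grid in (row, column) order rather than the PR's actual code:

import numpy as np

def is_free_space(grid: np.ndarray, gx: int, gy: int) -> bool:
    # Reject out-of-bounds cells instead of letting them raise IndexError.
    h, w = grid.shape
    if not (0 <= gx < w and 0 <= gy < h):
        return False
    # nav_msgs/OccupancyGrid convention: 0 = free, 100 = occupied, -1 = unknown.
    return bool(grid[gy, gx] == 0)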

Confidence Score: 3/5

  • Safe to merge after fixing bounds checking issues in coordinate transformations
  • The PR adds valuable VLM-based map interpretation functionality with comprehensive tests, but has critical bounds checking issues in OccupancyGridImage.is_free_space() and _overlay_robot_pose() that could cause runtime errors when coordinates fall outside grid boundaries. These need to be fixed before merging.
  • dimos/msgs/nav_msgs/OccupancyGridImage.py requires bounds checking fixes on lines 162 and 202

Important Files Changed

File Analysis

Filename Score Overview
dimos/msgs/nav_msgs/OccupancyGridImage.py 3/5 New class added to convert occupancy grids to images with pixel/grid/world coordinate transformations; potential out-of-bounds access issues in is_free_space and _overlay_robot_pose
dimos/agents2/skills/interpret_map.py 4/5 New skill module for interpreting maps using VLM to identify goal positions from natural language descriptions; handles free space validation and coordinate conversion
dimos/agents2/skills/navigation.py 5/5 Added navigate_with_position skill to navigate to specific world coordinates obtained from map

Sequence Diagram

sequenceDiagram
    participant User
    participant InterpretMapSkill
    participant OccupancyGrid
    participant OccupancyGridImage
    participant QwenVlModel
    participant NavigationSkill

    User->>InterpretMapSkill: get_goal_position(description)
    InterpretMapSkill->>InterpretMapSkill: Retrieve latest costmap
    InterpretMapSkill->>OccupancyGridImage: from_occupancygrid(costmap, robot_pose)
    OccupancyGridImage->>OccupancyGrid: Convert grid to RGB image
    OccupancyGridImage->>OccupancyGridImage: _overlay_robot_pose()
    OccupancyGridImage->>OccupancyGridImage: Flip vertically & resize
    OccupancyGridImage-->>InterpretMapSkill: OccupancyGridImage with Image
    InterpretMapSkill->>QwenVlModel: query(image, prompt)
    QwenVlModel-->>InterpretMapSkill: JSON response with pixel coordinates
    InterpretMapSkill->>InterpretMapSkill: extract_coordinates()
    InterpretMapSkill->>OccupancyGridImage: is_free_space(x, y)
    OccupancyGridImage->>OccupancyGridImage: pixel_to_grid(x, y)
    OccupancyGridImage->>OccupancyGrid: Check grid[grid_y, grid_x]
    alt Point not in free space
        InterpretMapSkill->>OccupancyGridImage: get_closest_free_point(x, y)
        OccupancyGridImage-->>InterpretMapSkill: Closest free pixel coordinates
    end
    InterpretMapSkill->>OccupancyGridImage: pixel_to_world(x, y)
    OccupancyGridImage->>OccupancyGridImage: pixel_to_grid(x, y)
    OccupancyGridImage->>OccupancyGrid: grid_to_world(grid_point)
    OccupancyGrid-->>OccupancyGridImage: World coordinates (Vector3)
    OccupancyGridImage-->>InterpretMapSkill: goal_pose (Vector3)
    InterpretMapSkill-->>User: goal_pose
    User->>NavigationSkill: navigate_with_position(x, y, z)
    NavigationSkill->>NavigationSkill: Create PoseStamped goal
    NavigationSkill->>NavigationSkill: _navigate_to(goal_pose)
    NavigationSkill-->>User: Success message

@greptile-apps greptile-apps bot left a comment

13 files reviewed, 2 comments

Contributor

@leshy leshy left a comment

just these small things, otherwise looks good

@s-desh s-desh force-pushed the occupancy_grid_image branch 3 times, most recently from 0bdf5a6 to 6fbe4eb Compare December 16, 2025 14:13
Contributor

leshy commented Dec 16, 2025

I wrote a quick way to ask an agent a question and see the result in Foxglove in realtime on top of your map. Your resolution was wrong: you were placing points in pixels and not meters; for example, [480, 270, 0.0] is 480, 270 meters away from zero-zero on the map.

I made the system define transforms correctly (for the occupancy grid world frame and for the robot base_link).

answer to "conference room with a bunch of chairs"

2025-12-16_23-30

Likely something wrong with how the image is rendered for the agent? Not sure, but I wanted this so I can ask a few questions myself and see the results.

Run foxglove-bridge in a console, run Foxglove, and import the occupancygrid_agent_foxglove.json dashboard from your evals/ dir.

Run this twice initially to see the image, because the bridge is a bit dumb:

pytest -svk ivan dimos/agents2/skills/interpret_map/eval/test_map_eval.py

Author

s-desh commented Dec 17, 2025

Your resolution was wrong: you were placing points in pixels and not meters; for example, [480, 270, 0.0] is 480, 270 meters away from zero-zero on the map.

This was taken care of by position = [i * self.occupancy_grid.info.resolution for i in self.robot_pose["position"]] (converting pixels to meters), and it works as expected in the tests. I've removed it now; you can continue using meters for position.
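
For illustration only (the resolution value below is an assumed example, not a value from the PR): scaling a pixel position by the grid resolution gives meters in the map frame.

resolution = 0.05                            # meters per cell -- assumed example value
pose_px = [480, 270, 0.0]                    # position given in pixels
pose_m = [v * resolution for v in pose_px]   # -> [24.0, 13.5, 0.0] meters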

Pixel-to-world conversion was incorrect in your version; that's fixed. The actual response to your query looks like:

Screenshot from 2025-12-17 12-44-28

Adding "long table" in your query gives a correct response.

@s-desh s-desh force-pushed the occupancy_grid_image branch from 482ed81 to 7165549 Compare December 23, 2025 15:26
Comment on lines +184 to +186
max(10, int(min_dimension * 0.035))

max(1, int(min_dimension * 0.005))
Contributor

These don't do anything. I guess they were unused variables which were removed by the linter?

Author

Yeah, these were for the arrow that's removed.

cost value at the specified pixel
"""

size = size or self.size
Contributor

size is defined as a tuple and has a default value, so there's no need for the or.

Comment on lines +46 to +50
Attributes:
image_path (str): Path to the map image file.
robot_pose (dict): Robot's pose in the map with keys 'position' (list of 3 floats - X Y Z) and 'orientation' (Quaternion).
occupancy_grid (OccupancyGrid): Generated occupancy grid from the image.
image (Image | None): Generated OccupancyGridImage from the occupancy grid.
Contributor

Please convert this to Python types since mypy doesn't look at docstrings.

Comment on lines +96 to +98
width_scale = self.occupancy_grid.info.width / width
height_scale = self.occupancy_grid.info.height / height
return width_scale, height_scale
Contributor

I've noticed you use 1024x1024 images by default. If the width scale and height scale are not the same that produces images which are squashed in a random direction, no? Don't models get confused by such images? Or are you telling the model which way the image is squished?

Author

I don't default to 1024x1024 now. The aspect ratio is maintained and the max dimension is set to 1024, so both scales will have the same value; will fix it.

@s-desh s-desh force-pushed the occupancy_grid_image branch from ab9769c to 3e5509b Compare December 26, 2025 10:12
leshy and others added 11 commits December 26, 2025 22:37
s-desh and others added 18 commits January 3, 2026 23:20
@spomichter spomichter force-pushed the occupancy_grid_image branch from 3e5509b to 90d5a91 Compare January 8, 2026 13:59
@spomichter spomichter requested a review from a team January 8, 2026 13:59
greptile-apps bot commented Jan 8, 2026

Too many files changed for review.
