
Conversation


@s-desh s-desh commented Dec 10, 2025

This PR is one of many for encoding maps for agents (#804).

Adds

  • Evals that use map images for point placement and map comprehension.
  • OccupancyGridImage, which encodes an OccupancyGrid as an RGB image and overlays the robot pose.
  • An interpret-map skill that pulls the map and places points based on the query, for navigation.
  • A vibe-coded annotator for adding queries to evals.

Evals and Results

A dataset of floorplans (only 2 right now, with variations) is used to generate grids and is evaluated on point placement and map comprehension. The dataset can be extended by adding new floorplans with expected answers for queries.

dimos/agents2/skills/interpret_map/eval/test_map_interpretability.yaml has queries of varying difficulty to evaluate spatial reasoning. For now, the minimum pass rates for point placement and map comprehension are set to 0.25 and 0.7 respectively.

Run

For evals, run pytest -s dimos/agents2/skills/interpret_map/eval/test_map_eval.py.

Examples of successful point placement queries (result images omitted here):

  • Go to the conference table in the office
  • Second room to the robot's left along the corridor
  • A point immediately behind the robot

Debug

  • Failed point placement tasks store an image for debugging, named debug_goal_placement_<map_id>_<query>.png; the placed goal is marked with +.
  • Failed map comprehension answers are logged in the terminal.

Adding new maps and queries for eval

  1. Get a map image, with black representing obstacles, white representing free space, and gray for unexplored regions.
  2. Add an entry with a new map_id, image_path, robot_pose.position in pixels, and orientation under map_comprehension_tests or point_placement_tests in dimos/agents2/skills/interpret_map/eval/test_map_interpretability.yaml (an example entry is sketched below this list).
  3. For point placement tests, use dimos/agents2/skills/interpret_map/eval/annotate.py <image.png> to create bounding-box and question pairs. These are saved into a questions.yaml; copy them into the main testing YAML mentioned above.
  4. For map comprehension tests, manually add questions and the expected regex patterns to check in the main YAML.
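
A sketch of the shape such an entry might take, built as a Python dict and dumped to YAML. The field names map_id, image_path, and robot_pose (position in pixels, orientation as a quaternion) come from the steps above and the class docstring; the questions/query/expected nesting is an assumption, so check test_map_interpretability.yaml for the exact schema.

import yaml  # pyyaml

entry = {
    "map_id": "floorplan_3",                    # hypothetical new map id
    "image_path": "eval/maps/floorplan_3.png",  # hypothetical path
    "robot_pose": {
        "position": [240, 180, 0.0],            # pixels in the map image
        "orientation": [0.0, 0.0, 0.0, 1.0],    # quaternion x, y, z, w
    },
    "questions": [                              # assumed nesting
        {"query": "go to the conference table",
         "expected": "conference|table"},       # regex for comprehension answers
    ],
}
print(yaml.safe_dump({"map_comprehension_tests": [entry]}, sort_keys=False))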

Running with agents

The interpret_map_skill can pull the map, place a goal based on the query, and return the world coordinates to navigate to.

Run dimos --replay run unitree-go2-agentic --extra-module interpret_map_skill

In the CLI, queries like "get a goal right in front of the robot" or "get a goal to the northeast side of the map" can be asked.

A few observations

  • Orienting the map so the robot always points up improves scores on robot-centric queries like "second room to the robot's left".
  • Point placement is highly sensitive to the prompt. For example, the identified points end up far from the description if the prompt repeatedly says "place the point only in free (white) space"; moving the goal to the nearest free space in post-processing is feasible instead. Another example: using "white pixels" instead of "white area" in the prompt gives better results.
  • The VLM does better at queries that do not need the robot's orientation to answer.
  • Noise influences how well the robot's orientation is understood.
  • Qwen pixel identification works well when the max dimension is limited to 1024 px while keeping the aspect ratio (see the resize sketch below).
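
A minimal sketch of that resize policy, assuming PIL and that map images are only ever downscaled; this is illustrative, not the PR's actual code.

from PIL import Image

def resize_for_vlm(img: Image.Image, max_dim: int = 1024) -> Image.Image:
    # Cap the longest side at max_dim while preserving the aspect ratio.
    scale = min(1.0, max_dim / max(img.size))
    new_size = (round(img.width * scale), round(img.height * scale))
    # NEAREST avoids blending free/occupied/unknown colors at cell borders.
    return img.resize(new_size, Image.NEAREST)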

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@s-desh s-desh closed this Dec 10, 2025
@s-desh s-desh reopened this Dec 10, 2025

greptile-apps bot commented Dec 10, 2025

Greptile Overview

Greptile Summary

This PR introduces OccupancyGridImage, a new class that converts occupancy grid maps into RGB images for vision-language model (VLM) interpretation. The implementation enables natural language-based goal identification on maps through the new InterpretMapSkill, which uses a Qwen VLM to identify goal positions from user descriptions and converts pixel coordinates back to world coordinates for navigation.

Key Changes:

  • Added OccupancyGridImage class with coordinate transformation methods (pixel_to_grid, grid_to_pixel, pixel_to_world); a hedged sketch of the chain follows this list
  • Color mapping: blue for free space, yellow for unknown, red shades for obstacles, green marker for robot pose
  • New InterpretMapSkill that queries VLM with occupancy grid images to identify goal locations
  • Added navigate_with_position skill to navigate to specific world coordinates
  • Configuration support for memory limits via GlobalConfig
  • Comprehensive test suites for coordinate conversions and map interpretation
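
A hedged sketch of how that conversion chain can compose, assuming standard nav_msgs conventions (origin and resolution from OccupancyGrid.info) and scale factors as in the review excerpt further down; the actual class also flips the image vertically, which this sketch omits.

def pixel_to_grid(px: int, py: int, width_scale: float, height_scale: float) -> tuple[int, int]:
    # Image pixel -> grid cell index (scale = grid size / image size).
    return int(px * width_scale), int(py * height_scale)

def grid_to_world(gx: int, gy: int, origin_x: float, origin_y: float, resolution: float) -> tuple[float, float]:
    # Grid cell index -> meters in the map frame.
    return origin_x + gx * resolution, origin_y + gy * resolution

def pixel_to_world(px: int, py: int, width_scale: float, height_scale: float,
                   origin_x: float, origin_y: float, resolution: float) -> tuple[float, float]:
    gx, gy = pixel_to_grid(px, py, width_scale, height_scale)
    return grid_to_world(gx, gy, origin_x, origin_y, resolution)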

Issues Found:

  • Missing bounds checking in is_free_space and _overlay_robot_pose could cause IndexError or rendering issues with edge coordinates (a hedged sketch of the missing check follows)
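
A minimal sketch of the kind of check being flagged, using assumed names and a plain numpy grid in (row, column) order rather than the PR's actual code:

import numpy as np

def is_free_space(grid: np.ndarray, gx: int, gy: int) -> bool:
    # Reject out-of-bounds cells instead of letting them raise IndexError.
    h, w = grid.shape
    if not (0 <= gx < w and 0 <= gy < h):
        return False
    # nav_msgs/OccupancyGrid convention: 0 = free, 100 = occupied, -1 = unknown.
    return bool(grid[gy, gx] == 0)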

Confidence Score: 3/5

  • Safe to merge after fixing bounds checking issues in coordinate transformations
  • The PR adds valuable VLM-based map interpretation functionality with comprehensive tests, but has critical bounds checking issues in OccupancyGridImage.is_free_space() and _overlay_robot_pose() that could cause runtime errors when coordinates fall outside grid boundaries. These need to be fixed before merging.
  • dimos/msgs/nav_msgs/OccupancyGridImage.py requires bounds checking fixes on lines 162 and 202

Important Files Changed

File Analysis

Filename Score Overview
dimos/msgs/nav_msgs/OccupancyGridImage.py 3/5 New class added to convert occupancy grids to images with pixel/grid/world coordinate transformations; potential out-of-bounds access issues in is_free_space and _overlay_robot_pose
dimos/agents2/skills/interpret_map.py 4/5 New skill module for interpreting maps using VLM to identify goal positions from natural language descriptions; handles free space validation and coordinate conversion
dimos/agents2/skills/navigation.py 5/5 Added navigate_with_position skill to navigate to specific world coordinates obtained from map

Sequence Diagram

sequenceDiagram
    participant User
    participant InterpretMapSkill
    participant OccupancyGrid
    participant OccupancyGridImage
    participant QwenVlModel
    participant NavigationSkill

    User->>InterpretMapSkill: get_goal_position(description)
    InterpretMapSkill->>InterpretMapSkill: Retrieve latest costmap
    InterpretMapSkill->>OccupancyGridImage: from_occupancygrid(costmap, robot_pose)
    OccupancyGridImage->>OccupancyGrid: Convert grid to RGB image
    OccupancyGridImage->>OccupancyGridImage: _overlay_robot_pose()
    OccupancyGridImage->>OccupancyGridImage: Flip vertically & resize
    OccupancyGridImage-->>InterpretMapSkill: OccupancyGridImage with Image
    InterpretMapSkill->>QwenVlModel: query(image, prompt)
    QwenVlModel-->>InterpretMapSkill: JSON response with pixel coordinates
    InterpretMapSkill->>InterpretMapSkill: extract_coordinates()
    InterpretMapSkill->>OccupancyGridImage: is_free_space(x, y)
    OccupancyGridImage->>OccupancyGridImage: pixel_to_grid(x, y)
    OccupancyGridImage->>OccupancyGrid: Check grid[grid_y, grid_x]
    alt Point not in free space
        InterpretMapSkill->>OccupancyGridImage: get_closest_free_point(x, y)
        OccupancyGridImage-->>InterpretMapSkill: Closest free pixel coordinates
    end
    InterpretMapSkill->>OccupancyGridImage: pixel_to_world(x, y)
    OccupancyGridImage->>OccupancyGridImage: pixel_to_grid(x, y)
    OccupancyGridImage->>OccupancyGrid: grid_to_world(grid_point)
    OccupancyGrid-->>OccupancyGridImage: World coordinates (Vector3)
    OccupancyGridImage-->>InterpretMapSkill: goal_pose (Vector3)
    InterpretMapSkill-->>User: goal_pose
    User->>NavigationSkill: navigate_with_position(x, y, z)
    NavigationSkill->>NavigationSkill: Create PoseStamped goal
    NavigationSkill->>NavigationSkill: _navigate_to(goal_pose)
    NavigationSkill-->>User: Success message

@greptile-apps greptile-apps bot left a comment

13 files reviewed, 2 comments

Contributor

@leshy leshy left a comment

just these small things, otherwise looks good

@s-desh s-desh force-pushed the occupancy_grid_image branch 3 times, most recently from 0bdf5a6 to 6fbe4eb Compare December 16, 2025 14:13
Contributor

leshy commented Dec 16, 2025

I wrote a quick way to ask an agent a question and see the result in Foxglove in realtime on top of your map. Your resolution was wrong: you were placing points in pixels and not meters; for example, [480, 270, 0.0] is 480, 270 meters away from zero-zero on the map.

I made the system define transforms correctly (for the occupancy grid world frame and for the robot base_link).

answer to "conference room with a bunch of chairs"

2025-12-16_23-30

Likely something wrong with how the image is rendered for the agent? Not sure, but I wanted this so I can ask a few questions myself and see the results.

Run foxglove-bridge in a console, run Foxglove, and import the occupancygrid_agent_foxglove.json dashboard from your evals/ dir.

Run this twice initially to see the image, because the bridge is a bit dumb:

pytest -svk ivan dimos/agents2/skills/interpret_map/eval/test_map_eval.py

Author

s-desh commented Dec 17, 2025

Your resolution was wrong: you were placing points in pixels and not meters; for example, [480, 270, 0.0] is 480, 270 meters away from zero-zero on the map.

This was taken care of by position = [i * self.occupancy_grid.info.resolution for i in self.robot_pose["position"]] (converting pixels to meters), and it works as expected in the tests. I've removed it now; you can continue using meters for position.
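
For illustration only (the resolution value below is an assumed example, not a value from the PR): scaling a pixel position by the grid resolution gives meters in the map frame.

resolution = 0.05                            # meters per cell -- assumed example value
pose_px = [480, 270, 0.0]                    # position given in pixels
pose_m = [v * resolution for v in pose_px]   # -> [24.0, 13.5, 0.0] meters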

Pixel-to-world conversion was incorrect in your version; that's fixed. The actual response to your query looks like:

Screenshot from 2025-12-17 12-44-28

Adding "long table" in your query gives a correct response.

@s-desh s-desh force-pushed the occupancy_grid_image branch from 482ed81 to 7165549 Compare December 23, 2025 15:26
Comment on lines +184 to +186
max(10, int(min_dimension * 0.035))

max(1, int(min_dimension * 0.005))
Contributor

These don't do anything. I guess they were unused variables which were removed by the linter?

Author

Yeah, these were for the arrow that's removed.

cost value at the specified pixel
"""

size = size or self.size
Contributor

size is defined as a tuple and has a default value, so there's no need for the or.

Comment on lines +46 to +50
Attributes:
image_path (str): Path to the map image file.
robot_pose (dict): Robot's pose in the map with keys 'position' (list of 3 floats - X Y Z) and 'orientation' (Quaternion).
occupancy_grid (OccupancyGrid): Generated occupancy grid from the image.
image (Image | None): Generated OccupancyGridImage from the occupancy grid.
Contributor

Please convert this to Python types since mypy doesn't look at docstrings.

Comment on lines +96 to +98
width_scale = self.occupancy_grid.info.width / width
height_scale = self.occupancy_grid.info.height / height
return width_scale, height_scale
Contributor

I've noticed you use 1024x1024 images by default. If the width scale and height scale are not the same that produces images which are squashed in a random direction, no? Don't models get confused by such images? Or are you telling the model which way the image is squished?

Author

I don't default to 1024x1024 now. The aspect ratio is maintained and the max dimension is set to 1024, so both scales will have the same value; will fix it.

@s-desh s-desh force-pushed the occupancy_grid_image branch from ab9769c to 3e5509b Compare December 26, 2025 10:12
leshy and others added 11 commits December 26, 2025 22:37
s-desh and others added 18 commits January 3, 2026 23:20
@spomichter spomichter force-pushed the occupancy_grid_image branch from 3e5509b to 90d5a91 Compare January 8, 2026 13:59
@spomichter spomichter requested a review from a team January 8, 2026 13:59
greptile-apps bot commented Jan 8, 2026

Too many files changed for review.
