Add AI to describe each image in a single sentence #553

@ned14

https://huggingface.co/vikhyatk/moondream2 integration for Damselfly would be great. When given a prompt such as:

Describe this image and its style in a very detailed manner, follow the format of describing: what, who, where, when, how. You don't need to fill in all if they are irrelevant. Please remove What, Who, Where, When, How prefixes and make it one sentence.

... and fed a photo, you might get back:

A woman with blonde hair walks along a beach, her back to the camera, with the ocean and mountains visible in the background.

What you then do is output a JSON file with the descriptions, tags and URLs of the photos. You submit that to an agentic AI with a prompt such as:

Examine the photos containing women with blonde hair taken in the year 2025 by downloading and inspecting the image linked per entry. List only those entries containing a billboard with the text "Coca Cola" on them and where a red setter dog was present.

I'm sure you can see the utility when you have a very large photo collection and need to reduce your search space. If you have access to a Mac, the 'Draw Things' app lets you run 'Investigate' AIs on images so you can test this for yourself.
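To make the JSON file idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `PhotoEntry` fields, `index_to_json` and `prefilter` are hypothetical names, not anything Damselfly has today. The cheap keyword prefilter is only there to show how the one-sentence descriptions cut the number of images the agent has to download and inspect.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class PhotoEntry:
    # Hypothetical field names; Damselfly's real schema may differ.
    url: str
    description: str
    year: int
    tags: list[str] = field(default_factory=list)


def index_to_json(entries):
    """Serialise the entries as a JSON array an agent could be handed."""
    return json.dumps([asdict(e) for e in entries], indent=2)


def prefilter(entries, year, keywords):
    """Keep entries from `year` whose description contains every keyword."""
    wanted = [k.lower() for k in keywords]
    return [
        e for e in entries
        if e.year == year and all(k in e.description.lower() for k in wanted)
    ]


entries = [
    PhotoEntry(
        url="https://example.invalid/photos/1.jpg",  # placeholder URL
        description="A woman with blonde hair walks along a beach, her back "
                    "to the camera, with the ocean and mountains visible in "
                    "the background.",
        year=2025,
        tags=["beach"],
    ),
    PhotoEntry(
        url="https://example.invalid/photos/2.jpg",  # placeholder URL
        description="A man repairs a bicycle in a garage.",
        year=2025,
    ),
]

# Only the shortlisted entries need to go to the (much slower) agent.
shortlist = prefilter(entries, 2025, ["woman", "blonde hair"])
payload = index_to_json(shortlist)
```

A plain substring match will miss paraphrases ("blond", "fair-haired"), so in practice the agent could just as well do this narrowing itself from the descriptions; the point is that it works from text first and only fetches images for the survivors.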

Moondream2 needs about 4 GB of RAM to run. It will be a lot more costly than any of the AI you use so far, but if you leave it running long enough it'll get there, and it's a one-time processing cost.
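Since the pass is slow but only needs to happen once per image, something like the following sketch would let it be stopped and resumed. It assumes a hypothetical `describe` callable (image path in, one sentence out) wrapping the model, and a JSON cache file keyed by content hash; neither is an existing Damselfly or Moondream API.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents, used as a stable cache key."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def describe_all(image_paths, describe, cache_path):
    """Describe each image once, saving results so reruns skip finished work.

    `describe` is a hypothetical callable (path -> sentence) wrapping the model.
    """
    cache_file = Path(cache_path)
    if cache_file.exists():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))
    else:
        cache = {}
    for path in image_paths:
        key = file_digest(path)
        if key in cache:
            continue  # already described on an earlier run
        cache[key] = {"path": str(path), "description": describe(path)}
        # Save after every image so an interrupted run loses at most one result.
        cache_file.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache
```

Keying on the content hash rather than the file name means renamed or moved photos are not reprocessed. Rewriting the whole cache file per image is fine for a sketch, but a real implementation would more likely store the description in the existing database.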
