https://huggingface.co/vikhyatk/moondream2 integration for Damselfly would be great. When given a prompt such as:
Describe this image and its style in a very detailed manner, follow the format of describing: what, who, where, when, how. You don't need to fill in all if they are irrelevant. Please remove What, Who, Where, When, How prefixes and make it one sentence.
... and fed a photo, you might get back:
A woman with blonde hair walks along a beach, her back to the camera, with the ocean and mountains visible in the background.
What you then do is output a JSON file with the descriptions, tags and URLs of the photos. You submit that to an agentic AI with the prompt:
Examine the photos containing women with blonde hair taken in the year 2025 by downloading and inspecting the image linked per entry. List only those entries containing a billboard with the text "Coca Cola" on them and where a red setter dog was present.
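To make the JSON step concrete, here is a minimal Python sketch of what such a file could look like and how it narrows the search space before the agent downloads anything. Everything in it is an assumption for illustration: the field names (`url`, `tags`, `year`, `description`), the `build_manifest` and `prefilter` helpers, and the `describe` callback, which stands in for whatever actually runs Moondream2 on an image. None of this is Damselfly's or Moondream2's real API.

```python
import json
from typing import Callable, Iterable, Optional


def build_manifest(photos: Iterable[dict], describe: Callable[[str], str]) -> list:
    """Attach a generated description to each photo record.

    `describe` is a hypothetical hook: it takes a photo URL and returns
    the caption produced by the vision model.
    """
    manifest = []
    for photo in photos:
        manifest.append({
            "url": photo["url"],
            "tags": list(photo.get("tags", [])),
            "year": photo.get("year"),
            "description": describe(photo["url"]),
        })
    return manifest


def prefilter(manifest: list, keywords: Iterable[str], year: Optional[int] = None) -> list:
    """Keep entries whose description contains every keyword (case-insensitive)
    and, if a year is given, whose year matches."""
    wanted = [k.lower() for k in keywords]
    return [
        entry for entry in manifest
        if (year is None or entry["year"] == year)
        and all(k in entry["description"].lower() for k in wanted)
    ]


def manifest_to_json(manifest: list) -> str:
    """Serialise the manifest so it can be handed to the agent."""
    return json.dumps(manifest, indent=2)
```

With a manifest like this, only the entries that survive something like `prefilter(manifest, ["woman", "blonde hair"], year=2025)` need to be downloaded and inspected for the billboard and the dog. A plain substring match is crude and will miss paraphrases, so in practice the agent itself may do this narrowing from the descriptions.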
I'm sure you can see the utility when you have a very large photo collection and need to reduce your search space. If you have access to a Mac, the 'Draw Things' app lets you run 'Investigate' AIs on images so you can test this for yourself.
Moondream2 needs about 4 GB of RAM to run, so it will be a lot more costly than any of the AI you use so far. But if you leave it running long enough it'll get there, and it's a one-time processing cost.