Skip to content

FedericoMz/HINTT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HINTT

HINTT Is Not a Traditional Translator, but a Python tool that lets you select a region of your screen, captures it as an image, and uses OpenAI's GPT-4 Vision model to extract and translate text from the screenshot with a simple keyboard shortcut.

HINTT is the perfect companion for playing games in a foreign language or visiting foreign websites!


Features

  • Screen region selector with transparent overlay (Tkinter)
  • Global hotkey (q) for quick screenshot capture
  • Image-to-text translation powered by LLMs
  • On-screen display of the translated text you can move around
  • Retain previously translated text for context
  • Compatible with OpenAI and open-source Ollama models
  • Lightweight, Python-native, and easy to extend!

How To Use

  1. Launch the app.
python main.py
  1. Select a region of the screen with your mouse.
  2. Press q to translate the text in that region.
  3. The selected image is captured and sent to OpenAI's API.
  4. The translated result is shown in a floating window.

Note: HINTT does not support capturing content from other desktops or full-screen apps. Use it in windowed mode.

The GIF below shows an edge case where HINTT excels: translating Japanese text displayed vertically.

HINTT demo showing Japanese text

Requirements and configuration

  • Python 3.10+
  • Access to OpenAI APIs for the best experience

Install dependencies:

pip install -r requirements.txt

In config.env you can set the OpenAI key and customize the prompt (and thus the output language), the OpenAI model used, and how many previous messages are retained for translation context.

If you set RUN_MODE="Ollama" or RUN_MODE="ollama", Ollama models will be used instead. You can set the OCR and translation models respectively with the OLLAMA_OCR_MODEL and OLLAMA_TRANSLATION_MODEL variables. However, experimental results are either worse than with OpenAI (llava), or very slow (llama3.2-vision), at least on a MacBook Air M1.

Usage in Python projects

You can also load and translate images directly in your Python project:

from PIL import Image
from HINTT.tools.translator import OllamaTranslator, OpenAITranslator

img = Image.open('path/to/your/image.png')

openai_translator = OpenAITranslator(
    api_key="sk-...",
    model="gpt-5",
    context_length=20,
    prompt="Translate the following text to English:"
)
result = openai_translator.translate_image(img)

ollama_translator = OllamaTranslator(
    ocr_model="llava:latest",
    translation_model="llama3.2:latest",
    context_length=30,
    prompt="Translate this Japanese text to English:"
)
result = ollama_translator.translate_image(img)
print(result)

To-Do / Improvements

  • A proper GUI
  • Allow customizing the prompt, the output language, and the model
  • Use translation history as context
  • Implement Ollama models as an option

About

An interactive tool for real-time text translation leveraging LLMs' visual capabilities

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages