AI that enhances your story. Watch keenly. Act thoughtfully. 10x your productivity.
⌘ + ⇧ + H to observe your screen instantly
🌐 Website · 📖 Documentation · 🐛 Report Bug · 💡 Request Feature
Traditional AI waits for your commands. Hawkeye watches and helps proactively.
Hawkeye is an AI-powered desktop assistant that observes your work environment—screen, clipboard, files—and proactively offers intelligent suggestions. No prompts needed.
The AI behind Hawkeye is designed to enhance your own story — turning your screen time into meaningful personal growth by automatically mapping your goals, habits, and progress into a living Life Tree.
| Feature | Copilot / Cursor / Claude Code | Hawkeye |
|---|---|---|
| Mode | Reactive (you ask) | Proactive (it watches) |
| Scope | Code only | Everything: coding, browsing, writing |
| Privacy | Cloud-based | Local-first, your data stays local |
| Control | AI executes | You decide what to execute |
| Platform | Download |
|---|---|
⚠️ macOS: "App is damaged" fix
# Remove quarantine attribute
xattr -cr /Applications/Hawkeye.app

# 1. Clone
git clone https://github.com/tensorboy/hawkeye.git && cd hawkeye
# 2. Install
pnpm install
# 3. Run
pnpm dev

Option 1: Google Gemini (Recommended — free tier)
- Get a free API key at aistudio.google.com/apikey
- Enter your key in Settings → Gemini API Key
- Model defaults to gemini-2.0-flash (1M context window)
Option 2: OpenAI-Compatible API
Works with OpenAI, DeepSeek, Groq, Together AI, or any OpenAI-compatible endpoint.
Set your base URL, API key, and model name in Settings.
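As a sketch, the OpenAI-compatible wire format looks like this in TypeScript (the base URL, key, and model below are placeholders; Hawkeye's internal client may differ):

```typescript
// Hypothetical helper: builds a request for any OpenAI-compatible
// /chat/completions endpoint. All names here are illustrative.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

The same request shape works against OpenAI, DeepSeek, Groq, or Together AI by swapping `baseUrl` and `model`.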
Option 3: Local LLM with node-llama-cpp (100% Offline)
Download a GGUF model and set the model path in Settings. Supports Metal GPU acceleration on macOS.
Recommended models:
- Qwen 2.5 7B — general purpose (4.7 GB)
- Llama 3.2 3B — lightweight (2.0 GB)
- LLaVA 1.6 7B — vision support (4.5 GB)
Option 4: Ollama (Legacy)
brew install ollama && ollama pull qwen3:8b

Select "Ollama" in Hawkeye settings.
┌─────────────────────────────────────────────────────────────────┐
│ HAWKEYE ENGINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ PERCEPTION │───▶│ REASONING │───▶│ EXECUTION │ │
│ │ Engine │ │ Engine │ │ Engine │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ • Screen OCR • Claude/Ollama • Shell Commands │
│ • Clipboard • Task Analysis • File Operations │
│ • File Watch • Intent Detect • App Control │
│ • Window Track • Suggestions • Browser Auto │
│ │
├─────────────────────────────────────────────────────────────────┤
│ INTERFACES │
├───────────────┬───────────────┬───────────────┬─────────────────┤
│ 🖥️ Desktop │ 🧩 VS Code │ 🌐 Chrome │ 📦 Core │
│ (Electron) │ Extension │ Extension │ (npm pkg) │
└───────────────┴───────────────┴───────────────┴─────────────────┘
Hawkeye is evolving into a full multi-modal human-computer interaction system that combines audio understanding, visual perception, and gesture control.
┌─────────────────────────────────────────────────────────────────────────────┐
│ HAWKEYE MULTI-MODAL HCI PIPELINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ INPUT LAYER │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ 📷 Camera ────▶ MediaPipe Holistic │ │
│ │ • Face: 468 landmarks │ │
│ │ • Pose: 33 keypoints │ │
│ │ • Hands: 21 × 2 keypoints │ │
│ │ │ │
│ │ 🎙️ Microphone ─▶ Silero VAD ─▶ Audio Buffer │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────────┐ ┌──────────────────────────────────┐ │
│ │ VISUAL PROCESSING │ │ AUDIO PROCESSING │ │
│ ├──────────────────────────────┤ ├──────────────────────────────────┤ │
│ │ Face Tracker │ │ DiariZen / Pyannote │ │
│ │ ├─ Multi-face detection │ │ ├─ Speaker diarization │ │
│ │ ├─ Face ID assignment │ │ ├─ "Who is speaking?" │ │
│ │ └─ Lip movement analysis │ │ └─ Speaker embeddings │ │
│ │ │ │ │ │
│ │ Gesture Recognizer │ │ Whisper (smart-whisper) │ │
│ │ ├─ Hand pose classification │ │ ├─ Speech-to-text │ │
│ │ ├─ Dynamic gesture detect │ │ ├─ Language detection │ │
│ │ └─ Custom gesture mapping │ │ └─ Timestamp alignment │ │
│ └──────────────────────────────┘ └──────────────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ FUSION & MATCHING LAYER │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Audio-Visual Matching │ │
│  │  ├─ Lip-sync correlation (whose lips match the audio?)                │  │
│ │ ├─ Face-voice association (learn speaker identity) │ │
│ │ └─ Active speaker detection (LoCoNet / AS-Net) │ │
│ │ │ │
│ │ Context Aggregation │ │
│ │ ├─ Combine: transcription + speaker ID + face ID + gesture │ │
│ │ └─ Generate unified interaction events │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ACTION EXECUTION │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Gesture → Command Mapping │ │
│ │ ├─ 👍 Thumbs Up → Confirm action │ │
│ │ ├─ ✋ Open Palm → Pause / Stop │ │
│ │ ├─ 👆 Point Up → Scroll up │ │
│ │ ├─ 👇 Point Down → Scroll down │ │
│ │ ├─ ✌️ Victory → Screenshot │ │
│ │ ├─ 🤏 Pinch → Zoom in/out │ │
│ │ └─ 🖐️ Swipe → Switch window / tab │ │
│ │ │ │
│ │ Voice Command + Gesture = Enhanced Control │ │
│ │ └─ "Open browser" + Point → Open browser at pointed location │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ OUTPUT │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ 📝 Attributed Transcription │ │
│ │ "Alice: Let's review the code changes" │ │
│ │ "Bob: I'll share my screen [👆 pointing at screen]" │ │
│ │ │ │
│ │ 🎮 System Control │ │
│ │ Mouse movement, clicks, keyboard shortcuts, app switching │ │
│ │ │ │
│ │ 🌳 Life Tree Update │ │
│ │ Activity tracking, goal inference, habit analysis │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Key Technologies:
| Component | Technology | Status |
|---|---|---|
| Voice Activity Detection | Silero VAD | 📋 Planned |
| Speech-to-Text | Whisper (smart-whisper) | ✅ Implemented |
| Speaker Diarization | DiariZen / Pyannote | 🔄 Research |
| Active Speaker Detection | LoCoNet (CVPR 2024) | 🔄 Research |
| Body Tracking | MediaPipe Holistic | 📋 Planned |
| Gesture Recognition | MediaPipe Gesture | 📋 Planned |
| Face-Voice Matching | Custom Fusion | 🔄 Research |
hawkeye/
├── packages/
│ ├── core/ # 🧠 Core engine (local processing)
│ │ ├── perception/ # Screen, clipboard, file monitoring
│ │ ├── ai/ # AI providers (Claude, Ollama, etc.)
│ │ ├── execution/ # Action execution system
│ │ └── storage/ # Local database (SQLite)
│ │
│ ├── desktop/ # 🖥️ Electron desktop app
│ ├── vscode-extension/ # 🧩 VS Code extension
│ └── chrome-extension/ # 🌐 Chrome browser extension
│
├── docs/ # 📖 Documentation
└── website/ # 🌐 Marketing site
| Aspect | How We Protect You |
|---|---|
| Screenshots | ✅ Analyzed locally, never uploaded |
| Clipboard | ✅ Processed on-device only |
| Files | ✅ Monitored locally, paths never sent |
| AI Calls | ✅ Only minimal context text sent (or use local LLM) |
| Dangerous Ops | ✅ Always requires your confirmation |
📁 All data stored in ~/.hawkeye/ — you own your data.
import { HawkeyeEngine } from '@hawkeye/core';
const engine = new HawkeyeEngine({
provider: 'ollama',
model: 'qwen3:8b'
});
// Get AI-powered suggestions based on current context
const suggestions = await engine.observe();
// Execute a suggestion with user confirmation
await engine.execute(suggestions[0].id);

import { FileWatcher } from '@hawkeye/core';
const watcher = new FileWatcher({
paths: ['~/Downloads', '~/Documents'],
events: ['create', 'move']
});
watcher.on('change', (event) => {
console.log(`${event.type}: ${event.path}`);
});

AI provider calls use exponential backoff with jitter to handle transient failures gracefully, preventing thundering herd effects.
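A minimal sketch of that retry strategy (the base delay, cap, and attempt count here are illustrative, not Hawkeye's actual values):

```typescript
// Exponential backoff with full jitter: the delay window doubles each
// attempt, and the actual wait is a random point inside the window,
// which spreads simultaneous retries apart.
function backoffDelay(attempt: number, baseMs = 500, capMs = 30_000): number {
  const windowMs = Math.min(capMs, baseMs * 2 ** attempt); // 500, 1000, 2000, ...
  return Math.random() * windowMs;
}

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
  throw lastErr;
}
```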
Context history (window titles, clipboard, OCR text) is indexed with SQLite FTS5 for instant fuzzy search across all recorded observations.
The observation interval adjusts dynamically based on user activity — fast polling when active, slow polling when idle — saving CPU and battery.
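A sketch of how such adaptive polling might pick its interval (the thresholds and intervals are illustrative assumptions, not Hawkeye's tuned values):

```typescript
// Map time-since-last-input to a polling interval: fast while the user
// is active, progressively slower as the machine sits idle.
function nextInterval(idleMs: number): number {
  if (idleMs < 30_000) return 1_000;      // active: poll every second
  if (idleMs < 5 * 60_000) return 5_000;  // recently active: relax a bit
  return 30_000;                          // idle: back off to save CPU/battery
}
```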
A priority-based task queue with deduplication ensures that AI requests and plan executions are processed efficiently without duplicate work.
Hawkeye exposes 15+ tools via MCP (Model Context Protocol) for screen perception, window management, file organization, and automation.
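For reference, an MCP tool is declared with a name, a description, and a JSON Schema for its input; the tool below is a hypothetical example, not taken from Hawkeye's actual tool list:

```typescript
// Hypothetical MCP tool declaration following the protocol's
// name / description / inputSchema shape.
const captureScreenTool = {
  name: "capture_screen",
  description: "Capture the current screen and return OCR text",
  inputSchema: {
    type: "object",
    properties: {
      displayId: { type: "number" }, // which monitor to capture
    },
    required: [],
  },
};
```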
An agent monitor enforces cost limits, blocks dangerous operations (e.g. rm -rf /), requires confirmation for risky actions, and supports a sandbox mode.
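A toy sketch of such a guard (the regex patterns are illustrative, not Hawkeye's real blocklist):

```typescript
// Classify a shell command as allow / confirm / block before execution.
// Patterns here are examples only; a real guard needs a broader list.
const BLOCKED = [/\brm\s+-rf\s+\/(?:\s|$)/, /\bmkfs\b/, /\bdd\s+.*of=\/dev\//];
const NEEDS_CONFIRM = [/\brm\s+-rf\b/, /\bgit\s+push\s+--force\b/, /\bsudo\b/];

type Verdict = "allow" | "confirm" | "block";

function classifyCommand(cmd: string): Verdict {
  if (BLOCKED.some((re) => re.test(cmd))) return "block";       // never run
  if (NEEDS_CONFIRM.some((re) => re.test(cmd))) return "confirm"; // ask first
  return "allow";
}
```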
A macOS-style popover panel accessible from the system tray provides quick actions, recent activity feed, and real-time module status indicators.
All AI providers declare their capabilities (chat, vision, streaming, function calling), enabling intelligent routing and health monitoring across providers.
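Capability-based routing can be sketched as follows (the type and function names are assumptions, not the actual `@hawkeye/core` API):

```typescript
// Pick the first healthy provider that advertises every needed capability.
type Capability = "chat" | "vision" | "streaming" | "functionCalling";

interface Provider {
  name: string;
  capabilities: Set<Capability>;
  healthy: boolean; // maintained by periodic health checks
}

function route(providers: Provider[], needed: Capability[]): Provider | undefined {
  return providers.find(
    (p) => p.healthy && needed.every((c) => p.capabilities.has(c)),
  );
}
```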
- Core perception engine
- Desktop app (Electron)
- VS Code extension
- Chrome extension
- Local LLM support (Ollama, node-llama-cpp)
- Multi-provider AI (Gemini, OpenAI-compatible, LlamaCpp)
- Provider unified protocol with capability routing
- Streaming and health check support
- SQLite FTS5 full-text search
- Exponential backoff retry strategy
- Adaptive refresh rate
- Priority task queue
- MCP Server with 15+ tools
- Safety guardrails and agent monitoring
- Menu bar panel (macOS-style popover)
- Life Tree — AI maps your life journey and enhances your story
- Desktop ↔ Extension real-time sync
- Plugin system
- Custom workflow builder
- Mobile companion app
Contributions are what make the open source community amazing! Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
Distributed under the MIT License. See LICENSE for more information.
