Gesture Control transforms your ordinary webcam into a touchless, hands-free input device. Powered by Google's MediaPipe neural network — running entirely on your local CPU — it detects 21 hand landmarks in real time and maps natural hand shapes and motions to system-level actions: moving the mouse, clicking, adjusting volume, switching windows, and more.
The problem it solves:
Traditional input devices create friction. Gesture Control removes the physical barrier between you and your computer. Whether you're presenting, cooking, exercising, or simply exploring the frontier of Human-Computer Interaction, you can control your PC without touching anything.
Who is it for?
- 🧑💻 Developers and makers exploring computer vision and HCI
- ♿ Users who benefit from accessibility-friendly, touch-free PC control
- 🎓 Students learning real-time AI/CV application development
- 🖥️ Presenters who want hands-free slide and window management
$0 · No API key · No internet · Runs 100% on your machine.
| Feature | Description |
|---|---|
| 🖱️ Smooth Mouse Control | Move the cursor naturally using only your index finger — exponential smoothing eliminates jitter |
| 🤏 Left Click | Pinch your thumb and index finger together to click |
| ✌️ Right Click | Peace-sign (index + middle) triggers a right-click |
| 🔊 Real-Time Volume Control | Raise or lower your open hand to set system volume — hand height maps directly to volume level |
| 🔄 Window Switching | Swipe your open hand left or right to cycle through windows with Alt+Tab / Alt+Shift+Tab |
| 🖥️ Show Desktop | Three-finger gesture instantly triggers Win+D |
| 📋 Task View | Index + Pinky (rock sign) opens Win+Tab Task View |
| ⏸️ Pause / Resume | Hold a fist for 1.5 seconds to freeze all gesture controls — hold again to resume |
| 🎨 Live HUD Overlay | Semi-transparent heads-up display with active gesture badge, FPS counter, volume bar, and fading action log |
| ⚡ Zero-Latency Design | pyautogui.PAUSE = 0 + frame-level gesture dispatch — actions fire within the same frame cycle |
| 📴 Fully Offline | No API keys, no cloud endpoints, no telemetry — all inference runs locally on CPU |
| 🚀 One-Click Launcher | START.bat installs all dependencies and launches the app automatically |
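The 1.5-second fist-hold pause from the table above can be sketched as a small per-frame state machine. This is a hedged reconstruction, not the project's actual code — `FIST_HOLD` is the config name listed later in this README, while the timer variables and function name are illustrative:

```python
import time

FIST_HOLD = 1.5     # seconds to hold a fist before toggling (default from Config)

_fist_since = None  # timestamp when the fist was first seen, or None
_paused = False

def update_fist(gesture, now=None):
    """Toggle the paused flag once FIST has been held continuously for
    FIST_HOLD seconds. Called once per frame with the current gesture label.
    Returns the current paused state."""
    global _fist_since, _paused
    now = time.monotonic() if now is None else now
    if gesture != "FIST":
        _fist_since = None            # hand opened early: reset the hold timer
        return _paused
    if _fist_since is None:
        _fist_since = now             # fist just closed: start timing
    elif now - _fist_since >= FIST_HOLD:
        _paused = not _paused         # held long enough: toggle pause
        _fist_since = None            # require a fresh hold to toggle again
    return _paused
```

Releasing the fist before the hold completes resets the timer, which is what makes the gesture deliberate rather than accidental.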
The system runs a tight five-stage pipeline on every captured webcam frame:
┌──────────────────────────────────────────────────────────────────┐
│ PIPELINE OVERVIEW │
└──────────────────────────────────────────────────────────────────┘
[Webcam] 960×540 @ 30 fps
│
▼
┌─────────────────────────────────┐
│ 1. CAPTURE (OpenCV) │
│ cv2.VideoCapture → flip frame │
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ 2. DETECT (cvzone + MediaPipe)│
│ HandDetector.findHands() │
│ → 21 landmarks (x, y, z) each │
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ 3. CLASSIFY (get_gesture) │
│ fingersUp() + pinch distance │
│ → POINT / PINCH / PEACE / │
│ OPEN / THREE / ROCK / FIST │
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ 4. ACT (action_*) │
│ PyAutoGUI → mouse / shortcuts │
│ keyboard → volume keys │
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ 5. DISPLAY (draw_hud) │
│ OpenCV overlay → cv2.imshow │
└─────────────────────────────────┘
- Capture — OpenCV reads webcam frames at 960×540 @ 30 fps and flips them horizontally for a mirror-like experience.
- Detect — cvzone's `HandDetector` (backed by the MediaPipe BlazePalm + Hand Landmark models) extracts 21 3D landmarks for up to one hand per frame.
- Classify — `fingersUp()` returns a binary array of which fingers are extended. Combined with the Euclidean pinch distance between the thumb tip (`lm[4]`) and index tip (`lm[8]`), a named gesture label is assigned.
- Act — The label is dispatched to a dedicated action function. Cooldown timers prevent accidental double-firing. Mouse movement uses exponential smoothing to eliminate jitter.
- Display — OpenCV draws a semi-transparent dark HUD strip, gesture badge, volume bar, swipe notification, fist-hold progress ring, and a fading action log onto each frame before displaying it.
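The smoothing in the Act step can be sketched as follows — a minimal reconstruction assuming the `SMOOTH` and `MARGIN` config names from the Config section and illustrative screen dimensions; the real app interpolates with `np.interp` and moves the cursor via `pyautogui.moveTo()`:

```python
# Hedged sketch of the cursor-smoothing step: map the index fingertip's
# camera coordinates to screen coordinates, then low-pass-filter the result.
SMOOTH = 5                      # higher = smoother but slightly more latency
MARGIN = 100                    # camera-edge margin mapped to screen edges
CAM_W, CAM_H = 960, 540
SCREEN_W, SCREEN_H = 1920, 1080  # illustrative screen size

prev_x, prev_y = 0.0, 0.0       # smoothed cursor state, carried across frames

def interp(v, src_lo, src_hi, dst_lo, dst_hi):
    """Linear remap with clamping (what np.interp does for a two-point range)."""
    v = max(src_lo, min(src_hi, v))
    t = (v - src_lo) / (src_hi - src_lo)
    return dst_lo + t * (dst_hi - dst_lo)

def smooth_move(index_x, index_y):
    """Return a smoothed screen position for the index fingertip at
    camera coordinates (index_x, index_y)."""
    global prev_x, prev_y
    target_x = interp(index_x, MARGIN, CAM_W - MARGIN, 0, SCREEN_W)
    target_y = interp(index_y, MARGIN, CAM_H - MARGIN, 0, SCREEN_H)
    # Exponential smoothing: move a 1/SMOOTH fraction toward the target per frame
    prev_x += (target_x - prev_x) / SMOOTH
    prev_y += (target_y - prev_y) / SMOOTH
    return prev_x, prev_y       # the real app passes this to pyautogui.moveTo()
```

The `MARGIN` band means you can reach the screen edges without pushing your hand out of the camera frame, and the exponential filter trades a few frames of lag for jitter-free motion.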
User's Hand
│
│ (webcam video)
▼
OpenCV (cv2.VideoCapture)
│
│ raw BGR frames
▼
cvzone HandDetector ──── MediaPipe BlazePalm + Landmark Model
│
│ lmList[21][x,y,z] + fingersUp[5]
▼
Gesture Classifier (get_gesture)
│
│ gesture label: POINT / PINCH / PEACE / OPEN / THREE / ROCK / FIST
▼
Action Dispatcher
├── action_mouse() ──→ PyAutoGUI.moveTo()
├── action_click() ──→ PyAutoGUI.click()
├── action_rclick() ──→ PyAutoGUI.rightClick()
├── action_volume() ──→ keyboard.send("volume up/down")
├── action_swipe() ──→ PyAutoGUI.hotkey("alt","tab")
├── action_win_d() ──→ PyAutoGUI.hotkey("win","d")
├── action_win_tab() ──→ PyAutoGUI.hotkey("win","tab")
└── action_fist() ──→ Toggle paused state
│
▼
draw_hud (OpenCV overlay)
│
▼
cv2.imshow → Screen
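The `get_gesture` classifier in the diagram above can be approximated from the `fingersUp()` array plus the pinch distance. This is a hedged sketch — the exact priority order and thresholds in gesture_control.py may differ, and `PINCH_PX` is the config name listed later:

```python
import math

PINCH_PX = 45  # thumb-index distance (pixels) below which a pinch is detected

def dist(p1, p2):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def get_gesture(fingers, thumb_tip, index_tip):
    """Classify a gesture label from cvzone's fingersUp() array
    [thumb, index, middle, ring, pinky] and the thumb-index pinch distance."""
    if fingers == [0, 0, 0, 0, 0]:
        return "FIST"                       # all fingers closed
    if dist(thumb_tip, index_tip) < PINCH_PX:
        return "PINCH"                      # thumb and index touching
    if fingers == [0, 1, 0, 0, 0]:
        return "POINT"                      # index only → mouse control
    if fingers == [0, 1, 1, 0, 0]:
        return "PEACE"                      # index + middle → right click
    if fingers == [0, 1, 1, 1, 0]:
        return "THREE"                      # three fingers → Win+D
    if fingers == [0, 1, 0, 0, 1]:
        return "ROCK"                       # index + pinky → Win+Tab
    if fingers == [1, 1, 1, 1, 1]:
        return "OPEN"                       # open hand → volume / swipe
    return "NONE"                           # unrecognized shape
```

Checking FIST before PINCH matters: a closed fist also has a short thumb-index distance, so the more specific shape must win.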
| Layer | Technology | Purpose |
|---|---|---|
| Language | Python 3.11 | Core runtime |
| Hand Tracking | cvzone + MediaPipe 0.10.14 | 21-landmark hand detection |
| Computer Vision | OpenCV | Webcam capture, frame processing, HUD rendering |
| Mouse / Keyboard | PyAutoGUI | Mouse movement, clicks, system shortcuts |
| Volume Keys | keyboard | Low-level volume key simulation |
| Math / Arrays | NumPy | Coordinate interpolation, clipping |
| Platform | Windows 10/11 | OS target (uses Win32 keyboard/volume APIs) |
gesture-control/
│
├── gesture_control.py ← Single-file application — all logic lives here
├── requirements.txt ← Python package dependencies (pip-installable)
├── START.bat ← One-click Windows launcher (auto-installs deps)
├── ins.txt ← Quick manual install reference
└── README.md ← Documentation
gesture_control.py is organized into clearly commented sections:
| Section | Contents |
|---|---|
| Config | CAM_W/H, SMOOTH, PINCH_PX, SWIPE_VEL, CLICK_CD, FIST_HOLD, VOL_TOP/BOT, MARGIN |
| Colors | BGR color constants for the HUD (CYAN, GREEN, RED, YELLOW, PINK, DIM, WHITE, DARK) |
| State | Global variables tracking pause state, smooth cursor position, cooldown timers, volume level |
| Helpers | dist() — Euclidean distance; push_log() — action log; get_gesture() — gesture classifier |
| Actions | action_mouse, action_click, action_rclick, action_volume, action_swipe, action_win_d, action_win_tab, action_fist |
| HUD | draw_hud() — full overlay renderer (gesture badge, FPS, volume bar, swipe label, fist progress, action log, cheat sheet) |
| Main loop | Camera init, frame capture loop, hand detection, gesture dispatch, display |
| Requirement | Details |
|---|---|
| OS | Windows 10 or Windows 11 |
| Python | 3.11.x — MediaPipe 0.10.14 does not support Python 3.12+ |
| Webcam | Any standard USB or built-in webcam |
- Install Python 3.11 from python.org
  ⚠️ During setup, check "Add Python to PATH"
- Double-click `START.bat`
START.bat automatically:
- ✅ Verifies Python 3.11 is available
- 📦 Installs all dependencies on first run (~2 minutes)
- 🚀 Launches the application
```
# 1. Clone the repository
git clone https://github.com/engrmaziz/gesture-control.git
cd gesture-control

# 2. Install dependencies using Python 3.11
py -3.11 -m pip install cvzone mediapipe==0.10.14 opencv-python pyautogui numpy keyboard

# 3. Run
py -3.11 gesture_control.py
```

Note: `cvzone` is installed explicitly before `-r requirements.txt` because it is not listed in `requirements.txt`. The command below handles both in one step:

```
py -3.11 -m pip install cvzone -r requirements.txt
py -3.11 gesture_control.py
```

| Gesture | Hand Shape | Action | Trigger |
|---|---|---|---|
| ☝️ POINT | Index finger only | Move mouse cursor | Continuous |
| 🤏 PINCH | Pinch thumb + index | Left click | Cooldown: 0.45s |
| ✌️ PEACE | Index + Middle fingers | Right click | Cooldown: 0.45s |
| 🖐️ OPEN | All fingers extended | Volume control (height = level) | Continuous |
| 👋 SWIPE | Open hand, horizontal motion | Alt+Tab / Alt+Shift+Tab | Cooldown: 1.0s |
| 🤟 THREE | Index + Middle + Ring | Win+D (show/hide desktop) | On gesture change |
| 🤘 ROCK | Index + Pinky | Win+Tab (Task View) | On gesture change |
| ✊ FIST | All fingers closed, hold 1.5s | Pause / Resume all controls | Hold 1.5s |
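The velocity-gated swipe in the table above can be sketched as below — a hedged approximation assuming `SWIPE_VEL` means the minimum normalized per-frame wrist movement and `SWIPE_CD` the cooldown in seconds; in the app, a non-None result would fire `pyautogui.hotkey("alt", "tab")` or `("alt", "shift", "tab")`:

```python
import time

SWIPE_VEL = 0.025   # min normalized per-frame wrist movement to count as a swipe
SWIPE_CD = 1.0      # seconds between consecutive swipe actions
CAM_W = 960

_prev_x = None              # wrist x on the previous frame
_last_swipe = -SWIPE_CD     # time of the last accepted swipe

def detect_swipe(wrist_x, now=None):
    """Return 'RIGHT', 'LEFT', or None from frame-to-frame wrist velocity,
    rate-limited by the SWIPE_CD cooldown."""
    global _prev_x, _last_swipe
    now = time.monotonic() if now is None else now
    if _prev_x is None:
        _prev_x = wrist_x   # first frame: no velocity yet
        return None
    vel = (wrist_x - _prev_x) / CAM_W   # normalized horizontal velocity
    _prev_x = wrist_x
    if abs(vel) >= SWIPE_VEL and now - _last_swipe >= SWIPE_CD:
        _last_swipe = now
        return "RIGHT" if vel > 0 else "LEFT"
    return None
```

The cooldown is what keeps one physical swipe from cycling through several windows, since the wrist stays fast for many consecutive frames.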
All tunable constants are at the top of gesture_control.py — no config files needed:
| Variable | Default | Description |
|---|---|---|
| `CAM_W`, `CAM_H` | `960`, `540` | Camera capture resolution |
| `SMOOTH` | `5` | Mouse smoothing factor — higher = smoother but slightly more latency |
| `PINCH_PX` | `45` | Pixel distance threshold between thumb and index tip to trigger a click |
| `SWIPE_VEL` | `0.025` | Minimum normalized wrist velocity to trigger Alt+Tab |
| `SWIPE_CD` | `1.0` | Seconds between consecutive swipe actions |
| `CLICK_CD` | `0.45` | Seconds between consecutive click actions |
| `FIST_HOLD` | `1.5` | Seconds to hold a fist before toggling pause |
| `VOL_TOP` | `0.15` | Normalized Y position (top of frame) = 100% volume |
| `VOL_BOT` | `0.85` | Normalized Y position (bottom of frame) = 0% volume |
| `MARGIN` | `100` | Pixel margin for mapping hand position to screen edges |
- 💡 Good lighting dramatically improves landmark detection accuracy
- 📏 Position your hand 30–60 cm from the camera
- 🎨 A plain or neutral background helps MediaPipe track reliably
- 🖱️ If the mouse feels jittery, increase `SMOOTH` from `5` → `8` or `10`
- 🖱️ If clicks fire too fast, increase `CLICK_CD` (e.g., `0.6`)
- ⏸️ Hold a FIST for 1.5 seconds any time to pause/resume all controls
- ❌ Press Q in the camera window to quit the application
| Package | Version | Purpose |
|---|---|---|
| `cvzone` | latest | High-level MediaPipe hand tracking wrapper |
| `mediapipe` | `0.10.14` | Google's BlazePalm + Hand Landmark model |
| `opencv-python` | latest | Webcam capture, image processing, HUD rendering |
| `pyautogui` | latest | Mouse movement, clicks, and keyboard shortcuts |
| `keyboard` | latest | System-level volume key simulation |
| `numpy` | latest | Coordinate interpolation, normalization, clipping |
All packages are free, open source, and run 100% offline — no data ever leaves your machine.
```
# 1. Install Python 3.11 from https://python.org

# 2. Clone and install
git clone https://github.com/engrmaziz/gesture-control.git
cd gesture-control
py -3.11 -m pip install cvzone mediapipe==0.10.14 opencv-python pyautogui numpy keyboard

# 3. Run
py -3.11 gesture_control.py
```

Point your hand at the webcam and start gesturing. Press Q to quit.
Contributions are welcome! Feel free to open an issue or submit a pull request.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-gesture`)
- Commit your changes (`git commit -m 'Add amazing gesture'`)
- Push to the branch (`git push origin feature/amazing-gesture`)
- Open a Pull Request
Built with cvzone · MediaPipe · OpenCV · PyAutoGUI · keyboard · $0
⭐ Star this repo if you found it useful!