ktbteam/ktb-studio

KTB Studio

AI video toolkit — 5 tools, 1 codebase. Shared engine for text-to-speech, scene planning, rendering, and social posting.

License: MIT · Python 3.12+

⚠️ Alpha. APIs may break between minor versions while we stabilise. The VCM local renderer is the most polished entry point — start there.


What you can build

| Tool | Input | Output | Best for |
|---|---|---|---|
| 🎬 VCM (VideoContentMaker) | A topic sentence | 1-15 min MP4 (9:16 or 16:9) with AI narration + images | YouTube long-form, TikTok shorts, batch content |
| 📰 AINews | URL or article text | 60-90 s vertical short with AI summary | TikTok / Reels news rewrites |
| 📺 NewsEditor | Multi-source URLs | Neutral multi-angle news video with citations | Balanced commentary, factual recaps |
| 📚 LingoFeeder | Language config | Vocab / grammar / dialogue lesson shorts | Language teaching channels |
| 🎤 ViralEditor | Upload + summary text | Re-edit with cautionary voiceover | Reaction-style commentary |

All five share one engine: Gemini for scene planning, Google Chirp 3 HD / ElevenLabs / Edge / gTTS for voice, HyperFrames (HTML→MP4) for rendering, Cloudflare Workers AI / Pollinations / Pexels for images, and one-click posting to Facebook Page + TikTok.


Try it locally — render your first video in ~5 minutes

The fastest way to see what this toolkit does is the VCM local 1-click renderer. No Telegram bot, no server — type a topic, click a button, get an MP4.

Prerequisites

| Dep | Where |
|---|---|
| Python 3.12+ (with "Add to PATH" checked on Windows) | https://www.python.org/downloads/ |
| Node.js 20+ | https://nodejs.org/en/download/ |
| ffmpeg in PATH | choco install ffmpeg (Windows) · brew install ffmpeg (Mac) · apt install ffmpeg (Linux) |
| A FREE Gemini API key | https://aistudio.google.com/apikey |
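
Before running SETUP, it can help to confirm the prerequisites are actually visible from your shell. The sketch below is not part of the repo; `missing_prerequisites` is a hypothetical helper that checks the Python version and looks for `node` and `ffmpeg` on PATH using only the standard library. It does not verify that Node.js is version 20 or newer, and it does not check for the Gemini key.

```python
import shutil
import sys

# Minimum Python version taken from the prerequisites table above.
MIN_PYTHON = (3, 12)
# Executables that must be discoverable on PATH.
REQUIRED_TOOLS = ("node", "ffmpeg")


def missing_prerequisites(python_version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    version = tuple(python_version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.12+ required")
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found in PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("MISSING:", problem)
```

Run it with the same interpreter you plan to use for SETUP; no output means both tools were found and the Python version is new enough.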

Steps

# 1. Clone
git clone https://github.com/ktbteam/ktb-studio
cd ktb-studio

# 2. Setup (creates venv, installs deps, creates .env + channels/local/.env)
local\SETUP.bat            # Windows
./local/SETUP.sh           # Mac / Linux

# 3. Paste your keys into the two .env files SETUP just created.
#    Minimum required: GEMINI_API_KEY in .env at repo root.
#    Strongly recommended for nicer voice: GOOGLE_TTS_API_KEY + GOOGLE_TTS_VOICE
#    in channels/local/.env (Google Cloud TTS Chirp 3 HD has a free tier).

# 4. Render
local\RUN_VCM.bat          # Windows
./local/RUN_VCM.sh         # Mac / Linux

The console asks for topic → format (9:16 vs 16:9) → image source (AI cartoon vs Pexels real photo), then runs the pipeline. The output folder opens automatically when the MP4 is ready.

Full local renderer docs, troubleshooting, and the "which key goes where" reference: local/README.md


API keys cheat-sheet

Different keys live in different places. The local launcher reads them via python-dotenv; the bots read them via a systemd EnvironmentFile.

Root .env — image / script generation (no real defaults — paste your own)

GEMINI_API_KEY=AIza...        # REQUIRED — free at aistudio.google.com/apikey
CF_ACCOUNT_ID=...             # optional — Cloudflare Workers AI free tier (10k Neurons/day)
CF_AI_TOKEN=cfut_...          # paired with above
PEXELS_API_KEY=...            # optional — Pexels stock photos (free 200 req/hour, 20k/month)

→ Step-by-step Cloudflare setup (where to click, which permission to scope, smoke test, quota math): docs/setup-cloudflare-workers-ai.md

channels/<slug>/.env — brand + voice per channel

Copy a template first: cp channels/local.example/.env.example channels/<slug>/.env (or channels/example/.env.example for AINews-style brands).

# Identity
CHANNEL_NAME=Your Brand
THEME_VARIANT=dark            # dark | bright | vivid | corporate

# Voice (this is where the magic of "doesn't sound robotic" happens)
GOOGLE_TTS_API_KEY=AIza...    # Google Cloud TTS — strongly recommended
GOOGLE_TTS_VOICE=vi-VN-Chirp3-HD-Achernar

# Optional — only for the Telegram bots, not the local renderer
FB_PAGE_ID=...                # Facebook Page posting (skip for local)
FB_PAGE_TOKEN=...

Without GOOGLE_TTS_API_KEY, the router falls back to Microsoft Edge TTS, then to gTTS — both work without keys but sound noticeably less natural than Chirp 3 HD.
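
To make the key-dependent voice fallback concrete, here is a minimal sketch of how a per-channel .env could be turned into a settings object with a TTS order. Everything named here is an assumption for illustration: `parse_env` is a tiny stand-in for python-dotenv, `ChannelSettings` is hypothetical (the real loader is `core/channel.py`), the `"dark"` default theme is a guess, and ElevenLabs is left out because this README does not say which key enables it.

```python
from dataclasses import dataclass


def parse_env(text):
    """Tiny KEY=VALUE parser; a stand-in for python-dotenv, not a replacement."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing inline comments such as "dark   # dark | bright".
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


@dataclass
class ChannelSettings:  # hypothetical; not the repo's ChannelConfig
    name: str
    theme: str
    tts_order: list


def load_channel(text):
    env = parse_env(text)
    # Edge TTS and gTTS need no key, so they are always available.
    order = ["edge", "gtts"]
    # Chirp 3 HD goes first only when its key is present.
    if env.get("GOOGLE_TTS_API_KEY"):
        order.insert(0, "google_chirp3_hd")
    return ChannelSettings(
        name=env.get("CHANNEL_NAME", ""),
        theme=env.get("THEME_VARIANT", "dark"),  # default is an assumption
        tts_order=order,
    )
```

Note that the inline-comment handling is deliberately naive: a value that itself contains " #" would be truncated, which python-dotenv handles more carefully.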


Use the tools as Telegram bots (advanced)

Each tool can also run as a long-poll Telegram bot for on-demand video generation from your phone. This is how the production deployment runs:

# In a separate .env, set the bot tokens you created via @BotFather:
KTB_VCM_TOKEN=8924300440:AAE...
KTB_NEWS_EDITOR_TOKEN=...
KTB_VIRAL_EDITOR_TOKEN=...
TELEGRAM_BOT_TOKEN=...         # AINews uses the base name

# Then launch the bot for the tool you want:
ktb-vcm-bot
ktb-ainews
ktb-news-editor-bot
ktb-viral-editor

systemd unit files for all bots are in deploy/systemd/. Each one expects an EnvironmentFile=/your/path/.env. Don't reuse production bot tokens across local + server — Telegram polling is single-instance and the newer client steals updates from the older one.
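
A quick way to catch a missing token before a bot starts polling is to check the environment first. The sketch below is an assumption-heavy illustration: the command-to-variable mapping is inferred from the names listed above and should be confirmed against the unit files in deploy/systemd/, and `token_problem` is a hypothetical helper, not something the repo ships.

```python
import os

# Mapping inferred from the variable and command names listed above;
# treat it as an assumption and confirm it against deploy/systemd/.
TOKEN_FOR_COMMAND = {
    "ktb-vcm-bot": "KTB_VCM_TOKEN",
    "ktb-ainews": "TELEGRAM_BOT_TOKEN",
    "ktb-news-editor-bot": "KTB_NEWS_EDITOR_TOKEN",
    "ktb-viral-editor": "KTB_VIRAL_EDITOR_TOKEN",
}


def token_problem(command, env=None):
    """Return a description of the problem, or None when the token is set."""
    env = os.environ if env is None else env
    var = TOKEN_FOR_COMMAND.get(command)
    if var is None:
        return f"unknown bot command: {command}"
    if not env.get(var, "").strip():
        return f"{var} is not set"
    return None
```

For example, `token_problem("ktb-vcm-bot")` returns `None` only when `KTB_VCM_TOKEN` is present and non-empty in the current environment.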


Architecture

core/        ← shared engine
  ├── tts/       Google Chirp 3 HD → ElevenLabs → Edge TTS → gTTS router
  ├── planner/   Gemini scene planning + voice gender policy
  ├── renderer/  HyperFrames HTML→MP4 + 4 theme variants (dark/bright/vivid/corporate)
  ├── images/    Cloudflare Workers AI → Pollinations → Pexels chain
  ├── sources/   URL extract / GitHub README / multi-source / vision
  ├── poster/    Facebook Page + TikTok upload
  └── channel.py Brand config loader (per-channel .env → ChannelConfig)

tools/       ← thin tool wrappers, each independent runtime
  ├── ainews/        URL → MP4, Telegram bot + style picker (4 themes)
  ├── vcm/           Topic → MP4, batch CLI + Telegram bot
  ├── news_editor/   Multi-source → MP4, daemon + Telegram bot
  ├── lingo_feeder/  Language drills → MP4
  └── viral_editor/  Upload → re-edit, Telegram bot

channels/    ← brand presets
  ├── example/        shipped template for new channels
  └── local.example/  shipped template for the local renderer
  (real channels — ainews/khuetran/news/local — are GITIGNORED, never pushed)

deploy/      ← systemd unit files + sync scripts (the maintainer's VPS workflow)
local/       ← 1-click Windows/Mac/Linux renderer for VCM
docs/        ← per-tool migration notes (mostly internal)

Rule: core/ never imports from tools/. Tools never import from each other.
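
The import rule is easy to check mechanically. The following is a minimal sketch using the standard `ast` module; it assumes the packages are imported by absolute names starting with `core.` and `tools.`, which this README does not confirm, and it ignores relative imports. `rule_violations` is a hypothetical helper, not an existing script in the repo.

```python
import ast


def rule_violations(module_path, source):
    """Return forbidden imports for a module at e.g. 'tools/vcm/bot.py'."""
    parts = module_path.replace("\\", "/").split("/")
    top = parts[0]
    # For tools/<name>/..., remember which tool the module belongs to.
    own_tool = parts[1] if top == "tools" and len(parts) > 2 else None
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            segments = name.split(".")
            if segments[0] != "tools":
                continue
            if top == "core":
                # core/ must never import from tools/.
                violations.append(name)
            elif top == "tools" and len(segments) > 1 and segments[1] != own_tool:
                # A tool must not import from a different tool.
                violations.append(name)
    return violations
```

Walking the repo with `pathlib.Path.rglob("*.py")` and failing CI on any non-empty result would be one way to enforce the rule automatically.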


Quality + fallback chains

The whole codebase is paranoid about provider failures because the user pressing "render" should get a video, not a stack trace. Every external dependency has a graceful fallback:

  • TTS: Google Chirp 3 HD → ElevenLabs → Edge TTS → gTTS (in router)
  • Images: Cloudflare Workers AI → Pollinations → Pexels (with auto-keyword extraction from cartoon prompts) → gradient placeholder
  • Vision: Gemini Flash → Flash Lite → returns hint string
  • Caption: FB caption / TikTok hashtags / first comment are all built from the same plan

When a provider returns 429 / 402 / 403, the router logs the failure once and moves on — no user-visible error.
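
The fallback behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not the repo's router: `ProviderError` and `first_success` are hypothetical names, providers are modelled as plain callables, and in this sketch any status other than 429 / 402 / 403 is re-raised, which may differ from what the real code does.

```python
import logging

log = logging.getLogger("fallback")

# Status codes the text above names as "log once and move on".
SKIPPABLE = {402, 403, 429}


class ProviderError(Exception):  # hypothetical error type for this sketch
    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def first_success(providers, request, default=None):
    """Try (name, callable) pairs in order; return (name, result)."""
    for name, call in providers:
        try:
            return name, call(request)
        except ProviderError as err:
            if err.status not in SKIPPABLE:
                raise
            # One log line per failed provider, then move to the next one.
            log.warning("%s unavailable (HTTP %s), trying next", name, err.status)
    # Every provider was skipped: hand back the placeholder result.
    return "placeholder", default
```

The `default` argument plays the role of the last link in the chain, such as the gradient placeholder used when every image source is exhausted.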


Stack

  • Python 3.12+ · Pydantic v2 · asyncio
  • TTS: Google Cloud TTS · Microsoft Edge TTS · ElevenLabs · gTTS
  • Scene planner: Google Gemini 2.5 Flash / Flash Lite (auto-routes long-form videos to the larger model)
  • Renderer: HyperFrames 0.6.52 (headless Chrome screenshot + ffmpeg encode)
  • Image gen: Cloudflare Workers AI (flux-schnell) · Pollinations · Pexels stock
  • Posting: Facebook Graph API · TikTok upload
  • Per-video cost on the free tier: $0 (Google free quotas + Edge + Pexels free).

Built with vibe coding

This repo is built primarily with Claude Code (Anthropic) and the Claude Agent SDK. Source is open so you can study, fork, or hire the team for custom builds.


License

MIT — Free to use, modify, sell. See LICENSE.
