A CLI tool to archive and export Telegram chat history from channels and groups you are a member of.
- Login as yourself via MTProto (Telethon) - no bot required
- QR code login - scan with your phone, no phone number typing needed
- Incremental sync - only fetches new messages on subsequent runs
- Local SQLite database - messages cached locally, no re-fetching
- Export formats: TXT (with message IDs and reply threading) and JSONL
- Date/ID range filtering - export specific time periods or message ranges
- Rate limit handling - automatic sleep on FloodWait errors
- Graceful shutdown - Ctrl+C saves state safely
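The incremental-sync idea behind the local SQLite cache can be illustrated with a minimal sketch. The table and helper names here are hypothetical (the tool's real schema is internal); the point is the pattern: append-only inserts, then use the highest cached message ID per peer as the lower bound for the next fetch.

```python
import sqlite3

# Simplified sketch: cache messages append-only, then on the next run
# fetch only messages with IDs above the highest one already cached.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, peer_id INTEGER, text TEXT, PRIMARY KEY (peer_id, id))")

def cache_messages(rows):
    # INSERT OR IGNORE keeps the store append-only: duplicates from
    # overlapping fetches never overwrite what is already cached.
    conn.executemany("INSERT OR IGNORE INTO messages (id, peer_id, text) VALUES (?, ?, ?)", rows)
    conn.commit()

def next_min_id(peer_id):
    # Highest cached message ID for this peer; only fetch above it next run.
    row = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM messages WHERE peer_id = ?", (peer_id,)
    ).fetchone()
    return row[0]

cache_messages([(1, -100123, "hello"), (2, -100123, "world")])
print(next_min_id(-100123))  # -> 2
```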
Requires Python 3.10+.
```bash
# Using uv (recommended)
cd telegramFetcher
uv sync

# Or using pip
pip install -e .
```

You need API credentials to use Telegram's MTProto protocol. This is a one-time setup that takes ~2 minutes:
- Go to https://my.telegram.org/auth
- Log in with your phone number (you'll receive a code in Telegram)
- Click "API development tools"
- Fill the form (all fields can be anything):
  - App title: `My Exporter` (or whatever you want)
  - Short name: `myexporter` (lowercase, no spaces)
  - Platform: `Desktop`
  - Description: `Personal use`
- Click "Create application"
- Copy your `api_id` (a number) and `api_hash` (a long hex string)
⚠️ Keep these secret! Don't share them or commit to git.
Option A: Use a .env file (recommended)

```bash
cp .env.example .env
# Edit .env with your actual credentials
```

Your .env file should look like:

```
TGX_API_ID=12345678
TGX_API_HASH=abcdef1234567890abcdef1234567890
```

Option B: Export in terminal

```bash
export TGX_API_ID=12345678
export TGX_API_HASH=abcdef1234567890abcdef1234567890
```

Optional settings:

```
TGX_SESSION=./my_session.session   # Custom session file path
TGX_DB=./my_data.sqlite            # Custom database path
```

To authenticate:

```bash
uv run python -m tgx.main auth-test
```

QR Code Login (default):
- A QR code will be displayed in your terminal
- Open Telegram on your phone → Settings → Devices → Link Desktop Device
- Scan the QR code
- If you have 2FA enabled, enter your password
Phone Login (alternative):
```bash
uv run python -m tgx.main auth-test --phone
```

On success, you'll see "AUTHORIZED" and your session is saved to `tgx.session`.

Your login is cached! After the first successful authentication:
- The session is stored in the `tgx.session` file
- Subsequent runs skip login entirely
- To re-authenticate, delete the session file
```bash
# List your dialogs
uv run python -m tgx.main dialogs

# Search for specific chats
uv run python -m tgx.main dialogs --search "crypto"
```

```bash
# Fetch 5 messages to verify access
uv run python -m tgx.main fetch-test --peer @channelname --limit 5
```

```bash
# Export last 100 messages to TXT
uv run python -m tgx.main export --peer @channelname --last 100 --txt out.txt

# Export to both formats
uv run python -m tgx.main export --peer @channelname --last 1000 --txt out.txt --jsonl out.jsonl

# Export by date range
uv run python -m tgx.main export --peer @channelname --start "2025-01-01" --end "2025-01-31" --txt january.txt

# Export by message ID range
uv run python -m tgx.main export --peer @channelname --since-id 5000 --until-id 6000 --jsonl range.jsonl

# Include raw JSON data in JSONL
uv run python -m tgx.main export --peer @channelname --last 100 --jsonl out.jsonl --include-raw
```

```bash
# Sync last 100 messages to local DB
uv run python -m tgx.main sync --peer @channelname --last 100
```

One message per line, optimized for AI/LLM consumption:
```
[12345] 2025-01-15 10:30:45 | Alice | Hey everyone, check out this news article!
[12346] 2025-01-15 10:31:02 | Bob | [reply to #12345 @Alice] Interesting, thanks for sharing
[12347] 2025-01-15 10:32:15 | channel | [photo]
```
- Message ID in brackets at the start: `[12345]`
- Reply info shows which message is being replied to and who wrote it: `[reply to #12345 @Alice]`
- Timestamps are in local timezone
- Newlines in messages are flattened to spaces
- Media messages show `[media_type]`
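A hypothetical helper showing how one TXT line is assembled from the fields above (the function name and signature are illustrative, not the tool's actual API):

```python
def format_txt_line(msg_id, ts, sender, text, reply_to=None, reply_author=None):
    """Render one message in the TXT layout shown above (illustrative helper)."""
    flat = " ".join(text.split())  # newlines flattened to spaces
    prefix = f"[reply to #{reply_to} @{reply_author}] " if reply_to is not None else ""
    return f"[{msg_id}] {ts} | {sender} | {prefix}{flat}"

print(format_txt_line(12346, "2025-01-15 10:31:02", "Bob",
                      "Interesting,\nthanks for sharing",
                      reply_to=12345, reply_author="Alice"))
# -> [12346] 2025-01-15 10:31:02 | Bob | [reply to #12345 @Alice] Interesting, thanks for sharing
```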
One JSON object per line:

```json
{"id": 12345, "peer_id": -1001234567890, "date": "2025-01-15T09:30:45+00:00", "date_utc_ms": 1736934645000, "sender_id": 111, "sender_name": "Alice", "text": "Hello", "reply_to_msg_id": null, "has_media": false, "media_type": null}
```

With `--include-raw`, the `raw_data` field contains the full Telethon message object (parsed JSON, not a string). Use `--raw-as-string` to emit it as a JSON string instead.
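Because each line is an independent JSON object, exports can be consumed with plain `json.loads`, one line at a time, with no special parser:

```python
import json

# One JSONL record, taken from the format example above.
line = ('{"id": 12345, "peer_id": -1001234567890, "date": "2025-01-15T09:30:45+00:00", '
        '"sender_name": "Alice", "text": "Hello", "reply_to_msg_id": null, '
        '"has_media": false, "media_type": null}')
msg = json.loads(line)
print(msg["sender_name"], msg["text"])  # -> Alice Hello
```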
You can specify peers in multiple ways:
- `@username` - Public username
- `https://t.me/username` - Telegram link
- `-1001234567890` - Peer ID (channels start with -100)

Note: Title-based lookup is not supported. Use `dialogs --search "keyword"` to find peer IDs.
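The three accepted peer formats can be distinguished mechanically; this is a hedged sketch (the helper name is made up, and the real tool's parsing may differ):

```python
def classify_peer(peer: str):
    """Illustrative helper: classify the peer formats this README accepts."""
    if peer.startswith("@"):
        return ("username", peer[1:])
    if peer.startswith("https://t.me/"):
        return ("username", peer.rsplit("/", 1)[-1])
    if peer.lstrip("-").isdigit():
        return ("peer_id", int(peer))
    raise ValueError("unsupported peer format (title lookup is not supported)")

print(classify_peer("@channelname"))    # -> ('username', 'channelname')
print(classify_peer("-1001234567890"))  # -> ('peer_id', -1001234567890)
```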
A companion tool to compress exported chat logs for efficient AI/LLM processing.
- Message concatenation - Consecutive messages from the same user merged with `. `
- Media grouping - `[photo] [photo] [photo]` → `[3 photos]`
- Link shortening - Full URLs → `[x.com link]`, `[@levelsio tweet]`, `[org/repo repo]`
- Date headers - Removes timestamps, shows `# 2025-01-15` for new days
- User renaming - Interactive mode to shorten names (e.g., "Victor Naumik" → "V")
- Sensitive data redaction - API keys, tokens, passwords, mnemonics
- Token counting - Shows estimated AI token count (using tiktoken)
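The media-grouping feature above can be approximated with a single regex over the exported text. This is a sketch of the idea, not the packer's actual code, and the pluralization is naive (it just appends "s"):

```python
import re

def group_media(text: str) -> str:
    # Collapse runs of identical media placeholders:
    # "[photo] [photo] [photo]" -> "[3 photos]"
    def repl(m):
        tag = m.group(2)
        count = m.group(0).count(f"[{tag}]")
        return f"[{count} {tag}s]"
    # \1 backreferences the first bracketed tag, so only identical
    # consecutive placeholders are merged.
    return re.sub(r"(\[(\w+)\])(?: \1)+", repl, text)

print(group_media("[photo] [photo] [photo]"))  # -> [3 photos]
```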
```bash
# Basic usage (interactive user renaming)
uv run python tg_packer.py pack export.txt

# Non-interactive with all defaults
uv run python tg_packer.py pack export.txt -n

# With sensitive data redaction
uv run python tg_packer.py pack export.txt -n --redact

# Custom output file
uv run python tg_packer.py pack export.txt -n -o compressed.txt

# Analyze without packing (statistics only)
uv run python tg_packer.py analyze export.txt
```

| Option | Short | Description |
|---|---|---|
| `--output` | `-o` | Output file path (default: `input_packed.txt`) |
| `--no-interactive` | `-n` | Skip interactive user renaming |
| `--links` | `-l` | Link handling: `full`, `short` (default), `remove` |
| `--redact` | `-s` | Redact sensitive data (API keys, tokens, etc.) |
| `--keep-replies` | `-r` | Keep full reply metadata |
| Mode | Example Input | Example Output |
|---|---|---|
| `full` | `https://x.com/user/status/123` | (unchanged) |
| `short` | `https://x.com/levelsio/status/123` | `[@levelsio tweet]` |
| `short` | `https://github.com/org/repo` | `[org/repo repo]` |
| `remove` | Any URL | `[link]` or `[3 links]` |
Automatically redacts:
- JWT tokens, API keys (OpenAI, Google, etc.)
- Ethereum private keys and long hex strings
- Mnemonic seed phrases
- Environment variable secrets (PASSWORD, TOKEN, SECRET, etc.)
- Database URLs, Supabase/ngrok URLs
- Any alphanumeric string of 16+ chars after `=`
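As an example of the last category only, masking a long alphanumeric value after `=` is a one-regex job. This sketch covers just that one pattern; the real tool layers many more (JWTs, hex keys, mnemonics, etc.):

```python
import re

# Mask any 16+ character alphanumeric value that follows "=".
SECRET_AFTER_EQUALS = re.compile(r"(=\s*)([A-Za-z0-9]{16,})")

def redact(text: str) -> str:
    return SECRET_AFTER_EQUALS.sub(r"\1[REDACTED]", text)

print(redact("OPENAI_API_KEY=sk1234567890abcdef1234"))  # -> OPENAI_API_KEY=[REDACTED]
```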
Before:

```
[12345] 2025-01-01 10:30:00 | Alice | [video]
[12346] 2025-01-01 10:31:00 | Bob | [reply to #12345 @Alice] Nice video!
[12347] 2025-01-01 10:31:30 | Bob | Check out this link
[12348] 2025-01-01 10:31:45 | Bob | https://x.com/someone/status/123456
```

After (with user renames A/B):

```
# 2025-01-01
A: [video]
B: Nice video!. Check out this link. [@someone tweet]
```
Typical compression: 55-60% token reduction
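If tiktoken isn't available, you can gauge the reduction with the rough ~4-characters-per-token rule of thumb for English text. This heuristic is an assumption, not what the packer does (it uses tiktoken for real counts):

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

before = ("[12345] 2025-01-01 10:30:00 | Alice | [video]\n"
          "[12346] 2025-01-01 10:31:00 | Bob | [reply to #12345 @Alice] Nice video!")
after = "# 2025-01-01\nA: [video]\nB: Nice video!"
reduction = 1 - approx_tokens(after) / approx_tokens(before)
print(f"~{reduction:.0%} estimated token reduction")
```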
- The `.session` file contains your Telegram login. Keep it private!
- Add `*.session` to your `.gitignore`
- The session file allows full access to your Telegram account
- If compromised, revoke all sessions at https://my.telegram.org
- No forum topics/threads - Only main chat history
- No media downloads - Only `has_media` flag and `media_type` stored
- No reactions - Not captured
- No edit history - Only original message stored (append-only)
- No deletions - Deleted messages remain in local DB
- `tgx.session` - Telegram session (encrypted credentials)
- `tgx.sqlite` - Local message database (WAL mode)
Can't find or access a chat:
- Make sure you've joined the channel/group
- Try using the peer_id instead of username
- The channel may be private, or you were kicked/banned

FloodWait errors:
- The tool handles this automatically
- For large syncs (>5000 messages), expect occasional waits

Authentication problems:
- Delete the `.session` file and re-authenticate
```bash
# Install dev dependencies
uv sync

# Run directly
uv run python -m tgx.main --help
```

License: MIT