An automatic synchronization system for Markdown notes into a Qdrant vector database, for RAG in Cursor.

This system automatically syncs your Markdown notes from the Notes/ folder to Qdrant's cursor-knowledge collection, making them searchable via Cursor's MCP integration.
```
Notes/                            → Markdown files (Obsidian compatible)
   ↓ (hourly scan)
sync-notes-to-qdrant.py           → Parse, chunk, embed with FastEmbed
   ↓
Qdrant (cursor-knowledge)         → Vector storage
   ↓
MCP-Qdrant-Knowledge (port 8001)  → Read-only access for Cursor
   ↓
Cursor IDE                        → Semantic search in your notes
```
```bash
cd /home/flowtech/FlowTech-LAB/FlowTech-AI
./scripts/sync-notes.sh
```

Expected output:
```
============================================================
Starting notes sync: ./Notes → cursor-knowledge
============================================================
Found 7 markdown files
Creating: VMs/VM-Example-WebServer.md
  → Synced 3 chunks for VMs/VM-Example-WebServer.md
...
============================================================
Sync completed!
Scanned: 7
Created: 7
Updated: 0
Deleted: 0
Unchanged: 0
Errors: 0
============================================================
```
```bash
cd /home/flowtech/FlowTech-LAB/FlowTech-AI
./scripts/install-cron.sh
```

This will run the sync every hour automatically.
The system maintains a cache file (AI_Data/notes-sync-cache.json) that tracks:
- File hash (SHA256)
- Last modification time
- Number of chunks
- Last sync timestamp
On each run:
- Scan Notes/**/*.md files
- Compare each file's hash with the cache
- If changed → re-sync
- If unchanged → skip
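The change-detection step above can be sketched as follows. This is a minimal illustration, not the actual script; the `files_needing_sync` helper name is hypothetical, and the cache dict is assumed to follow the cache-file format described later in this document:

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash the file's bytes, so a touch without edits doesn't trigger a re-sync."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def files_needing_sync(notes_dir: Path, cache: dict) -> list[str]:
    """Return relative paths of Markdown files that are new or changed vs. the cache."""
    changed = []
    for md in sorted(notes_dir.glob("**/*.md")):
        rel = md.relative_to(notes_dir).as_posix()
        entry = cache.get(rel)
        if entry is None or entry.get("hash") != file_sha256(md):
            changed.append(rel)
    return changed
```

Comparing hashes rather than only mtimes makes the check robust to tools that rewrite files without changing their content.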
```python
CHUNK_SIZE = 800      # characters per chunk
CHUNK_OVERLAP = 100   # overlap for context
```

- Splits at sentence boundaries when possible
- Maintains context with overlap
- Each chunk is tagged with its file path and index
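One way to implement this kind of overlapping, sentence-aware chunking (a sketch under the parameters above, not necessarily the script's exact algorithm):

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into ~chunk_size-character pieces, preferring sentence
    boundaries, repeating the last `overlap` characters in the next chunk."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Back up to the last sentence end inside the window, if any.
            cut = text.rfind(". ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        start = max(end - overlap, start + 1)  # overlap, but always make progress
    return chunks
```

The overlap means a sentence cut near a boundary still appears intact in the following chunk, which helps retrieval at the cost of some duplicated storage.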
- Model: BAAI/bge-large-en-v1.5 (same as MCP-Qdrant)
- Dimensions: 1024
- Provider: FastEmbed (local, no API calls)
For modified files:
- Delete all existing chunks (filtered by file_path)
- Re-chunk the new content
- Re-embed all chunks
- Insert with deterministic UUIDs
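Deterministic UUIDs make re-syncs idempotent: the same file path and chunk index always map to the same point ID, so a re-insert overwrites instead of duplicating. A sketch of how such IDs can be derived (the script's exact scheme may differ; the namespace string here is an assumption):

```python
import uuid

# Fixed namespace for this collection; any constant UUID works,
# as long as it never changes between runs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "cursor-knowledge")

def chunk_point_id(file_path: str, chunk_index: int) -> str:
    """The same (path, index) pair always yields the same UUID."""
    return str(uuid.uuid5(NAMESPACE, f"{file_path}#{chunk_index}"))
```

`uuid5` is a name-based (SHA-1) UUID, so no state needs to be stored to regenerate an ID.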
For deleted files:
- Detected by comparing cache vs filesystem
- All associated chunks removed from Qdrant
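Deletion detection reduces to a set difference between cached paths and what is on disk (again a sketch; the real script must also remove the matching points from Qdrant):

```python
from pathlib import Path

def deleted_files(notes_dir: Path, cache: dict) -> set[str]:
    """Paths present in the cache but no longer on disk."""
    on_disk = {p.relative_to(notes_dir).as_posix() for p in notes_dir.glob("**/*.md")}
    return set(cache) - on_disk
```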
Example chunk payload:

```json
{
  "document": "Chunk content here...",
  "metadata": {
    "file_path": "VMs/VM-Example-WebServer.md",
    "file_hash": "abc123...",
    "chunk_index": 0,
    "chunk_total": 3,
    "last_synced": "2025-10-19T23:24:14.731Z",
    "frontmatter": {
      "vm_name": "WebServer",
      "ip": "192.168.1.100",
      "status": "active"
    }
  }
}
```

Cache file format (AI_Data/notes-sync-cache.json):

```json
{
  "VMs/VM-Example-WebServer.md": {
    "hash": "sha256_hex_digest",
    "mtime": 1697000000.123,
    "chunks": 3,
    "last_synced": "2025-10-19T23:24:14.731Z"
  }
}
```

Default configuration:

```bash
# Qdrant connection
QDRANT_URL=http://localhost:6333
COLLECTION_NAME=cursor-knowledge

# Embedding model
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5

# Paths
NOTES_PATH=./Notes
CACHE_FILE=./AI_Data/notes-sync-cache.json

# Chunking
CHUNK_SIZE=800
CHUNK_OVERLAP=100
```

Edit scripts/sync-notes.sh to change the defaults:

```bash
export CHUNK_SIZE="1000"
export CHUNK_OVERLAP="150"
```

You have two separate collections:
- cursor-context (port 8000) - Code context
  - Used by the @qdrant tool
  - Read/Write access
  - For code snippets, documentation
- cursor-knowledge (port 8001) - Personal notes
  - Used by the @qdrant-knowledge tool
  - Read-only in Cursor (write via sync script)
  - For your Markdown notes
```
User: @qdrant-knowledge find information about my web server VM
Cursor: [searches in cursor-knowledge collection]

User: What's the IP of my database server?
Cursor: [can search cursor-knowledge for server configurations]
```
Note: Currently both MCP servers expose the same tool names, so Cursor might default to the first one. You can specify which collection to search by using the correct MCP server in your queries.
```bash
# Run sync manually
cd /home/flowtech/FlowTech-LAB/FlowTech-AI
./scripts/sync-notes.sh

# Check results
curl -s "http://localhost:6333/collections/cursor-knowledge" | jq '.result.points_count'

# List all synced files
curl -s -X POST "http://localhost:6333/collections/cursor-knowledge/points/scroll" \
  -H "Content-Type: application/json" \
  -d '{"limit": 100, "with_payload": true, "with_vector": false}' \
  | jq -r '.result.points[].payload.metadata.file_path' | sort -u

# Cron job logs
tail -f logs/notes-sync.log

# Manual run (verbose)
cd /home/flowtech/FlowTech-LAB/FlowTech-AI
NOTES_PATH=./Notes ./scripts/venv/bin/python ./scripts/sync-notes-to-qdrant.py

# Delete cache (forces a full re-sync)
rm AI_Data/notes-sync-cache.json
# Run sync
./scripts/sync-notes.sh

# Edit crontab
crontab -e
# Change from hourly to every 30 minutes:
# */30 * * * * /path/to/sync-notes.sh >> /path/to/logs/notes-sync.log 2>&1

# Remove the cron job
crontab -l | grep -v "sync-notes.sh" | crontab -

# Check if cursor-knowledge collection exists
curl -s "http://localhost:6333/collections/cursor-knowledge"
# If not, run init.sh to create it
cd /home/flowtech/FlowTech-LAB/FlowTech-AI
./init.sh

# Fix permissions
chmod 644 AI_Data/notes-sync-cache.json
chown $USER:$USER AI_Data/notes-sync-cache.json

# Check cron service
systemctl status cron
# Check crontab
crontab -l
# Check logs
tail -f logs/notes-sync.log
```

The FastEmbed model uses roughly 1-2 GB of RAM. If this is a problem:
- Reduce CHUNK_SIZE to process fewer chunks at once
- Consider using a smaller embedding model
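For example, BAAI/bge-small-en-v1.5 is a lighter FastEmbed-supported model that produces 384-dimensional vectors. Note this override is a suggestion, not something tested in this setup: changing the model means the collection must be recreated with the matching vector size, and all notes re-synced.

```shell
# Hypothetical override: smaller model, lower RAM.
# Requires recreating the collection with a 384-dim vector size and a full re-sync.
export EMBEDDING_MODEL="BAAI/bge-small-en-v1.5"
```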
- MCP-Qdrant Guide - Cursor integration
- Notes Templates - Obsidian templates
- Main README - FlowTech-AI overview
Last Updated: 2025-10-19
Status: ✅ Production Ready