```bash
git clone https://github.com/your-username/CodeRAG.git
cd CodeRAG
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```
> The requirements file delegates to `-e .[dev]`, so you can also run
> `pip install -e ".[dev]"` directly if you prefer an editable install.

Set up the pre-commit hooks:

```bash
pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Once installed, the hooks run these code quality checks on every commit:
- Black: Code formatting
- isort: Import sorting
- Flake8: Linting and style checks
- MyPy: Type checking
- Basic hooks: Trailing whitespace, file endings, etc.

Copy `example.env` to `.env` and configure it:

```bash
cp example.env .env
```

Variables to set:

```bash
# Required for embeddings and chat
OPENAI_API_KEY=your_key_here
# Directory to index (default: current directory)
WATCHED_DIR=/path/to/code
```

All functions should have type hints:
```python
from typing import Optional

import numpy as np


def process_file(filepath: str, content: str) -> Optional[np.ndarray]:
    """Process a file and return embeddings."""
    ...
```

Use structured logging and proper exception handling:
```python
import logging

logger = logging.getLogger(__name__)


def safe_operation():
    # risky_operation and SpecificError are placeholders for your own code.
    try:
        return risky_operation()
    except SpecificError as e:
        logger.error("Operation failed: %s", e)
        return None
```

Use concise docstrings for public functions:
```python
from typing import Any, Dict, List


def search_code(query: str, k: int = 5) -> List[Dict[str, Any]]:
    """Search the FAISS index using a text query.

    Args:
        query: The search query text.
        k: Number of results to return.

    Returns:
        List of search results with metadata.
    """
    ...
```

To test manually, run the backend and the UI:

```bash
# Test backend indexing
python main.py

# Test Streamlit UI (separate terminal)
streamlit run app.py
```

Run all quality checks:

```bash
pre-commit run --all-files
```

If you need to run a specific tool locally:
```bash
black .
isort .
flake8 .
mypy .
```

- Create a feature branch: `git checkout -b feature/new-feature`
- Add logging: Use the logger for all operations
- Add type hints: Follow existing patterns
- Handle errors: Graceful degradation and user-friendly messages
- Update tests: Add tests for new functionality
- Update docs: Update README if needed
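
As a sketch of that checklist (logging, type hints, graceful failure, and a test), here is a hypothetical helper with pytest-style tests. `embed_or_none` and the injected `embed_fn` are illustrative names, not part of CodeRAG. Passing the API call in as a parameter lets the tests use `unittest.mock.Mock` instead of calling OpenAI.

```python
import logging
from typing import Callable, List, Optional
from unittest import mock

logger = logging.getLogger(__name__)

EmbedFn = Callable[[str], List[float]]


def embed_or_none(text: str, embed_fn: EmbedFn) -> Optional[List[float]]:
    """Embed text, returning None instead of raising if the API call fails.

    Hypothetical helper: embed_fn stands in for whatever function calls the API.
    """
    try:
        return embed_fn(text)
    except (RuntimeError, ConnectionError) as e:
        # Log with context (input size and error) rather than crashing.
        logger.error("Embedding failed (%d chars): %s", len(text), e)
        return None


def test_embed_or_none_returns_vector() -> None:
    # A Mock replaces the real API call, so the test never touches the network.
    fake = mock.Mock(return_value=[0.1, 0.2])
    assert embed_or_none("def f(): pass", fake) == [0.1, 0.2]
    fake.assert_called_once_with("def f(): pass")


def test_embed_or_none_returns_none_on_failure() -> None:
    # side_effect makes the Mock raise, exercising the failure path.
    fake = mock.Mock(side_effect=ConnectionError("API unreachable"))
    assert embed_or_none("def f(): pass", fake) is None
```

Running `pytest -q` would collect both `test_*` functions.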

Design principles:

- Maintain the single-responsibility principle
- Avoid unnecessary abstractions
- Focus on the core RAG functionality

Error handling:

- Log errors with context
- Return None/empty lists for failures
- Show user-friendly messages in UI
- Don't crash the application
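
Here is a minimal sketch of these rules at the UI boundary. It assumes a search function shaped like the `search_code(query, k)` signature shown earlier. `safe_search` and the injected `search_fn` are hypothetical names. The broad `except Exception` is deliberate here only because this is the last line of defense before the user. Deeper code should still catch specific errors.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

SearchFn = Callable[[str, int], List[Dict[str, Any]]]


def safe_search(query: str, search_fn: SearchFn, k: int = 5) -> List[Dict[str, Any]]:
    """Run a search, returning an empty list instead of raising on failure."""
    if not query.strip():
        logger.warning("Empty query; skipping search")
        return []
    try:
        return search_fn(query, k)
    except Exception:  # UI boundary only: never let a search error crash the app
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("Search failed (query=%r, k=%d)", query, k)
        return []
```

In the Streamlit UI, an empty list can then be turned into a friendly message (for example with `st.warning`) instead of a traceback.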

Performance:

- Limit search results (default: 5)
- Truncate long content for context
- Cache embeddings when possible
- Monitor memory usage with large codebases
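
The truncation and caching points might look like the sketch below. `truncate`, `EmbeddingCache`, and the 2,000-character limit are illustrative assumptions, not CodeRAG's actual names or values. Keying the cache on a SHA-256 hash of the content means an unchanged file is not re-embedded, even if it is scanned again.

```python
import hashlib
from typing import Callable, Dict, Sequence, Tuple

MAX_CONTEXT_CHARS = 2000  # hypothetical limit; tune it to your model's context window


def truncate(content: str, limit: int = MAX_CONTEXT_CHARS) -> str:
    """Cut long content so several search hits fit into one prompt."""
    if len(content) <= limit:
        return content
    return content[:limit] + "\n... [truncated]"


class EmbeddingCache:
    """In-memory embedding cache keyed by a hash of the content (hypothetical)."""

    def __init__(self, embed_fn: Callable[[str], Sequence[float]]) -> None:
        self._embed_fn = embed_fn
        self._cache: Dict[str, Tuple[float, ...]] = {}
        self.misses = 0  # number of times embed_fn was actually called

    def get(self, content: str) -> Tuple[float, ...]:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = tuple(self._embed_fn(content))
        return self._cache[key]
```

This dictionary grows without bound. On large codebases you may want to cap it or persist it to disk, which ties in with the memory point above.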
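
Enable debug logging:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

Inspect the index metadata:

```python
from coderag.index import inspect_metadata

inspect_metadata(5)  # Show first 5 entries
```

Check that embedding generation works:

```python
from coderag.embeddings import generate_embeddings

result = generate_embeddings("test code")
print(f"Shape: {result.shape if result is not None else 'None'}")
```

Import Errors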
- Ensure you're in the virtual environment
- Check PYTHONPATH includes project root
- Verify all dependencies are installed
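
This stdlib-only snippet checks the first two points from inside Python. The helper names are hypothetical. It assumes you run it from the project root, because it compares `sys.path` against the current working directory.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True inside a venv: sys.prefix then differs from the base interpreter's prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def on_sys_path(directory: Path) -> bool:
    """True if directory appears on sys.path (an empty entry means the current directory)."""
    target = directory.resolve()
    return any(Path(entry or ".").resolve() == target for entry in sys.path)


if __name__ == "__main__":
    root = Path.cwd()  # assumption: run this from the CodeRAG project root
    print(f"Virtual environment active: {in_virtualenv()}")
    print(f"Project root on sys.path:   {on_sys_path(root)}")
```

If the package still cannot be imported, an editable install (`pip install -e .`) or an explicit `PYTHONPATH` makes `coderag` importable from other directories.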
OpenAI API Issues
- Check API key validity
- Monitor rate limits and usage
- Test with a simple embedding request
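
Retrying with exponential backoff is a common way to handle rate limits. This is a sketch, not CodeRAG code. `with_backoff` is hypothetical, and `retry_on` should be whatever exception your client raises for rate limits. Assuming the 1.x `openai` package, that would be `openai.RateLimitError`. Recent versions of the official client also retry some errors on their own, so check before layering retries on top.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff plus jitter on the given errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Injecting `sleep` keeps tests fast, because they can pass a function that records delays instead of waiting.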
FAISS Index Corruption
- Delete existing index files and rebuild
- Check file permissions
- Ensure consistent embedding dimensions
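
Running a guard like this before adding vectors to the index catches dimension mismatches early. It is a stdlib sketch: `check_dimensions` is hypothetical, and with FAISS you would compare against the index's `d` attribute rather than a hard-coded constant. 1536 is the output size of OpenAI's `text-embedding-ada-002` and the default for `text-embedding-3-small`. Use whatever size your configured model actually returns.

```python
from typing import Sequence

EMBEDDING_DIM = 1536  # assumption: must match the embedding model and the existing index


def check_dimensions(vectors: Sequence[Sequence[float]], expected_dim: int = EMBEDDING_DIM) -> None:
    """Raise ValueError if any vector's length differs from the index dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"Embedding {i} has dimension {len(vec)}, but the index expects "
                f"{expected_dim}; rebuild the index if the embedding model changed."
            )
```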

Routine maintenance:

- Regenerate the FAISS index after large code refactors: `python scripts/initialize_index.py`.
- Rotate environment secrets by updating `.env` or your deployment variables, then restart the services.
- Refresh dependencies with `pip install --upgrade -r requirements.txt`, then run `pre-commit run --all-files` and `pytest -q`.
- Keep hooks current with `pre-commit autoupdate`, then commit the updated configuration once checks pass.

```text
CodeRAG/
├── coderag/            # Core library
│   ├── __init__.py
│   ├── config.py       # Configuration management
│   ├── embeddings.py   # OpenAI integration
│   ├── index.py        # FAISS operations
│   ├── search.py       # Search functionality
│   └── monitor.py      # File monitoring
├── scripts/            # Utility scripts
├── tests/              # Test files
├── .github/            # GitHub workflows
├── main.py             # Backend service
├── app.py              # Streamlit frontend
├── prompt_flow.py      # RAG orchestration
└── requirements.txt    # Dependencies
```
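
At query time, these pieces roughly combine as retrieve, truncate, then prompt. The sketch below shows that flow with an injected search function. `build_prompt`, the `filepath` and `content` result keys, and the prompt wording are assumptions, not the actual contents of `prompt_flow.py`.

```python
from typing import Any, Callable, Dict, List

SearchFn = Callable[[str, int], List[Dict[str, Any]]]


def build_prompt(query: str, search_fn: SearchFn, k: int = 5, max_chars: int = 1500) -> str:
    """Retrieve the top-k code chunks and assemble them into a chat prompt (hypothetical)."""
    results = search_fn(query, k)[:k]  # enforce the result limit even if search_fn ignores k
    blocks = []
    for r in results:
        content = str(r.get("content", ""))[:max_chars]  # truncate long chunks
        blocks.append(f"File: {r.get('filepath', '<unknown>')}\n{content}")
    context = "\n\n".join(blocks) if blocks else "(no relevant code found)"
    return f"Answer using the code context below.\n\n{context}\n\nQuestion: {query}"
```

The resulting string would then go to the chat model, keeping retrieval (`search.py`) separate from orchestration.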