First off, thank you for considering contributing to PipeFrame! It's people like you that make PipeFrame such a great tool for the data science community.
Found a bug? Please open an issue with:
- Clear, descriptive title
- Steps to reproduce
- Expected vs actual behavior
- Code samples
- Environment details (Python version, OS, pipeframe version)
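To gather the environment details, a short standard-library script along these lines may help. It is a sketch, not a tool shipped with PipeFrame. It assumes the installed distribution is named `pipeframe`, as in the `pip install -e` step of this guide, and reports "not installed" when that lookup fails.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict


def environment_report(package: str = "pipeframe") -> Dict[str, str]:
    """Collect the Python version, OS, and package version for a bug report."""
    try:
        # Version recorded in the installed distribution's metadata.
        pkg_version = version(package)
    except PackageNotFoundError:
        pkg_version = "not installed"
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        package: pkg_version,
    }


if __name__ == "__main__":
    # Paste this output into the environment section of the issue.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```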
Have an idea? We'd love to hear it! Include:
- Use case description
- Proposed API (how it would work)
- Example code showing the feature
- Why it would be useful
Help others learn PipeFrame:
- Fix typos or clarify explanations
- Add examples
- Create tutorials
- Improve docstrings
Ready to code? Awesome! See development setup below.
# Fork the repository on GitHub, then:
git clone https://github.com/YOUR_USERNAME/pipeframe.git
cd pipeframe

# Using venv
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Using conda
conda create -n pipeframe python=3.10
conda activate pipeframe

# Install package in editable mode with dev dependencies
pip install -e ".[dev,test]"
# Or install from requirements
pip install -r requirements-dev.txt

# Install the pre-commit hooks
pre-commit install

# Create a branch for your work
git checkout -b feature/your-feature-name
# or
git checkout -b fix/issue-number-description

Follow these guidelines:
- Write clear, readable code
- Add docstrings (Google style)
- Include type hints
- Add tests for new features
- Update documentation
# Run all tests
pytest
# Run with coverage
pytest --cov=pipeframe --cov-report=html
# Run specific test
pytest tests/test_dataframe.py::test_filter

# Format with black
black pipeframe/
# Sort imports
isort pipeframe/
# Lint
flake8 pipeframe/
# Type check
mypy pipeframe/

git add .
git commit -m "feat: add amazing new feature"

Commit Message Format:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- test: Test additions/changes
- refactor: Code refactoring
- perf: Performance improvements
- chore: Maintenance tasks
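The commit prefixes can be checked locally before pushing. The sketch below is not part of PipeFrame's tooling. It accepts only the seven types listed and the `<type>: <description>` shape used in the example commit, so a scoped prefix such as `feat(io):` is rejected. That strictness is an assumption; relax the pattern if the project accepts scopes.

```python
import re

# Types copied from the commit message format list in this guide.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "perf", "chore")

# "<type>: <description>", where the description starts with a non-space character.
_TYPES = "|".join(ALLOWED_TYPES)
COMMIT_PATTERN = re.compile(r"^(" + _TYPES + r"): \S")


def is_valid_subject(message: str) -> bool:
    """Return True if the first line of a commit message uses an allowed prefix."""
    subject = message.splitlines()[0] if message else ""
    return bool(COMMIT_PATTERN.match(subject))


if __name__ == "__main__":
    print(is_valid_subject("feat: add amazing new feature"))
    print(is_valid_subject("Added stuff"))
```

Only the subject line is checked, so a longer commit body below a blank line does not affect the result.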
git push origin feature/your-feature-name

Then create a Pull Request on GitHub with:
- Clear description of changes
- Link to related issues
- Screenshots/examples if applicable
- Follow PEP 8
- Use Black for formatting (line length: 100)
- Use isort for import sorting
- Type hints required for public APIs
Use Google style:
def awesome_function(param1: str, param2: int = 0) -> DataFrame:
    """
    Brief description of what this does.

    More detailed explanation if needed. Can span multiple
    lines and include usage notes.

    Args:
        param1: Description of param1
        param2: Description of param2. Defaults to 0.

    Returns:
        Description of return value

    Raises:
        ValueError: When param1 is empty

    Examples:
        >>> result = awesome_function("hello", 42)
        >>> print(result)
    """
    pass

from typing import Any, List, Optional, Union
import pandas as pd

from pipeframe.core.dataframe import DataFrame

def process_data(
    df: Union[DataFrame, pd.DataFrame],
    columns: Optional[List[str]] = None,
    **kwargs: Any
) -> DataFrame:
    ...

import pytest
from pipeframe import DataFrame, filter

# Also import PipeFrameColumnError from the pipeframe module that defines it.


class TestDataFrame:
    def test_filter_basic(self):
        """Test basic filtering functionality."""
        df = DataFrame({'x': [1, 2, 3, 4]})
        result = df >> filter('x > 2')
        assert len(result) == 2

    def test_filter_empty_result(self):
        """Test filtering that returns no rows."""
        df = DataFrame({'x': [1, 2, 3]})
        result = df >> filter('x > 10')
        assert len(result) == 0

    def test_filter_invalid_column(self):
        """Test error handling for invalid column."""
        df = DataFrame({'x': [1, 2, 3]})
        with pytest.raises(PipeFrameColumnError):
            df >> filter('y > 2')

- One test file per module
- Test classes for related tests
- Clear test names describing what's tested
- Test both success and error cases
- Test edge cases
- Keep examples simple and focused
- Ensure all code examples actually work
- Update table of contents if adding sections
- Every public function/class needs a docstring
- Include parameters, returns, raises
- Add usage examples
- Note any security considerations
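One way to make sure docstring examples actually work is the standard-library `doctest` module, which runs the `>>>` lines and compares their output. The function below is a hypothetical, self-contained illustration that does not use PipeFrame. It shows a complete Google-style docstring with Args, Returns, Raises, and an Examples section that `doctest` can execute.

```python
import doctest
from typing import List


def normalize_columns(columns: List[str]) -> List[str]:
    """Trim, lower-case, and underscore-separate column names.

    Hypothetical helper used only to illustrate docstring conventions.

    Args:
        columns: Column names to normalize. Must not be empty.

    Returns:
        A new list with normalized names, in the original order.

    Raises:
        ValueError: If ``columns`` is empty.

    Examples:
        >>> normalize_columns(["First Name", "AGE"])
        ['first_name', 'age']
        >>> normalize_columns([])
        Traceback (most recent call last):
            ...
        ValueError: columns must not be empty
    """
    if not columns:
        raise ValueError("columns must not be empty")
    return [name.strip().lower().replace(" ", "_") for name in columns]


if __name__ == "__main__":
    # Runs every >>> example in this module and reports any mismatch.
    doctest.testmod()
```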
- Start simple, build complexity
- Explain the "why" not just "how"
- Include real-world examples
- Test all code cells
- ✅ Tests pass
- ✅ Code is formatted (black, isort)
- ✅ Type hints present
- ✅ Docstrings complete
- ✅ No breaking changes (or clearly documented)
- ✅ Performance impact considered
- ✅ Security implications reviewed
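Several checklist items correspond to the test, format, lint, and type-check commands listed earlier in this guide. The sketch below runs those commands in sequence before you open a PR. It is a local convenience script, not something the repository ships, and it assumes the tools were installed with the dev extras. It uses Black's `--check` and isort's `--check-only` flags so that problems are reported without any files being rewritten.

```python
import shutil
import subprocess
import sys
from typing import List, Sequence

PACKAGE = "pipeframe/"

# Same tools as in the workflow above; check-only flags avoid modifying files.
CHECKS: List[List[str]] = [
    ["black", "--check", PACKAGE],
    ["isort", "--check-only", PACKAGE],
    ["flake8", PACKAGE],
    ["mypy", PACKAGE],
    ["pytest"],
]


def missing_tools(checks: Sequence[Sequence[str]]) -> List[str]:
    """Return the executables from `checks` that cannot be found on PATH."""
    return [cmd[0] for cmd in checks if shutil.which(cmd[0]) is None]


def run_checks(checks: Sequence[Sequence[str]] = CHECKS) -> int:
    """Run each check in order, stop at the first failure, and return its exit code."""
    missing = missing_tools(checks)
    if missing:
        print(f"Missing tools: {', '.join(missing)}", file=sys.stderr)
        return 1
    for cmd in checks:
        print("$", " ".join(cmd))
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            return result.returncode
    return 0


if __name__ == "__main__":
    sys.exit(run_checks())
```

Stopping at the first failure keeps the output short. Remove the early `return` if you prefer to see every failing check in one run.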
- Initial response: Within 2 days
- Full review: Within 1 week
- Revisions: As needed
We especially welcome contributions in:
- Performance Optimization
  - Profiling and benchmarking
  - Vectorization improvements
  - Memory efficiency
- Additional Verbs
  - Join operations
  - Window functions
  - Time series helpers
- Backend Support
  - Polars integration
  - DuckDB support
  - Arrow format
- Documentation
  - More examples
  - Video tutorials
  - Translation
- Testing
  - Edge case coverage
  - Performance tests
  - Integration tests
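For performance contributions, it helps to include before-and-after timings in the PR. The sketch below uses the standard-library `timeit` module. The two functions are hypothetical stand-ins for an existing code path and a proposed alternative; replace them, and the sample `DATA`, with the real PipeFrame code and inputs you are comparing.

```python
import timeit
from functools import partial
from typing import Callable, Dict, List

# Hypothetical input; use data representative of the change being measured.
DATA: List[int] = list(range(100_000))


def baseline(values: List[int]) -> int:
    """Hypothetical 'before' implementation: explicit Python loop."""
    total = 0
    for v in values:
        total += v * v
    return total


def candidate(values: List[int]) -> int:
    """Hypothetical 'after' implementation: generator expression with sum()."""
    return sum(v * v for v in values)


def benchmark(
    funcs: Dict[str, Callable[[List[int]], int]],
    repeat: int = 5,
    number: int = 10,
) -> Dict[str, float]:
    """Return the best observed time per call, in seconds, for each function."""
    # Guard against "faster but wrong": every version must give the same answer.
    if len({func(DATA) for func in funcs.values()}) != 1:
        raise ValueError("implementations disagree on DATA")
    timings: Dict[str, float] = {}
    for name, func in funcs.items():
        runs = timeit.repeat(partial(func, DATA), repeat=repeat, number=number)
        # The fastest run is the one least disturbed by other processes.
        timings[name] = min(runs) / number
    return timings


if __name__ == "__main__":
    results = benchmark({"baseline": baseline, "candidate": candidate})
    for name, seconds in results.items():
        print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

Taking the minimum of several repeats, rather than the mean, is the usual `timeit` convention for reducing noise from background activity.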
- GitHub Discussions: Ask questions, share ideas
- Issues: Bug reports, feature requests
- Email: yasser.mustafan@gmail.com
For substantial changes:
- Open an issue first
- Discuss the approach
- Get feedback before coding
- Then submit a PR
By contributing, you agree that your contributions will be licensed under the MIT License.
Contributors are recognized in:
- README.md contributors section
- Release notes
- Annual contributor highlight
Don't hesitate to ask! We're here to help:
- Open a discussion on GitHub
- Email: yasser.mustafan@gmail.com
- Tag @Yasser03 in issues
Thank you for making PipeFrame better! 🎉