Contributing to IntelliTag

Thank you for your interest in contributing to IntelliTag! This document provides guidelines and instructions for contributing.

Code of Conduct

This project follows the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code.

Getting Started

  1. Fork the repository on GitHub
  2. Clone your fork locally:
    git clone https://github.com/YOUR_USERNAME/Classifier_Questions_StackOverflow.git
    cd Classifier_Questions_StackOverflow
  3. Add the upstream remote:
    git remote add upstream https://github.com/ThomasMeb/Classifier_Questions_StackOverflow.git

Development Setup

Prerequisites

  • Python 3.9 or higher
  • pip
  • Git

Installation

  1. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:

    make install-dev
    # Or manually:
    pip install -e ".[dev]"
    pip install -r requirements-dev.txt
  3. Verify installation:

    make test

Making Changes

Branch Naming

Use descriptive branch names:

  • feature/add-new-extractor - New features
  • fix/api-response-format - Bug fixes
  • docs/update-readme - Documentation
  • refactor/simplify-classifier - Code refactoring
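
If you want to check a branch name before pushing, the convention above can be sketched as a small validator. This is only an illustration: the helper name `is_valid_branch_name` is hypothetical, and the slug rule (lowercase words joined by hyphens) is an assumption inferred from the examples, not a rule the project states.

```python
import re

# Prefixes come from the list above. The slug rule (lowercase alphanumeric
# words joined by single hyphens) is an assumption based on the examples.
BRANCH_PATTERN = re.compile(r"(feature|fix|docs|refactor)/[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if name looks like '<prefix>/<hyphenated-slug>'."""
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("feature/add-new-extractor", "fix/api-response-format", "my-branch"):
        print(candidate, is_valid_branch_name(candidate))
```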

Workflow

  1. Create a new branch from main:

    git checkout main
    git pull upstream main
    git checkout -b feature/your-feature-name
  2. Make your changes following our code standards

  3. Test your changes:

    make test
    make lint
  4. Commit your changes using conventional commits

  5. Push to your fork:

    git push origin feature/your-feature-name
  6. Open a Pull Request on GitHub

Code Standards

Python Style Guide

We follow PEP 8 and enforce it with these tools:

  • Black for code formatting (line length: 88)
  • isort for import sorting
  • Flake8 for linting
  • mypy for type checking

Run all checks:

make lint

Auto-format code:

make format

Docstrings

Use Google-style docstrings:

def predict_tags(text: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Predict tags for the given text.

    Args:
        text: The input text (title + body).
        top_k: Maximum number of tags to return.

    Returns:
        List of (tag, confidence) tuples sorted by confidence.

    Raises:
        ValueError: If text is empty or top_k < 1.

    Example:
        >>> predict_tags("How to parse JSON in Python?", top_k=3)
        [('python', 0.95), ('json', 0.87), ('parsing', 0.45)]
    """
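
To show how a documented contract maps onto a function body, here is a minimal sketch. The helper `rank_tags` is hypothetical and is not part of IntelliTag; it only illustrates keeping the `Args`, `Returns`, and `Raises` sections in step with what the code actually does.

```python
def rank_tags(scores: dict[str, float], top_k: int = 5) -> list[tuple[str, float]]:
    """Return the highest-scoring tags.

    Hypothetical helper used only to illustrate the docstring style.

    Args:
        scores: Mapping from tag to confidence.
        top_k: Maximum number of tags to return.

    Returns:
        List of (tag, confidence) tuples sorted by descending confidence.

    Raises:
        ValueError: If top_k < 1.
    """
    if top_k < 1:
        # Matches the Raises section above.
        raise ValueError("top_k must be at least 1")
    # Sort by confidence, highest first, then keep at most top_k entries.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

Each documented behavior (the `top_k` limit, the sort order, the error case) corresponds to one line of the body, which makes the docstring easy to verify in review.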

Type Hints

All functions must have type hints:

from typing import Optional

def process_text(
    text: str,
    lowercase: bool = True,
    remove_code: Optional[bool] = None
) -> str:
    ...

File Organization

src/intellitag/
├── config/          # Configuration and settings
├── data/            # Data loading and preprocessing
├── features/        # Feature extraction
├── models/          # ML models
├── api/             # REST API
└── utils/           # Utilities

Testing

Running Tests

# All tests
make test

# With coverage
make test-cov

# Specific test file
pytest tests/unit/test_preprocessor.py

# Specific test
pytest tests/unit/test_preprocessor.py::test_clean_html -v

Writing Tests

  • Place unit tests in tests/unit/
  • Place integration tests in tests/integration/
  • Use descriptive test names: test_<what>_<condition>_<expected>

Example:

import pytest

from intellitag.models import TagClassifier  # assumed import path; adjust to the actual module


def test_predict_tags_with_empty_text_raises_value_error():
    """Test that empty text raises ValueError."""
    classifier = TagClassifier()
    with pytest.raises(ValueError, match="Text cannot be empty"):
        classifier.predict("")

Test Coverage

Aim for more than 80% code coverage. To generate a coverage report:

make test-cov
# Open htmlcov/index.html in browser

Submitting Changes

Commit Messages

Follow Conventional Commits:

<type>(<scope>): <description>

[optional body]

[optional footer]

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation only
  • style: Code style (formatting, no code change)
  • refactor: Code refactoring
  • test: Adding or updating tests
  • chore: Maintenance tasks

Examples:

feat(api): add batch prediction endpoint

fix(preprocessor): handle Unicode characters in code blocks

docs(readme): update installation instructions
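
The header format above can be checked mechanically. The following is a minimal sketch, not a tool the project ships: `parse_commit_header` is a hypothetical name, the allowed scope characters are an assumption, and treating the scope as optional follows the Conventional Commits specification rather than anything stated in this guide.

```python
import re
from typing import Optional

# Types are the ones listed above.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
_TYPES = "|".join(COMMIT_TYPES)

# Assumptions: the scope is optional and limited to lowercase letters, digits,
# underscores, and hyphens; the description must start with a non-space character.
HEADER_PATTERN = re.compile(
    rf"^(?P<type>{_TYPES})(\((?P<scope>[a-z0-9_-]+)\))?: (?P<description>\S.*)$"
)


def parse_commit_header(message: str) -> Optional[dict[str, Optional[str]]]:
    """Parse the first line of a commit message; return None if it does not match."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = HEADER_PATTERN.match(header)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_commit_header("feat(api): add batch prediction endpoint"))
    print(parse_commit_header("updated some stuff"))
```

Only the first line is validated; the optional body and footer are left free-form.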

Pull Request Guidelines

  1. Title: Use conventional commit format
  2. Description: Include:
    • What changes were made
    • Why the changes were necessary
    • How to test the changes
  3. Link issues: Reference related issues with Fixes #123
  4. Keep PRs focused: One feature/fix per PR
  5. Update documentation: If behavior changes

Review Process

  1. All PRs require at least one approval
  2. CI checks must pass (lint, tests, build)
  3. Resolve all review comments
  4. Squash commits if requested

Issue Guidelines

Bug Reports

Include:

  • Python version
  • OS and version
  • Steps to reproduce
  • Expected vs actual behavior
  • Error messages/stack traces
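
To collect the version details listed above, a short script using only the standard library can help. This is a convenience sketch; the function name `environment_report` is hypothetical and the project does not require this exact output.

```python
import platform


def environment_report() -> str:
    """Collect the Python and OS details requested for bug reports."""
    lines = [
        f"Python version: {platform.python_version()}",
        f"Implementation: {platform.python_implementation()}",
        f"OS: {platform.system()} {platform.release()}",
        f"Machine: {platform.machine()}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste the printed text into the bug report.
    print(environment_report())
```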

Feature Requests

Include:

  • Use case description
  • Proposed solution (if any)
  • Alternatives considered

Labels

  • bug - Something isn't working
  • enhancement - New feature request
  • documentation - Documentation improvements
  • good first issue - Good for newcomers
  • help wanted - Extra attention needed

Questions?

  • Open an issue for questions
  • Check existing issues and discussions first

Thank you for contributing!