AI-Powered GitHub Profile Analysis for Technical Recruiting
FolioHive is a cloud-native SaaS platform that automatically analyzes candidate GitHub profiles to help recruiters assess technical skills, coding style, and project experience. The system aggregates repository metadata, caches relevant code artifacts, and uses AI to generate contextual summaries and answer recruiter queries.
- Python 3.12+
- Node.js 18+ and npm
- Azure Functions Core Tools v4
- Azurite (Azure Storage Emulator)
- GitHub Personal Access Token (rate limit: 5000 requests/hour)
- OpenAI API key
# Start all development services (Azurite + API + UI)
./run-dev-session.sh --run-e2e -- --python-version 3.12+ --run-tests
# Local settings
Update `local.settings.json` with your GitHub token, your OpenAI API key, the Azure Storage connection strings, and the CORS configuration.
Access the application:
- UI: http://localhost:4200
- API: http://localhost:7071
- Azurite: http://localhost:10002 (Table), 10000 (Blob), 10001 (Queue)
┌─────────────────────────────────────────────────────────────────┐
│                        Angular UI (SWA)                         │
│  Landing | Profile | Projects | AI Assistant | Admin Dashboard  │
└────────────────────────┬────────────────────────────────────────┘
                         │ HTTP/REST
┌────────────────────────┴────────────────────────────────────────┐
│               Azure Functions (Flex Consumption)                │
│   ┌──────────────┬─────────────┬─────────────┬──────────────┐   │
│   │ API Gateway  │ Sync Worker │Cache Worker │Reconciliation│   │
│   │ (HTTP Routes)│   (Queue)   │   (Queue)   │   (Timer)    │   │
│   └──────────────┴─────────────┴─────────────┴──────────────┘   │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────┴────────────────────────────────────────┐
│                      Azure Storage Account                      │
│   ┌─────────────┬──────────────┬────────────────────────────┐   │
│   │Table Storage│ Blob Storage │       Queue Storage        │   │
│   │(7 Tables)   │(Cached Files)│(sync-jobs, cache-jobs)     │   │
│   └─────────────┴──────────────┴────────────────────────────┘   │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────┴────────────────────────────────────────┐
│                        External Services                        │
│            GitHub REST/GraphQL API | OpenAI GPT API             │
└─────────────────────────────────────────────────────────────────┘
Function App with Blueprint Pattern (Modular Monolith)
- API Gateway: HTTP endpoints for sync, job polling, AI summaries, queries
- Sync Worker: Queue-triggered; fetches GitHub metadata, generates fingerprints
- Cache Worker: Queue-triggered; fetches file contents (README, configs)
- Reconciliation Worker: Timer-triggered; cleanup and retry logic (3-min interval)
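The retry behaviour of the Reconciliation Worker can be pictured with a small, storage-free sketch. It uses the job state names from this README (queued, syncing, metadata_ready, completed); the transition map, the `Job` class, the `STALE_AFTER` threshold (set here to the 3-minute timer interval) and the `MAX_RETRIES` cap are assumptions made for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Job states as named in this README; the allowed transitions are an assumption.
JOB_TRANSITIONS = {
    "queued": {"syncing"},
    "syncing": {"metadata_ready"},
    "metadata_ready": {"completed"},
    "completed": set(),
}

STALE_AFTER = timedelta(minutes=3)  # assumption: reuse the 3-minute timer interval
MAX_RETRIES = 3                     # hypothetical retry cap


@dataclass
class Job:
    job_id: str
    state: str
    updated_at: datetime
    retries: int = 0


def advance(job: Job, new_state: str, now: datetime) -> None:
    """Move a job forward, rejecting transitions the map does not allow."""
    if new_state not in JOB_TRANSITIONS[job.state]:
        raise ValueError(f"illegal transition {job.state} -> {new_state}")
    job.state = new_state
    job.updated_at = now


def reconcile(jobs: list[Job], now: datetime) -> list[str]:
    """Return IDs of stalled jobs that should be re-enqueued on this timer tick."""
    to_retry = []
    for job in jobs:
        if job.state == "completed":
            continue
        stalled = now - job.updated_at > STALE_AFTER
        if stalled and job.retries < MAX_RETRIES:
            job.retries += 1
            job.state = "queued"  # reset so a worker can pick the job up again
            job.updated_at = now
            to_retry.append(job.job_id)
    return to_retry


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    jobs = [Job("job-1", "syncing", now - timedelta(minutes=10))]
    print(reconcile(jobs, now))
```

In a real deployment the job list would presumably come from the JobMetadata table and the returned IDs would be written back to the sync-jobs queue; both steps are left out so the sketch stays runnable on its own.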
Shared Modules (foliohive_shared/)
- ai/: OpenAI integration, context orchestration, token management
- cache/: Fingerprint-based caching, blob storage management
- github/: REST + GraphQL unified interface
- table/: 8-table normalized schema with TableManager
- queue/: Message serialization and queue clients
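As a rough illustration of what fingerprint-based caching in `cache/` could look like, the sketch below hashes file content with SHA-256 and uses the digest as the storage key. The `FingerprintCache` class, its method names, and the in-memory dictionaries standing in for Blob Storage are all hypothetical; the real module may well compare fingerprints before downloading anything, which this sketch does not model.

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest used as the content address."""
    return hashlib.sha256(content).hexdigest()


class FingerprintCache:
    """In-memory stand-in for the blob container (assumption for illustration)."""

    def __init__(self) -> None:
        self._blobs = {}  # fingerprint -> content
        self._paths = {}  # "owner/repo/path" -> fingerprint

    def put(self, path: str, content: bytes) -> bool:
        """Store content; return False when the file is unchanged and the write is skipped."""
        digest = fingerprint(content)
        if self._paths.get(path) == digest:
            return False
        self._blobs.setdefault(digest, content)  # identical content is stored only once
        self._paths[path] = digest
        return True

    def get(self, path: str) -> Optional[bytes]:
        """Return the cached content for a path, or None if nothing is cached."""
        digest = self._paths.get(path)
        return self._blobs.get(digest) if digest else None


if __name__ == "__main__":
    cache = FingerprintCache()
    print(cache.put("octocat/hello/README.md", b"# Hello"))  # first write is stored
    print(cache.put("octocat/hello/README.md", b"# Hello"))  # unchanged, so skipped
```

Keying blobs by digest means two repositories with an identical README share one stored copy, which is the usual motivation for content-addressable storage.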
Data Storage
- Table Storage: 8 normalized tables (JobMetadata, RepoGitHubMetadata, RepoLanguages, etc.)
- Blob Storage: Cached README and config files (content-addressable by fingerprint)
- Queue Storage: Async job processing (sync-jobs, cache-jobs)
- Fetch candidate profiles via GitHub username
- Collect repository metadata (languages, stars, topics, dates)
- Track sync state with job progress monitoring
- Deduplicate requests using fingerprint-based caching
- Automatically fetch README files for project context
- Cache language-specific config files (package.json, pyproject.toml, etc.)
- Content-addressable storage (SHA-256 fingerprints)
- Skip unchanged files to minimize API calls
- Profile Summary: Holistic candidate overview with skills, experience, patterns
- Repository Summary: Individual project analysis with tech stack and architecture
- Interactive Assistant: Answer recruiter queries with candidate-specific context
- Queue-driven architecture for scalability
- Job state tracking: queued → syncing → metadata_ready → completed
- Repo state tracking: pending → synced → cached
- Automatic retry with reconciliation worker
- Tiered model selection (gpt-5-nano, gpt-4o-mini)
- Token budget management per summary type
- Context chunking to fit within limits
- Response validation and truncation detection
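To make the token budget and context chunking bullets more concrete, here is a minimal sketch. The budget values, the `TOKEN_BUDGETS` keys, the four-characters-per-token estimate, and the function names are assumptions chosen for illustration; the project may use a real tokenizer and different limits. The truncation check relies on the `finish_reason` value `"length"`, which the OpenAI Chat Completions API returns when output is cut off at the token limit.

```python
# Hypothetical per-summary budgets; the real values are not stated in this README.
TOKEN_BUDGETS = {"profile": 4000, "repository": 2000, "query": 1000}

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def chunk_context(sections: list[str], summary_type: str) -> list[list[str]]:
    """Greedily pack sections into chunks that each stay within the budget."""
    budget = TOKEN_BUDGETS[summary_type]
    chunks, current, used = [], [], 0
    for section in sections:
        cost = estimate_tokens(section)
        if cost > budget:
            # Oversized section: truncate it to the budget instead of dropping it.
            section = section[: budget * CHARS_PER_TOKEN]
            cost = estimate_tokens(section)
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(section)
        used += cost
    if current:
        chunks.append(current)
    return chunks


def looks_truncated(finish_reason: str) -> bool:
    """True when the model stopped because it hit the output token limit."""
    return finish_reason == "length"


if __name__ == "__main__":
    readme, config = "r" * 6000, "c" * 3000
    for chunk in chunk_context([readme, config], "repository"):
        print([estimate_tokens(part) for part in chunk])
```

Greedy packing keeps sections whole wherever possible, so a README and its config file tend to land in the same chunk unless together they exceed the budget.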
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Angular 18+ (Standalone Components) | Reactive UI with RxJS |
| Backend | Azure Functions (Flex Consumption) | Serverless API + Workers |
| Language | Python 3.12+ | Core backend logic |
| AI | OpenAI GPT (gpt-5-nano, gpt-4o-mini) | Summaries and queries |
| Storage | Azure Table, Blob, Queue Storage | Data persistence |
| Git Data | GitHub REST + GraphQL API | Repository metadata |
| IaC | Bicep | Infrastructure as Code |
| CI/CD | Azure DevOps Pipelines | Automated deployments |
| Monitoring | Application Insights | Telemetry and diagnostics |
foliohive/
├── api/v0.4.0/
│   ├── function-app/              # Azure Functions entry point
│   │   ├── function_app.py        # Main app registration
│   │   └── blueprints/            # Worker implementations
│   ├── shared/                    # foliohive_shared package
│   │   └── src/foliohive_shared/
│   │       ├── ai/                # AI integration
│   │       ├── cache/             # Caching logic
│   │       ├── github/            # GitHub API client
│   │       ├── queue/             # Queue messaging
│   │       └── table/             # Table Storage schema
│   └── tests/                     # Pytest test suite
│
├── ui/                            # Angular Static Web App
│   └── src/app/
│       ├── landing/               # Candidate search
│       ├── profile/               # Candidate summary
│       ├── projects/              # Repository list
│       ├── ai/                    # AI assistant
│       └── services/              # API clients
│
├── infra/bicep/                   # Azure infrastructure
│   ├── main.bicep                 # Entry point
│   ├── main.bicepparam            # Parameters
│   └── modules/                   # Resource modules
│
└── README.md                      # This file
- API Documentation - Blueprints, workers, shared modules, table schema
- UI Documentation - Components, services, state management
- Infrastructure Documentation - Bicep modules, deployment, networking
- DevOps Documentation - Pipelines, CI/CD, variable groups (coming soon)
# Run all tests
cd api/v0.4.0/tests
./run_tests.sh
# Run specific test suite
pytest test_reconciliation_worker.py -v
# Run integration tests
pytest integration/ -v
# Run E2E curl tests
./e2e_curl_tests.sh
cd infra/bicep
# Deploy with default parameters
az deployment sub create \
--name foliohive-prod \
--location eastus \
--template-file main.bicep \
--parameters main.bicepparam
# Or use specific parameter file
az deployment sub create \
--location eastus \
--template-file main.bicep \
--parameters @main.bicepparam
Automated via Azure DevOps pipelines:
- Functions: `azure-functions-cd.yml`
- Static Web App: `static-web-app-cd.yml`
- Training Worker: `training-worker-cd.yml`
- Managed Identity: No stored credentials for Azure service communication
- Private Networking: VNet integration for Function Apps
- Key Vault: Secrets management (GitHub tokens, OpenAI keys)
- CORS: Restricted to UI origin
- API Keys: Optional authentication layer
- Flex Consumption Plan: Pay only for execution time
- Intelligent Caching: Minimize GitHub API calls with fingerprints
- AI Token Management: Budget enforcement per summary type
- Queue-Based Processing: Efficient async execution
- Storage Lifecycle: TTL-based blob cleanup (planned)
- Application Insights: End-to-end telemetry
- Custom Metrics: Job success rates, cache hit ratios, AI token usage
- Structured Logging: Correlation IDs across workers
- Queue Metrics: Message depth, processing time, DLQ counts
- Create feature branch: `git checkout -b feature/your-feature`
- Follow Python coding standards (PEP 8)
- Add tests for new functionality
- Update relevant documentation
- Submit pull request with clear description
Proprietary - All rights reserved
- Issues: Submit via GitHub Issues
- Documentation: Check component-specific READMEs
- Architecture Questions: Review Architecture Decision Records (coming soon)
Built with ❤️ for technical recruiters