When I started this project, I didn't want to just build another "wrapper" for an LLM. I wanted to create a Resilient, State-Aware Recruitment Ecosystem. The goal was simple but ambitious: build an agent that can find a candidate in a messy inbox, remember their history, vet their code in a safe environment, and mathematically grade their technical depth—all while surviving the "noisy" reality of the internet.
Communication is nothing without context. I used the Gmail API (OAuth2) to build a system that doesn't just send emails—it understands threading. By managing In-Reply-To and References headers, I ensured that every AI-generated message feels like a natural continuation of a human conversation.
- The Struggle: Getting the OAuth flow to work across different folders was a nightmare until I implemented absolute pathing for the credentials.json. It’s one of those "small" things that can break an entire pipeline.
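Under the hood, threading comes down to two standard email headers. A minimal sketch of how a reply payload for the Gmail API can be built (build_reply and its arguments are illustrative names, not the project's actual helpers):

```python
import base64
from email.message import EmailMessage

def build_reply(original_msg_id: str, original_references: str,
                to_addr: str, subject: str, body: str) -> dict:
    """Build a Gmail API send payload whose headers keep the reply in-thread."""
    msg = EmailMessage()
    msg["To"] = to_addr
    # Gmail groups replies by subject plus the threading headers.
    msg["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    # In-Reply-To points at the message we are answering...
    msg["In-Reply-To"] = original_msg_id
    # ...while References carries the whole chain so far.
    msg["References"] = f"{original_references} {original_msg_id}".strip()
    msg.set_content(body)
    # The Gmail API expects the raw RFC 2822 message, base64url-encoded.
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()
    return {"raw": raw}
```

Passing the thread's `threadId` alongside this payload keeps Gmail's own UI grouping intact as well.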
An agent that forgets is just a chatbot. I built a SQLite3 persistence layer that tracks every candidate's unique "Round." Whether they are at "Initial Outreach" or "Final Technical Vetting," the agent knows exactly where they stand, even if the script restarts.
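A minimal sketch of such a persistence layer (the table layout and round names are illustrative; the real schema may differ):

```python
import sqlite3

# Illustrative pipeline stages; the real project's rounds may differ.
ROUNDS = ["Initial Outreach", "Screening", "Final Technical Vetting"]

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the candidate-state table on disk."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS candidates (
        email TEXT PRIMARY KEY,
        round TEXT NOT NULL DEFAULT 'Initial Outreach'
    )""")
    return conn

def advance(conn: sqlite3.Connection, email: str) -> str:
    """Move a candidate to their next round and persist it, so the
    state survives a script restart."""
    row = conn.execute("SELECT round FROM candidates WHERE email=?",
                       (email,)).fetchone()
    current = row[0] if row else ROUNDS[0]
    # Cap at the final round instead of walking off the end.
    nxt = ROUNDS[min(ROUNDS.index(current) + 1, len(ROUNDS) - 1)]
    conn.execute("INSERT INTO candidates(email, round) VALUES(?, ?) "
                 "ON CONFLICT(email) DO UPDATE SET round=excluded.round",
                 (email, nxt))
    conn.commit()
    return nxt
```

Because the round lives in SQLite rather than a Python variable, a crash mid-run loses nothing but the in-flight request.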
This is the nervous system. Using Gemini 3 Flash and asyncio, I built a central orchestrator that polls for replies.
- The "Spam" Lesson: Live testing showed the agent trying to recruit Pinterest and Instagram digests. I implemented a Gatekeeper Filter to ignore bot addresses, saving our API quota and our reputation.
- The Python Hack: Since Python hates importing from folders starting with numbers (like 01_), I used Priority Path Injection (sys.path.insert) to make the components talk to each other. It’s not the prettiest solution, but it’s effective.
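The polling loop plus the Gatekeeper can be sketched in a few lines of asyncio (the blocklist tokens and the fetch/handle callables are assumptions for illustration, not the project's real API):

```python
import asyncio

# Tokens that mark automated senders; an assumed blocklist, not the
# project's exact one.
BLOCKLIST = ("noreply", "no-reply", "@pinterest.com", "@mail.instagram.com")

def passes_gatekeeper(sender: str) -> bool:
    """Drop bot addresses before any LLM call spends quota on them."""
    s = sender.lower()
    return not any(token in s for token in BLOCKLIST)

async def poll_inbox(fetch_replies, handle_reply, interval=30.0, cycles=None):
    """Central orchestrator loop: fetch new replies, filter out bots,
    dispatch the rest. cycles=None means run forever."""
    done = 0
    while cycles is None or done < cycles:
        for sender, message in await fetch_replies():
            if passes_gatekeeper(sender):
                await handle_reply(sender, message)
        done += 1
        await asyncio.sleep(interval)
```

Filtering before dispatch means a marketing digest never even reaches the model.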
This is where the math happens. To grade long-form technical answers, I moved beyond keyword matching. I built a Hybrid Scoring Engine:
- Cloud Logic: Gemini acts as a "Senior Architect" to generate a gold-standard answer key.
- Vector Physics: A local PyTorch model (all-MiniLM-L6-v2) calculates the Cosine Similarity between the candidate's answer and the gold-standard one.
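The formula behind that score is plain cosine similarity; here it is in pure Python (in the real pipeline the inputs would be the model's embedding vectors, not these toy lists):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (||a|| * ||b||): 1.0 means the candidate's
    answer embedding points in the same direction as the expert's."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In practice the vectors come from an embedding model such as all-MiniLM-L6-v2, and because the score depends only on direction, a long answer isn't automatically "more similar" than a short one.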
While processing the 1,140-candidate dataset, I hit the Gemini Free Tier limit hard. Instead of giving up, I engineered Exponential Backoff and a "20-Second Breather" into the code. This turned a crashing script into a resilient pipeline that "naps" when it's tired and wakes up to finish the job. It taught me that handling failure is as important as handling success.
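The retry logic can be sketched like this (RateLimitError stands in for whatever 429 exception the client library actually raises, and the cap mirrors the 20-second breather described above):

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the provider's 429 error type."""

def call_with_backoff(fn, max_retries: int = 5, breather: float = 20.0):
    """Retry fn() with exponential backoff, never sleeping longer than
    the breather cap."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            # 1s, 2s, 4s, ... plus jitter, capped at the breather.
            delay = min(breather, (2 ** attempt) + random.random())
            time.sleep(delay)
    raise RuntimeError("Gave up after repeated rate limits")
```

The jitter matters: without it, every worker that hit the limit together retries together and hits it again.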
I realized that running candidate code is a huge risk. I built an isolated sandbox.py that runs code in a temporary subprocess. It’s "dumb" on purpose—it has no access to the database or the internet, keeping the main system safe from potentially malicious or just poorly written scripts.
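A sketch of that idea (run_sandboxed is an illustrative name; a bare subprocess contains crashes and infinite loops, though truly blocking network access would need containers or seccomp on top):

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: float = 5.0):
    """Run untrusted code in a throwaway subprocess with a hard timeout,
    keeping crashes and hangs out of the main process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores env/site
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        return -1, "", "timed out"
    finally:
        os.unlink(path)  # never leave candidate code lying around
```

The timeout is the crucial part: a candidate's accidental infinite loop becomes a graded failure instead of a frozen pipeline.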
I learned that you don't run 1,000+ rows on the first try. I implemented a PILOT_LIMIT toggle to test the first 10 candidates. The result? A perfect CSV output that proves the math works before I spent hours running the full dataset.
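The toggle itself is tiny; a sketch (PILOT_LIMIT and select_batch are illustrative names):

```python
import itertools

PILOT_LIMIT = 10  # set to None to process the full dataset

def select_batch(candidates):
    """islice handles both cases: a limit of None means take everything."""
    return itertools.islice(candidates, PILOT_LIMIT)
```

Ten rows is enough to catch a broken CSV header or a scoring bug before burning hours of API quota on the full run.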
The system is now fully autonomous. It:
- Filters out the noise (Pinterest/Instagram).
- Drafts personalized, technical replies.
- Vets code for resilience (429 handling).
- Ranks candidates mathematically in a final CSV.
This project taught me that "AI Engineering" is 20% prompting and 80% building the infrastructure that allows the AI to survive the real world.
Completed by Sunny.