Add LangChain resume screening cookbook #4
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,245 @@ | ||||||
| --- | ||||||
| title: "🦜 Screen Resumes with LangChain and ZeroGPU" | ||||||
| --- | ||||||
|
|
||||||
| {/* Colab link: add after uploading the notebook to Colab under hello@zerogpu.ai, e.g. "You can also check this cookbook in colab [here](<colab-url>)." */} | ||||||
|
|
||||||
| Recruiting teams drown in resumes. Every one is a blob of unstructured text wrapped around a pile of personal data: names, emails, phone numbers, home addresses. You want the signal (skills, titles, companies) without holding the PII any longer than you must, and GDPR and CCPA make "any longer than you must" a legal line, not a preference. This notebook builds a small resume-screening pipeline that pulls the structured signal out of a PDF resume, strips the PII before anything is stored or shared, and routes the candidate to the right team, using three ZeroGPU tools through LangChain. | ||||||
|
amaan-ai20 marked this conversation as resolved.
|
||||||
|
|
||||||
| For the full reference, see the [`langchain-zerogpu` repository on GitHub](https://github.com/zerogpu/langchain-zerogpu) and the [package on PyPI](https://pypi.org/project/langchain-zerogpu/). | ||||||
|
|
||||||
| In this notebook, you'll explore: | ||||||
|
|
||||||
| - **LangChain**: The framework for building applications with LLMs and agents. Its `langchain-zerogpu` package exposes ZeroGPU's small and nano models as first-class tools, so you can drop entity extraction, PII redaction, and zero-shot classification straight into a pipeline without writing any HTTP code. | ||||||
| - **ZeroGPU**: An ultra-fast, compute-efficient inference provider for apps and agents. We run purpose-built small and nano language models across an edge-powered network for the high-volume, purpose-specific tasks your app or agent runs constantly. Plug in our OpenAI-compatible API and you're live - zero GPU infrastructure, serverless, auto-scaling by default. | ||||||
|
|
||||||
| Beyond resume screening, the same pattern adapts to other workflows that need structured extraction and compliant handling of sensitive documents. | ||||||
|
|
||||||
| ## 🎥 Watch the Video Guide | ||||||
|
|
||||||
| Video walkthrough coming soon. | ||||||
|
|
||||||
| ## 📦 Installation | ||||||
|
|
||||||
| First, install the `langchain-zerogpu` package, which ships the ZeroGPU tools. You'll also install `pypdf` to pull the text out of an uploaded PDF resume (this pipeline is text-only, no OCR): | ||||||
|
|
||||||
| ```bash | ||||||
| !pip install -U langchain-zerogpu pypdf | ||||||
| ``` | ||||||
|
|
||||||
| ## 🔑 Setting Up API Keys | ||||||
|
|
||||||
| You'll need to set up your ZeroGPU credentials so the tools can reach the inference API securely. This ensures every tool call is authenticated without re-prompting. | ||||||
|
|
||||||
| Get an API key and Project ID from the [ZeroGPU dashboard](https://platform.zerogpu.ai/dashboard). The key starts with `zgpu-api-`, and the Project ID (a UUID) is on the project settings page. | ||||||
|
|
||||||
| ```python Python | ||||||
| import os | ||||||
| from getpass import getpass | ||||||
|
|
||||||
| # Prompt for the ZeroGPU API key and Project ID securely | ||||||
| zerogpu_api_key = getpass('Enter your ZeroGPU API key: ') | ||||||
| os.environ["ZEROGPU_API_KEY"] = zerogpu_api_key | ||||||
|
|
||||||
| zerogpu_project_id = getpass('Enter your ZeroGPU Project ID: ') | ||||||
| os.environ["ZEROGPU_PROJECT_ID"] = zerogpu_project_id | ||||||
| ``` | ||||||
|
|
||||||
| Each tool reads `ZEROGPU_API_KEY` and `ZEROGPU_PROJECT_ID` from the environment when you construct it with no arguments. The key is held as a `SecretStr` and is never logged. | ||||||
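Since the tools pick up their credentials from the environment, it can help to check the environment before constructing anything. The following is a minimal sketch, not part of `langchain-zerogpu`: the helper `check_zerogpu_env` is hypothetical, and it relies only on the two variable names and the `zgpu-api-` prefix mentioned above.

```python
import os

# Variable names and key prefix are taken from the text above;
# the helper itself is a hypothetical convenience, not a library function.
REQUIRED_VARS = ("ZEROGPU_API_KEY", "ZEROGPU_PROJECT_ID")
KEY_PREFIX = "zgpu-api-"


def check_zerogpu_env(env=None):
    """Return a list of problems; an empty list means the environment looks usable."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    key = env.get("ZEROGPU_API_KEY", "")
    if key and not key.startswith(KEY_PREFIX):
        problems.append(f"ZEROGPU_API_KEY does not start with {KEY_PREFIX!r}")
    return problems


# Report problems without ever printing the key itself.
for problem in check_zerogpu_env():
    print("Warning:", problem)
```

The check only inspects the shape of the values, so a key that passes can still be rejected by the API.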
|
|
||||||
| <Note> | ||||||
| The example outputs throughout this notebook are illustrative placeholders that show the shape of each result. Rerun the cells with your own credentials and resume to capture live values. | ||||||
| </Note> | ||||||
|
|
||||||
| ## 📄 Upload a Resume | ||||||
|
|
||||||
| Upload a single PDF resume straight from your machine, then read its text with `pypdf`. Everything downstream runs on `resume_text`. | ||||||
|
|
||||||
| ```python Python | ||||||
| from google.colab import files | ||||||
| from pypdf import PdfReader | ||||||
|
|
||||||
| # Pick one PDF resume from the file dialog | ||||||
| uploaded = files.upload() | ||||||
| pdf_path = next(iter(uploaded)) | ||||||
|
|
||||||
| reader = PdfReader(pdf_path) | ||||||
| resume_text = "\n".join(page.extract_text() or "" for page in reader.pages) | ||||||
|
|
||||||
| print(f"Loaded {pdf_path} ({len(resume_text)} characters)") | ||||||
| print(resume_text[:500]) | ||||||
| ``` | ||||||
|
|
||||||
| ``` | ||||||
| Loaded jordan-lee-resume.pdf (2184 characters) | ||||||
| JORDAN LEE | ||||||
| San Francisco, CA | jordan.lee@example-mail.io | (415) 555-0143 | ||||||
| ... | ||||||
| ``` | ||||||
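Because the pipeline is text-only, a scanned (image-only) PDF can come back with little or no text, and extracted text often carries stray whitespace. A small sketch of a cleanup and sanity check follows; both helper names and the 200-character threshold are assumptions chosen for illustration.

```python
import re


def normalize_resume_text(text: str) -> str:
    """Collapse runs of spaces/tabs and drop blank lines left by PDF extraction."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def has_usable_text(text: str, min_chars: int = 200) -> bool:
    """Heuristic: very few non-whitespace characters suggests a scanned PDF.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    return len(re.sub(r"\s", "", text)) >= min_chars
```

Calling `has_usable_text(resume_text)` before the ZeroGPU tools avoids spending API calls on a document that would need OCR first.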
|
|
||||||
| ## 🏷️ Extract Candidate Entities with ZeroGPU | ||||||
|
|
||||||
| In this section, you'll extract skills, job titles, companies, and locations from the resume and assemble them into a structured candidate card. | ||||||
|
|
||||||
| `ZeroGPUExtractEntitiesTool` runs custom-label named-entity recognition on `gliner2-base-v1`. You pass the entity types you care about as `labels`, and it returns the matched spans grouped by label. | ||||||
|
|
||||||
| ```python Python | ||||||
| from langchain_zerogpu import ZeroGPUExtractEntitiesTool | ||||||
|
|
||||||
| extract_entities = ZeroGPUExtractEntitiesTool() | ||||||
| entities = extract_entities.invoke({ | ||||||
| "text": resume_text, | ||||||
| "labels": ["skill", "job title", "company", "location"], | ||||||
| }) | ||||||
| print(entities) | ||||||
| ``` | ||||||
|
|
||||||
| ``` | ||||||
| { | ||||||
| "entities": { | ||||||
| "skill": ["...", "..."], | ||||||
| "job title": ["..."], | ||||||
| "company": ["...", "..."], | ||||||
| "location": ["..."] | ||||||
| } | ||||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| The result nests every match under its label, so you can assemble a clean candidate card in a couple of lines: | ||||||
|
|
||||||
| ```python Python | ||||||
| card = entities["entities"] | ||||||
| print("Candidate card") | ||||||
| print("- Skills: ", ", ".join(card.get("skill", []))) | ||||||
| print("- Titles: ", ", ".join(card.get("job title", []))) | ||||||
| print("- Companies: ", ", ".join(card.get("company", []))) | ||||||
| print("- Locations: ", ", ".join(card.get("location", []))) | ||||||
| ``` | ||||||
|
|
||||||
| ``` | ||||||
| Candidate card | ||||||
| - Skills: ..., ... | ||||||
| - Titles: ... | ||||||
| - Companies: ..., ... | ||||||
| - Locations: ... | ||||||
| ``` | ||||||
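Span-based extraction may return the same entity more than once, for example with different casing. Assuming the result has the label-to-list shape shown above, the card can be tidied with a small post-processing step. The helpers and the sample data below are hypothetical.

```python
def dedupe_spans(spans):
    """Drop case-insensitive duplicates, keeping first-seen order and casing."""
    seen = set()
    unique = []
    for span in spans:
        key = " ".join(span.split()).casefold()
        if key and key not in seen:
            seen.add(key)
            unique.append(span.strip())
    return unique


def clean_card(card):
    """Apply dedupe_spans to every label in a {label: [spans]} mapping."""
    return {label: dedupe_spans(spans) for label, spans in card.items()}


# Made-up sample in the documented shape, not real tool output.
sample_card = {"skill": ["Python", "python ", "SQL"], "company": ["Acme", "Acme"]}
print(clean_card(sample_card))
```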
|
|
||||||
| 🎉 **ZeroGPU effortlessly turns a wall of resume text into a structured candidate card in one call, providing the cheap, repeatable extraction layer your screening pipeline runs on!** | ||||||
|
|
||||||
| ## 🔐 Redact PII Before Sharing | ||||||
|
|
||||||
| Before the resume goes into a dashboard, an applicant-tracking system, or a shared review channel, the personal data has to come out. `ZeroGPURedactPIITool` masks PII inline on `gliner-multi-pii-v1`, replacing each match with an uppercase `[LABEL]` placeholder and returning the redacted text. | ||||||
|
|
||||||
| ```python Python | ||||||
| from langchain_zerogpu import ZeroGPURedactPIITool | ||||||
|
|
||||||
| redact_pii = ZeroGPURedactPIITool() | ||||||
| anonymized_resume = redact_pii.invoke({"text": resume_text}) | ||||||
| print(anonymized_resume[:500]) | ||||||
| ``` | ||||||
|
|
||||||
| ``` | ||||||
| [PERSON] | ||||||
| [ADDRESS] | [EMAIL] | [PHONE_NUMBER] | ||||||
| ... | ||||||
| ``` | ||||||
|
|
||||||
| Names, emails, phone numbers, and addresses come back masked; skills, titles, and companies pass through untouched, so the redacted copy is still useful for review while the identifying details are gone. This is the version that is safe to store or share. | ||||||
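Before storing the redacted copy, a cheap second pass can flag anything that still looks like an email address or phone number. This is a sketch of such a safety net, independent of ZeroGPU. The regular expressions are deliberately simple assumptions (common email shapes and US-style phone numbers only) and do not replace the model-based redaction.

```python
import re

# Simple illustrative patterns; they will miss many real-world formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
PLACEHOLDER_RE = re.compile(r"\[([A-Z_]+)\]")


def residual_pii(text):
    """Return email- or phone-looking strings still present in the text."""
    return {"email": EMAIL_RE.findall(text), "phone": PHONE_RE.findall(text)}


def placeholder_counts(text):
    """Count uppercase [LABEL] placeholders, e.g. to log what was masked."""
    counts = {}
    for label in PLACEHOLDER_RE.findall(text):
        counts[label] = counts.get(label, 0) + 1
    return counts
```

If `residual_pii(anonymized_resume)` returns any matches, hold the document for manual review rather than sharing it.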
|
|
||||||
| 🎉 **ZeroGPU strips the personal data out of the resume in a single call, giving you a compliant, shareable copy without touching the rest of the content!** | ||||||
|
|
||||||
| ## 🧭 Route the Candidate with Zero-Shot Classification | ||||||
|
|
||||||
| With the signal extracted and the PII gone, the last step is routing: which team should review this candidate? `ZeroGPUClassifyZeroShotTool` scores the resume against a flat list of labels on `deberta-v3-small` and returns a score per label, so you can pick the best match. | ||||||
|
|
||||||
|
amaan-ai20 marked this conversation as resolved.
|
||||||
| ```python Python | ||||||
| from langchain_zerogpu import ZeroGPUClassifyZeroShotTool | ||||||
|
|
||||||
| classify = ZeroGPUClassifyZeroShotTool() | ||||||
| routing = classify.invoke({ | ||||||
| "text": resume_text, | ||||||
| "labels": ["engineering", "design", "sales", "operations"], | ||||||
| }) | ||||||
| print(routing) | ||||||
|
|
||||||
| team = max(routing, key=routing.get) | ||||||
| print(f"Route to: {team}") | ||||||
| ``` | ||||||
|
|
||||||
| ``` | ||||||
| {"engineering": 0.93, "design": 0.04, "sales": 0.02, "operations": 0.01} | ||||||
| Route to: engineering | ||||||
| ``` | ||||||
|
|
||||||
| The tool returns every label with a score, so `max(routing, key=routing.get)` gives you the top team. Swap the `labels` list for your own org chart to route against any set of teams. | ||||||
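Taking the top label always produces a route, even when the scores are close or uniformly low. One option, sketched below on the assumption that the result is a label-to-score mapping as shown above, is to fall back to manual review when the winner is weak or barely ahead. The function name, both thresholds, and the fallback label are illustrative choices.

```python
def route_with_fallback(scores, min_score=0.6, min_margin=0.2, fallback="manual review"):
    """Return the top label, or the fallback if the win is weak or too narrow."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return fallback
    top_label, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score < min_score or top_score - runner_up < min_margin:
        return fallback
    return top_label
```

With the illustrative scores above, `route_with_fallback(routing)` still returns `engineering`, while a near tie between two teams would go to a human reviewer.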
|
|
||||||
| 🎉 **ZeroGPU classifies the resume against your own team labels in one call, turning unstructured text into a routing decision!** | ||||||
|
|
||||||
| ## 🤖 Run the Full Screening Pipeline | ||||||
|
|
||||||
| Now chain the three tools into one function that takes a resume and returns a structured candidate card, an anonymized copy, and a routing label. This is the end-to-end pipeline you'd wire into a real screening flow. | ||||||
|
|
||||||
| ```python Python | ||||||
| import json | ||||||
|
|
||||||
| from langchain_zerogpu import ( | ||||||
| ZeroGPUClassifyZeroShotTool, | ||||||
| ZeroGPUExtractEntitiesTool, | ||||||
| ZeroGPURedactPIITool, | ||||||
| ) | ||||||
|
|
||||||
| extract_entities = ZeroGPUExtractEntitiesTool() | ||||||
| redact_pii = ZeroGPURedactPIITool() | ||||||
| classify = ZeroGPUClassifyZeroShotTool() | ||||||
|
|
||||||
|
|
||||||
| def screen_resume(text: str) -> dict: | ||||||
| entities = extract_entities.invoke({ | ||||||
| "text": text, | ||||||
| "labels": ["skill", "job title", "company", "location"], | ||||||
| }) | ||||||
| anonymized = redact_pii.invoke({"text": text}) | ||||||
| routing = classify.invoke({ | ||||||
| "text": text, | ||||||
| "labels": ["engineering", "design", "sales", "operations"], | ||||||
| }) | ||||||
| return { | ||||||
| "candidate_card": entities["entities"], | ||||||
| "anonymized_resume": anonymized, | ||||||
| "routing_label": max(routing, key=routing.get), | ||||||
| "routing_scores": routing, | ||||||
| } | ||||||
|
|
||||||
|
|
||||||
| result = screen_resume(resume_text) | ||||||
| print(json.dumps(result, indent=2)) | ||||||
| ``` | ||||||
|
|
||||||
| ``` | ||||||
| { | ||||||
| "candidate_card": { | ||||||
| "skill": ["...", "..."], | ||||||
| "job title": ["..."], | ||||||
| "company": ["...", "..."], | ||||||
| "location": ["..."] | ||||||
| }, | ||||||
| "anonymized_resume": "[PERSON]\n[ADDRESS] | [EMAIL] | [PHONE_NUMBER]\n...", | ||||||
| "routing_label": "engineering", | ||||||
| "routing_scores": {"engineering": 0.93, "design": 0.04, "sales": 0.02, "operations": 0.01} | ||||||
| } | ||||||
| ``` | ||||||
|
Contributor
Can we please replace this with actual output values?
||||||
|
|
||||||
| 🎉 From a single PDF, the pipeline produced a structured candidate card, a PII-free copy safe to store or share, and a routing label, with each task running on a purpose-built ZeroGPU model instead of a frontier LLM. | ||||||
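To exercise the pipeline logic without credentials or network access, the three tools can be passed in rather than created at module level. The sketch below assumes only that each tool exposes `invoke(payload)` as used above. `make_screener`, `InvokableTool`, and `FakeTool` are hypothetical names, and the canned responses are made-up values in the documented shapes.

```python
from typing import Any, Protocol


class InvokableTool(Protocol):
    """Anything with an invoke(payload) method, which is how the tools above are called."""

    def invoke(self, payload: dict) -> Any: ...


def make_screener(extract: InvokableTool, redact: InvokableTool, classify: InvokableTool):
    """Build a screen(text) function around three injected tools."""

    def screen(text: str) -> dict:
        entities = extract.invoke({
            "text": text,
            "labels": ["skill", "job title", "company", "location"],
        })
        anonymized = redact.invoke({"text": text})
        routing = classify.invoke({
            "text": text,
            "labels": ["engineering", "design", "sales", "operations"],
        })
        return {
            "candidate_card": entities["entities"],
            "anonymized_resume": anonymized,
            "routing_label": max(routing, key=routing.get),
            "routing_scores": routing,
        }

    return screen


class FakeTool:
    """Hypothetical stand-in that returns a canned response and records each payload."""

    def __init__(self, response):
        self.response = response
        self.calls = []

    def invoke(self, payload):
        self.calls.append(payload)
        return self.response


# Canned responses in the shapes shown above; these are not real tool outputs.
fake_extract = FakeTool({"entities": {"skill": ["Python"], "company": ["Acme"]}})
fake_redact = FakeTool("[PERSON]\n[EMAIL]")
fake_classify = FakeTool({"engineering": 0.9, "design": 0.1})

screen = make_screener(fake_extract, fake_redact, fake_classify)
result = screen("Jane Doe\njane@example.com")
print(result["routing_label"])
```

In a live run, the real `ZeroGPUExtractEntitiesTool()`, `ZeroGPURedactPIITool()`, and `ZeroGPUClassifyZeroShotTool()` instances would be passed to `make_screener` in place of the fakes.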
|
|
||||||
| ## 🌟 Highlights | ||||||
|
|
||||||
| This notebook has guided you through setting up and running a LangChain workflow with ZeroGPU for privacy-aware resume screening. You can adapt and expand this example for various other scenarios requiring structured extraction and compliant handling of sensitive documents. | ||||||
|
|
||||||
| Key tools utilized in this notebook include: | ||||||
|
|
||||||
| - **LangChain**: The framework for building applications with LLMs and agents. Its `langchain-zerogpu` package exposes ZeroGPU's small and nano models as first-class tools, so you can drop entity extraction, PII redaction, and zero-shot classification straight into a pipeline without writing any HTTP code. | ||||||
| - **ZeroGPU**: An ultra-fast, compute-efficient inference provider for apps and agents. We run purpose-built small and nano language models across an edge-powered network for the high-volume, purpose-specific tasks your app or agent runs constantly. Plug in our OpenAI-compatible API and you're live - zero GPU infrastructure, serverless, auto-scaling by default. | ||||||
|
|
||||||
| For more, see the [`langchain-zerogpu` repository](https://github.com/zerogpu/langchain-zerogpu), the [PyPI page](https://pypi.org/project/langchain-zerogpu/), and the [ZeroGPU documentation](https://docs.zerogpu.ai). | ||||||
| </content> | ||||||
| </invoke> | ||||||
|
Comment on lines +244 to +245
Remove unmatched closing tags at the end of the MDX file.
Suggested fix:
-</content>
-</invoke>
Contributor
@nishithatanukunuri please remove this
Please update this as well once the colab is live.