Build a Resume Screener Agent with AutoGen (HR Tool 2026)
Build a complete AutoGen HR agent that parses PDF resumes, matches candidates to job criteria, ranks applicants, and generates structured screening reports automatically.
Screening resumes is one of the most time-consuming parts of recruiting. For roles that receive hundreds of applications, a human recruiter spends 6-8 seconds per resume during initial screening — enough to check for obvious disqualifiers, not enough for genuine assessment. Consistency suffers, fatigue sets in, and qualified candidates get missed.
An AutoGen-based screener handles the volume problem while applying criteria consistently across every application. This guide builds a complete HR agent: PDF parsing, multi-criteria matching, weighted scoring, candidate ranking, and a hiring manager report. It's a starting tool, not a replacement for human judgment.
System Architecture
The screener uses three specialized AutoGen agents working in sequence:
- Parser Agent — extracts structured data from raw resume text
- Evaluator Agent — scores candidates against job criteria
- Reporter Agent — generates the final ranked report
PDF Files → Parser Agent → Structured Profiles
Job Description → Evaluator Agent → Scored Candidates
Scored Candidates → Reporter Agent → Hiring Manager Report
Dependencies
pip install pyautogen pdfplumber python-docx openai pydantic rich
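pdfplumber handles PDF text extraction reliably across different PDF generators, and in practice it tends to preserve structured content better than PyPDF2. The code in this guide uses the classic "import autogen" API (AssistantAgent, UserProxyAgent, initiate_chat) that pyautogen provides; the newer autogen-agentchat 0.4+ packages expose a different API. Only pyautogen, pdfplumber, and pydantic are imported by the code shown here; python-docx is needed only if you add Word support.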
Defining the Data Models
Start with Pydantic models that represent what the agent extracts and produces:
# models.py
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum


class ExperienceLevel(str, Enum):
    ENTRY = "entry"
    MID = "mid"
    SENIOR = "senior"
    LEAD = "lead"
    EXECUTIVE = "executive"


class WorkExperience(BaseModel):
    company: str
    title: str
    duration_months: Optional[int] = None
    description: Optional[str] = None
    technologies: List[str] = []


class Education(BaseModel):
    degree: str
    field: Optional[str] = None
    institution: Optional[str] = None
    year: Optional[int] = None


class ParsedResume(BaseModel):
    candidate_name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    location: Optional[str] = None
    total_experience_years: float = 0.0
    experience_level: ExperienceLevel = ExperienceLevel.ENTRY
    skills: List[str] = []
    work_experiences: List[WorkExperience] = []
    education: List[Education] = []
    certifications: List[str] = []
    summary: Optional[str] = None
    raw_text: str = ""


class ScoringCriteria(BaseModel):
    required_skills: List[str]
    preferred_skills: List[str] = []
    min_experience_years: float = 0.0
    required_education_level: Optional[str] = None
    location_preference: Optional[str] = None
    job_title: str
    job_description: str


class CandidateScore(BaseModel):
    candidate_name: str
    email: Optional[str]
    total_score: float  # 0-100
    required_skills_score: float  # 0-40 points
    experience_score: float  # 0-30 points
    education_score: float  # 0-15 points
    preferred_skills_score: float  # 0-15 points
    matched_required_skills: List[str]
    missing_required_skills: List[str]
    matched_preferred_skills: List[str]
    experience_years: float
    hiring_recommendation: str  # "Strong Yes", "Yes", "Maybe", "No"
    notes: str
    resume_file: str
PDF Parsing Module
# pdf_parser.py
import pdfplumber
import re
from pathlib import Path
from typing import Optional


def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract all text from a PDF file."""
    text_parts = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text()
            if text:
                text_parts.append(text)
    full_text = "\n".join(text_parts)
    # Clean up common PDF artifacts
    full_text = re.sub(r'\n{3,}', '\n\n', full_text)  # Collapse multiple newlines
    full_text = re.sub(r' {2,}', ' ', full_text)  # Collapse multiple spaces
    full_text = full_text.strip()
    return full_text
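def extract_contact_info(text: str) -> dict:
    """Extract email and phone from text using regex."""
    # [A-Za-z] rather than [A-Z|a-z]: inside a character class "|" is a literal pipe
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    phone_pattern = r'(\+?1?\s?)?(\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4})'
    emails = re.findall(email_pattern, text)
    # With two groups in the pattern, findall returns (prefix, number) tuples
    phones = re.findall(phone_pattern, text)
    return {
        "email": emails[0] if emails else None,
        # strip() drops whitespace that the optional prefix group can capture
        "phone": (phones[0][0] + phones[0][1]).strip() if phones else None,
    }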
The AutoGen Agent System
Now build the three-agent pipeline:
# resume_screener.py
import autogen
import json
import os
from pathlib import Path
from typing import List
from datetime import datetime
from pdf_parser import extract_text_from_pdf, extract_contact_info
from models import ParsedResume, ScoringCriteria, CandidateScore
def build_screener_agents():
    llm_config = {
        "config_list": [
            {"model": "gpt-4o", "api_key": os.getenv("OPENAI_API_KEY")}
        ],
        "temperature": 0.1,  # Low temperature for consistent scoring
    }

    # Agent 1: Resume Parser
    parser_agent = autogen.AssistantAgent(
        name="ResumeParser",
        system_message="""You are a resume parsing specialist. Your ONLY job is to extract
structured information from resume text.
Given resume text, you must extract and return a JSON object with these fields:
- candidate_name: Full name
- email: Email address (or null)
- phone: Phone number (or null)
- location: City/State/Country (or null)
- total_experience_years: Calculated total (float)
- experience_level: "entry" (<2yr), "mid" (2-5yr), "senior" (5-10yr), "lead" (8-15yr), "executive" (10+yr)
- skills: Array of technical skills, tools, languages
- work_experiences: Array of {company, title, duration_months, technologies}
- education: Array of {degree, field, institution, year}
- certifications: Array of certification names
- summary: 2-3 sentence professional summary
Return ONLY valid JSON. No other text. No markdown code blocks.
If you cannot determine a value, use null or empty array.
When done, write PARSING_COMPLETE after the JSON.""",
        llm_config=llm_config,
    )

    # Agent 2: Candidate Evaluator
    evaluator_agent = autogen.AssistantAgent(
        name="CandidateEvaluator",
        system_message="""You are a technical hiring evaluator. You score candidates against
job requirements with precision and consistency.
Scoring methodology:
- Required skills match: 40 points (proportional to % matched)
- Experience match: 30 points (full points if meets/exceeds minimum)
- Education match: 15 points (degree level match)
- Preferred skills: 15 points (proportional to % matched)
Hiring recommendations based on total score:
- 85-100: "Strong Yes" — exceeds requirements
- 70-84: "Yes" — meets requirements
- 55-69: "Maybe" — partially meets, worth interview
- 0-54: "No" — does not meet minimum requirements
Return ONLY valid JSON with fields matching CandidateScore model.
When done, write EVALUATION_COMPLETE after the JSON.""",
        llm_config=llm_config,
    )

    # Agent 3: Report Generator
    reporter_agent = autogen.AssistantAgent(
        name="HiringReporter",
        system_message="""You are a hiring manager reporting specialist. You create
clear, structured reports from candidate evaluation data.
Your report should include:
1. Executive Summary (top 3 candidates with brief rationale)
2. Full Rankings Table (all candidates ranked by score)
3. Skill Gap Analysis (commonly missing skills across candidates)
4. Individual Profiles (one paragraph per candidate, top 5 only)
5. Hiring Recommendation (suggested interview shortlist)
Format as clean markdown. Be objective and specific.
When done, write REPORT_COMPLETE.""",
        llm_config=llm_config,
    )
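    # Each agent ends its reply with its own completion marker, so the
    # coordinator must stop on any of them. Checking only REPORT_COMPLETE
    # would keep the parser and evaluator chats going until
    # max_consecutive_auto_reply is exhausted.
    completion_markers = ("PARSING_COMPLETE", "EVALUATION_COMPLETE", "REPORT_COMPLETE")
    coordinator = autogen.UserProxyAgent(
        name="HRCoordinator",
        human_input_mode="NEVER",
        max_consecutive_auto_reply=5,
        is_termination_msg=lambda msg: any(
            marker in (msg.get("content") or "") for marker in completion_markers
        ),
        code_execution_config=False,
    )
    return parser_agent, evaluator_agent, reporter_agent, coordinator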
def parse_single_resume(
    resume_text: str,
    parser_agent: autogen.AssistantAgent,
    coordinator: autogen.UserProxyAgent,
) -> dict:
    """Parse a single resume and return structured data."""
    coordinator.initiate_chat(
        parser_agent,
        message=f"Parse this resume and return structured JSON:\n\n{resume_text[:4000]}",
        clear_history=True,
    )
    messages = parser_agent.chat_messages.get(coordinator, [])
    for msg in reversed(messages):
        if msg.get("role") == "assistant" and msg.get("content"):
            content = msg["content"]
            # Extract JSON from response
            json_start = content.find("{")
            json_end = content.rfind("}") + 1
            if json_start >= 0 and json_end > json_start:
                try:
                    return json.loads(content[json_start:json_end])
                except json.JSONDecodeError:
                    pass
    return {}
def evaluate_candidate(
    parsed_resume: dict,
    criteria: ScoringCriteria,
    evaluator_agent: autogen.AssistantAgent,
    coordinator: autogen.UserProxyAgent,
    resume_file: str,
) -> dict:
    """Score a parsed resume against job criteria."""
    evaluation_prompt = f"""Evaluate this candidate for the following position:
JOB TITLE: {criteria.job_title}
REQUIRED SKILLS: {', '.join(criteria.required_skills)}
PREFERRED SKILLS: {', '.join(criteria.preferred_skills)}
MINIMUM EXPERIENCE: {criteria.min_experience_years} years
REQUIRED EDUCATION: {criteria.required_education_level or 'Not specified'}
CANDIDATE DATA:
{json.dumps(parsed_resume, indent=2)[:3000]}
Score this candidate and return JSON with all CandidateScore fields.
Include resume_file: "{resume_file}" in your response."""
    coordinator.initiate_chat(
        evaluator_agent,
        message=evaluation_prompt,
        clear_history=True,
    )
    messages = evaluator_agent.chat_messages.get(coordinator, [])
    for msg in reversed(messages):
        if msg.get("role") == "assistant" and msg.get("content"):
            content = msg["content"]
            json_start = content.find("{")
            json_end = content.rfind("}") + 1
            if json_start >= 0 and json_end > json_start:
                try:
                    return json.loads(content[json_start:json_end])
                except json.JSONDecodeError:
                    pass
    return {}
def generate_report(
    scored_candidates: List[dict],
    job_title: str,
    reporter_agent: autogen.AssistantAgent,
    coordinator: autogen.UserProxyAgent,
) -> str:
    """Generate final hiring manager report."""
    # Sort by total score
    sorted_candidates = sorted(scored_candidates, key=lambda x: x.get("total_score", 0), reverse=True)
    report_prompt = f"""Generate a hiring manager report for the {job_title} position.
Total applicants reviewed: {len(sorted_candidates)}
Candidate evaluation data:
{json.dumps(sorted_candidates, indent=2)[:6000]}
Create a comprehensive hiring report following your reporting guidelines."""
    coordinator.initiate_chat(
        reporter_agent,
        message=report_prompt,
        clear_history=True,
    )
    messages = reporter_agent.chat_messages.get(coordinator, [])
    for msg in reversed(messages):
        if msg.get("role") == "assistant" and msg.get("content"):
            return msg["content"].replace("REPORT_COMPLETE", "").strip()
    return "Report generation failed."
def screen_resumes(
    resume_folder: str,
    criteria: ScoringCriteria,
    output_file: str = "screening_report.md",
) -> str:
    """Main function: screen all PDFs in a folder against job criteria."""
    parser, evaluator, reporter, coordinator = build_screener_agents()
    resume_files = list(Path(resume_folder).glob("*.pdf"))
    print(f"Found {len(resume_files)} PDF resumes to screen")
    scored_candidates = []
    for i, resume_path in enumerate(resume_files):
        print(f"Processing {i+1}/{len(resume_files)}: {resume_path.name}")
        # Extract text
        resume_text = extract_text_from_pdf(str(resume_path))
        if not resume_text:
            print(f" Warning: Could not extract text from {resume_path.name}")
            continue
        # Parse
        parsed = parse_single_resume(resume_text, parser, coordinator)
        if not parsed:
            print(f" Warning: Failed to parse {resume_path.name}")
            continue
        # Evaluate
        scored = evaluate_candidate(
            parsed, criteria, evaluator, coordinator, resume_path.name
        )
        if scored:
            scored_candidates.append(scored)
            recommendation = scored.get("hiring_recommendation", "Unknown")
            score = scored.get("total_score", 0)
            print(f" Score: {score:.1f}/100 — {recommendation}")
    # Generate report
    print(f"\nGenerating report for {len(scored_candidates)} evaluated candidates...")
    report = generate_report(scored_candidates, criteria.job_title, reporter, coordinator)
    # Save report
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_path = f"{criteria.job_title.replace(' ', '_')}_{timestamp}_{output_file}"
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(f"# Resume Screening Report: {criteria.job_title}\n")
        # datetime.now() is local time, so print the real zone name instead of a hard-coded "UTC"
        f.write(f"*Generated: {datetime.now().astimezone().strftime('%Y-%m-%d %H:%M %Z')}*\n")
        f.write(f"*Total applications reviewed: {len(resume_files)}*\n\n")
        f.write("---\n\n")
        f.write(report)
    print(f"\nReport saved: {output_path}")
    return output_path
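The parse and evaluate helpers above slice from the first "{" to the last "}" in the reply. That works when the model returns exactly one object, but it fails if any later text contains a brace. A minimal sketch of a more tolerant alternative, using the standard library's json.JSONDecoder.raw_decode, is shown below. The helper name extract_first_json_object is hypothetical and not part of AutoGen.

```python
import json
from typing import Any, Dict, Optional


def extract_first_json_object(content: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object found in content, or None.

    Hypothetical helper (not an AutoGen API): tries each "{" as a possible
    start and lets the decoder work out where the object ends.
    """
    decoder = json.JSONDecoder()
    start = content.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(content, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not a valid object at this position; try the next opening brace.
        start = content.find("{", start + 1)
    return None


# A reply with trailing text that contains braces after the JSON object.
reply = (
    'Here you go: {"candidate_name": "A. Example", "skills": ["Python"]}\n'
    "PARSING_COMPLETE {done}"
)
print(extract_first_json_object(reply))
```

On this sample reply the first-brace-to-last-brace slice would include the trailing "{done}" and fail to decode, while raw_decode stops at the end of the first complete object.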
Running the Screener
# run_screening.py
from resume_screener import screen_resumes
from models import ScoringCriteria

criteria = ScoringCriteria(
    job_title="Senior Python Developer",
    required_skills=[
        "Python", "FastAPI", "PostgreSQL", "Docker", "AWS"
    ],
    preferred_skills=[
        "Kubernetes", "Redis", "Celery", "GraphQL", "Terraform"
    ],
    min_experience_years=5.0,
    required_education_level="Bachelor's",
    job_description="""Senior Python developer role building scalable microservices.
Must have production experience with FastAPI and cloud deployment.
Team of 8 engineers, agile environment, fully remote.""",
)

output = screen_resumes(
    resume_folder="./resumes",
    criteria=criteria,
    output_file="screening_results.md"
)
print(f"Done! Report at: {output}")
Sample Report Output
The agent generates structured reports like:
# Resume Screening Report: Senior Python Developer
## Executive Summary
3 strong candidates identified from 47 applications.
**Top recommendation:** Sarah Chen (94/100) — Exceeds all requirements with 8 years
Python experience, production FastAPI and AWS deployment background, and AWS Solutions
Architect certification.
## Rankings
| Rank | Candidate | Score | Recommendation | Experience |
|------|-----------|-------|----------------|-----------|
| 1 | Sarah Chen | 94 | Strong Yes | 8 years |
| 2 | Marcus Obi | 81 | Yes | 6 years |
| 3 | James Park | 76 | Yes | 5 years |
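The scores in a report like this come from the evaluator model's own arithmetic, so it can be worth recomputing them deterministically as a cross-check. The sketch below applies the evaluator's stated rubric (40/30/15/15 points and the 85/70/55 thresholds). The function names are hypothetical, and three details are assumptions because the rubric does not specify them: skills are compared as exact case-insensitive strings, experience below the minimum earns proportional credit, and education is treated as a yes/no match.

```python
from typing import List, Tuple


def match_fraction(wanted: List[str], have: List[str]) -> float:
    """Case-insensitive share of wanted items that appear in have."""
    if not wanted:
        return 1.0  # Assumption: an empty requirement list counts as fully met.
    have_lower = {item.strip().lower() for item in have}
    hits = sum(1 for item in wanted if item.strip().lower() in have_lower)
    return hits / len(wanted)


def recompute_score(
    candidate_skills: List[str],
    experience_years: float,
    education_matches: bool,
    required_skills: List[str],
    preferred_skills: List[str],
    min_experience_years: float,
) -> Tuple[float, str]:
    """Recompute the 0-100 score and recommendation from the rubric."""
    required = 40 * match_fraction(required_skills, candidate_skills)
    preferred = 15 * match_fraction(preferred_skills, candidate_skills)
    if min_experience_years <= 0 or experience_years >= min_experience_years:
        experience = 30.0
    else:
        # Assumption: partial credit proportional to the minimum.
        experience = 30 * max(experience_years, 0.0) / min_experience_years
    education = 15.0 if education_matches else 0.0
    total = round(required + experience + education + preferred, 1)
    if total >= 85:
        recommendation = "Strong Yes"
    elif total >= 70:
        recommendation = "Yes"
    elif total >= 55:
        recommendation = "Maybe"
    else:
        recommendation = "No"
    return total, recommendation


total, recommendation = recompute_score(
    candidate_skills=["python", "FastAPI", "PostgreSQL", "Docker", "Redis"],
    experience_years=6.0,
    education_matches=True,
    required_skills=["Python", "FastAPI", "PostgreSQL", "Docker", "AWS"],
    preferred_skills=["Kubernetes", "Redis", "Celery", "GraphQL", "Terraform"],
    min_experience_years=5.0,
)
print(total, recommendation)  # 80.0 Yes
```

A large gap between this number and the model's total_score is a signal to review that candidate by hand. Exact string matching will miss synonyms such as "Postgres" for "PostgreSQL", so treat the recomputed value as a sanity check rather than a replacement score.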
Bias Mitigation Considerations
Any automated screening tool carries bias risk. Several practices reduce this:
- Remove name and location from scoring — score only on skills, experience, and education
- Audit scoring criteria regularly — ensure required skills aren't proxies for demographics
- Document scoring weights — be transparent about how candidates are ranked
- Human review mandatory — never let the agent make final decisions
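The first practice in the list above can be sketched as a small redaction step applied to the parsed resume before it reaches the evaluator. The helper name redact_for_scoring and the candidate_id field are hypothetical, and the set of fields treated as identifying is an assumption: the list above names only name and location, while email, phone, and raw text are added here because they usually reveal the same information.

```python
from typing import Any, Dict

# Assumption: fields treated as identifying in this sketch; adjust to your policy.
IDENTIFYING_FIELDS = ("candidate_name", "email", "phone", "location", "raw_text")


def redact_for_scoring(parsed_resume: Dict[str, Any], candidate_id: str) -> Dict[str, Any]:
    """Return a copy of the parsed resume without identifying fields.

    The opaque candidate_id lets the caller rejoin scores with the real
    identity after evaluation. The input dict is left unchanged.
    """
    redacted = {
        key: value
        for key, value in parsed_resume.items()
        if key not in IDENTIFYING_FIELDS
    }
    redacted["candidate_id"] = candidate_id
    return redacted


parsed = {
    "candidate_name": "A. Example",
    "location": "Somewhere",
    "skills": ["Python"],
    "total_experience_years": 6.0,
}
print(redact_for_scoring(parsed, "cand-001"))
```

If the evaluator is fed redacted data, fields such as candidate_name and email in its output would need to be filled in afterwards from the caller's own id-to-identity mapping. Institution names and graduation years can also act as proxies, so they may be worth reviewing as part of the regular criteria audit.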
The "AI agents and the future of work" discussion addresses this broader challenge of human-AI collaboration in sensitive domains.
Performance at Scale
For high-volume recruiting, processing 200+ resumes sequentially takes significant time. Parallelize with a thread pool:
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_resume_parallel(args):
    resume_path, criteria, agents = args
    # ... processing logic

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(process_resume_parallel, args): args for args in job_list}
    for future in as_completed(futures):
        result = future.result()
        scored_candidates.append(result)
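The snippet above leaves the worker body out and lets any exception from future.result() abort the whole loop. A self-contained sketch of the same thread-pool pattern with per-resume error handling follows. The names run_in_parallel and fake_process are hypothetical, and fake_process is only a stand-in for the real parse-and-evaluate step.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def run_in_parallel(
    resume_paths: List[str],
    process_one: Callable[[str], dict],
    max_workers: int = 5,
) -> Tuple[List[dict], Dict[str, str]]:
    """Run process_one over all paths; collect results and failures separately."""
    results: List[dict] = []
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(process_one, path): path for path in resume_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                result = future.result()
            except Exception as exc:  # One bad resume should not stop the batch.
                failures[path] = str(exc)
                continue
            if result:
                results.append(result)
    return results, failures


def fake_process(path: str) -> dict:
    """Stand-in worker; a real one would parse and evaluate the resume."""
    if path.endswith("bad.pdf"):
        raise ValueError("could not extract text")
    return {"resume_file": path, "total_score": 50.0}


scored, failed = run_in_parallel(["a.pdf", "bad.pdf", "b.pdf"], fake_process)
print(sorted(c["resume_file"] for c in scored), failed)
```

One design point to consider: the agents keep conversation history per chat partner, so it is safer to assume they should not be shared between threads. Building a separate set of agents inside each worker, for example by calling build_screener_agents() there, avoids that shared state at the cost of some setup overhead. Results arrive in completion order, so sort them afterwards if order matters.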
For running agent workloads like this efficiently at scale, the patterns in the "deploy AI model to production" guide also apply.
Frequently Asked Questions
Is automated resume screening legal and ethical?
Automated screening tools are legal in most jurisdictions but are subject to employment discrimination laws. In the US, the EEOC requires that AI hiring tools not have disparate impact on protected classes. Use this as a pre-screening tool to reduce recruiter workload, not as a final decision-maker. Always have human reviewers make final calls and audit the agent's ranking criteria for potential bias.
How accurate is the AutoGen resume screener compared to human recruiters?
For objective criteria matching (years of experience, required certifications, specific technologies), the agent is highly consistent and often more reliable than humans, who fatigue during high-volume screening. For subjective quality assessment, human judgment is still superior. Use the agent for initial filtering and consistency, and humans for nuanced evaluation.
Can this agent handle resumes in formats other than PDF?
Yes, with additional parsers. Add python-docx for Word documents (.docx); plain-text .txt files can be read directly. For image-based PDFs or scanned documents, add pytesseract for OCR before feeding the text to the agent. The most reliable format is PDF with embedded text.
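As a rough sketch of what multi-format support could look like, the function below picks an extractor based on the file extension. The name extract_resume_text is hypothetical. The python-docx and pdfplumber imports are deferred so the plain-text path runs without them, and OCR for scanned documents is left out.

```python
import tempfile
from pathlib import Path


def extract_resume_text(path: str) -> str:
    """Return the text of a .txt, .docx, or .pdf resume (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="replace").strip()
    if suffix == ".docx":
        from docx import Document  # Provided by the python-docx package.

        document = Document(path)
        # Body paragraphs only; text inside tables is not included here.
        return "\n".join(p.text for p in document.paragraphs).strip()
    if suffix == ".pdf":
        import pdfplumber

        with pdfplumber.open(path) as pdf:
            pages = [page.extract_text() or "" for page in pdf.pages]
        return "\n".join(pages).strip()
    raise ValueError(f"Unsupported resume format: {suffix}")


# Demonstrate the plain-text path with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "resume.txt"
    sample.write_text("Jane Example\nPython, FastAPI\n", encoding="utf-8")
    demo_text = extract_resume_text(str(sample))
print(demo_text)
```

To use it in the screener, the folder scan would also need to match the extra extensions instead of only "*.pdf".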
AiTechWorlds Team