5 LangChain Output Parsers: JSON, Pydantic, CSV and More
Learn to use all 5 essential LangChain output parsers — JsonOutputParser, PydanticOutputParser, CSV, Datetime, and StructuredOutputParser — with complete code examples.
LLMs return text. APIs want typed data. That gap — between unstructured model output and the structured format your application actually needs — is where output parsers live. I spent too long in an early project manually parsing LLM responses with regex, then dealing with edge cases every time the model formatted things slightly differently. Output parsers solve this properly.
This guide covers the five parsers you'll actually use in production: JsonOutputParser, PydanticOutputParser, CommaSeparatedListOutputParser, DatetimeOutputParser, and StructuredOutputParser. Plus OutputFixingParser for handling the inevitable failures. Each one gets working code and a clear explanation of when to reach for it.
If you're building these parsers into LCEL chains, the LangChain Expression Language guide covers the composition patterns. For the bigger picture of how output parsers fit into agent workflows, see Build AI agent with LangChain.
Why Output Parsers Matter More Than You Think
Raw LLM output is unreliable as a data format. Ask GPT-4 to return JSON and it'll usually do it — until it doesn't. Until it wraps the JSON in markdown code fences. Until it adds an explanation before the JSON. Until it uses single quotes instead of double quotes. Any of these break a naive json.loads() call.
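To make those failure modes concrete, here is a small stdlib-only sketch with no LangChain involved. The sample strings and the helper names `naive_parse` and `tolerant_parse` are made up for illustration; they stand in for typical model replies rather than real captured output.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable

# Hypothetical model replies that all carry the same data
raw_outputs = {
    "clean": '{"name": "Stripe", "founded_year": 2010}',
    "fenced": FENCE + 'json\n{"name": "Stripe", "founded_year": 2010}\n' + FENCE,
    "preamble": 'Here is the JSON you asked for:\n{"name": "Stripe", "founded_year": 2010}',
    "single_quotes": "{'name': 'Stripe', 'founded_year': 2010}",
}


def naive_parse(text):
    """Call json.loads directly; return None when it raises."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def tolerant_parse(text):
    """Pull out the outermost {...} span before parsing."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    return naive_parse(match.group(0))


naive_results = {label: naive_parse(text) for label, text in raw_outputs.items()}
tolerant_results = {label: tolerant_parse(text) for label, text in raw_outputs.items()}

for label in raw_outputs:
    print(label, "| naive:", naive_results[label], "| tolerant:", tolerant_results[label])
```

Extracting the braces rescues the fenced and prefixed replies, but the single-quoted one still fails. That is why shaping the output at the prompt level matters as much as cleaning it up afterwards.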
Output parsers do two things:
- Format instructions: Tell the model exactly how to format its response, included in the prompt automatically.
- Parsing logic: Transform the raw string into the target data type, with error handling.
The format instructions are the key insight. You're not just parsing the output — you're actively shaping it at the prompt level.
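The two responsibilities can be sketched with a toy class. `ToyKeyedJsonParser` is a hypothetical stand-in written for this article, not a LangChain class; it only mirrors the shape of the real interface (a `get_format_instructions()` method plus a `parse()` method) so you can see how the same object feeds the prompt and consumes the reply.

```python
import json


class ToyKeyedJsonParser:
    """Illustrative only: one object that both shapes the prompt and parses the reply."""

    def __init__(self, keys):
        self.keys = list(keys)

    def get_format_instructions(self) -> str:
        # Responsibility 1: text that gets appended to the prompt
        return (
            "Respond with a single JSON object and nothing else. "
            "Required keys: " + ", ".join(self.keys) + "."
        )

    def parse(self, text: str) -> dict:
        # Responsibility 2: turn the raw string into typed data, failing loudly
        data = json.loads(text)
        missing = [key for key in self.keys if key not in data]
        if missing:
            raise ValueError(f"Missing keys: {missing}")
        return data


parser = ToyKeyedJsonParser(["name", "industry"])
prompt = f"Describe Stripe.\n\n{parser.get_format_instructions()}"
parsed = parser.parse('{"name": "Stripe", "industry": "Fintech"}')

print(prompt)
print(parsed)
```

The real parsers follow the same pattern with far more careful instructions and error handling.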
Setup
pip install langchain langchain-openai pydantic
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
os.environ["OPENAI_API_KEY"] = "your-key-here"
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
Parser 1: JsonOutputParser
JsonOutputParser is your go-to when you need JSON output and don't want to define a full schema upfront. It handles the markdown code fence stripping and JSON parsing automatically.
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
parser = JsonOutputParser()
prompt = ChatPromptTemplate.from_template("""
Extract the following information from the company description and return as JSON.
Company Description: {description}
Return JSON with these exact keys: name, industry, founded_year, headquarters, employee_count_estimate.
{format_instructions}
""")
chain = prompt | llm | parser
result = chain.invoke({
"description": "Stripe was founded in 2010 in San Francisco. The fintech company has over 8,000 employees and provides payment processing infrastructure used by millions of businesses worldwide.",
"format_instructions": parser.get_format_instructions()
})
print(type(result)) # <class 'dict'>
print(result)
# {
# "name": "Stripe",
# "industry": "Fintech / Payment Processing",
# "founded_year": 2010,
# "headquarters": "San Francisco",
# "employee_count_estimate": 8000
# }
JsonOutputParser also supports streaming — it yields partial objects as they're generated:
# Streaming JSON (useful for progressive UI updates)
for partial in chain.stream({
    "description": "OpenAI is an AI research company founded in 2015...",
    "format_instructions": parser.get_format_instructions()
}):
    print(partial)  # progressively fills in the dict
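How partial objects can be produced from an incomplete JSON string is easier to see in a toy version. The sketch below is not LangChain's implementation; it is a simplified stdlib approach, with hypothetical helper names, that closes any open string or bracket and skips chunks that still do not parse.

```python
import json


def close_partial_json(buffer: str):
    """Try to parse incomplete JSON by closing open strings and brackets."""
    closers = []       # closing characters still owed, innermost last
    in_string = False
    escaped = False
    for ch in buffer:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    candidate = buffer + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # not parseable yet, e.g. a dangling key or trailing comma


def stream_partials(chunks):
    """Yield a new partial object each time the growing buffer parses to something new."""
    buffer = ""
    last = None
    for chunk in chunks:
        buffer += chunk
        parsed = close_partial_json(buffer)
        if parsed is not None and parsed != last:
            last = parsed
            yield parsed


# Hypothetical token chunks, split at awkward places on purpose
chunks = ['{"name": "Op', 'enAI", "found', 'ed_year": 20', '15}']
partials = list(stream_partials(chunks))
for item in partials:
    print(item)
```

Intermediate values can be truncated (here `founded_year` briefly reads 20 before it becomes 2015), so treat streamed objects as previews and rely only on the final one.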
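To describe the expected schema in the format instructions, pass a Pydantic model. The model only shapes the prompt here; the parser still returns a plain dict: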
from pydantic import BaseModel
from typing import List, Optional
from langchain_core.output_parsers import JsonOutputParser
class CompanyInfo(BaseModel):
    name: str
    industry: str
    founded_year: int
    headquarters: str
    notable_products: List[str]
    is_public: Optional[bool] = None
parser = JsonOutputParser(pydantic_object=CompanyInfo)
prompt = ChatPromptTemplate.from_messages([
("system", "You extract structured company information from text."),
("human", "{description}\n\n{format_instructions}")
])
chain = prompt | llm | parser
result = chain.invoke({
"description": "Apple Inc., founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne in Cupertino, California...",
"format_instructions": parser.get_format_instructions()
})
print(result) # Returns a dict (not validated Pydantic object)
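Because the result is an unvalidated dict, a wrongly typed value can slip through unnoticed. The stdlib sketch below shows the kind of check you would otherwise have to write by hand; `EXPECTED_TYPES`, `find_type_problems`, and the sample `model_output` are hypothetical and exist only to illustrate the gap.

```python
# Expected Python type for a few of the keys (illustrative subset)
EXPECTED_TYPES = {"name": str, "founded_year": int, "notable_products": list}


def find_type_problems(data: dict, expected: dict) -> list[str]:
    """Return a human-readable list of missing or mistyped keys."""
    problems = []
    for key, expected_type in expected.items():
        if key not in data:
            problems.append(f"{key}: missing")
        elif not isinstance(data[key], expected_type):
            problems.append(
                f"{key}: expected {expected_type.__name__}, got {type(data[key]).__name__}"
            )
    return problems


# A made-up reply where the year came back as a string
model_output = {
    "name": "Apple Inc.",
    "founded_year": "1976",
    "notable_products": ["iPhone", "Mac"],
}

problems = find_type_problems(model_output, EXPECTED_TYPES)
print(problems)
```

This is the work that a validating parser takes off your hands.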
Parser 2: PydanticOutputParser (Best for Production)
PydanticOutputParser returns actual validated Pydantic objects, not just dicts. This gives you type checking, field validation, and defaults — all the things you want in a production API.
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, field_validator
from typing import List, Optional
from enum import Enum
class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
class ProductReview(BaseModel):
    product_name: str = Field(description="Name of the product being reviewed")
    rating: float = Field(description="Rating from 1.0 to 5.0", ge=1.0, le=5.0)
    sentiment: Sentiment = Field(description="Overall sentiment of the review")
    pros: List[str] = Field(description="List of positive aspects mentioned")
    cons: List[str] = Field(description="List of negative aspects mentioned")
    summary: str = Field(description="One-sentence summary of the review")
    would_recommend: bool = Field(description="Whether the reviewer would recommend the product")
    # Pydantic v2 style; the older @validator decorator is deprecated
    @field_validator("rating")
    @classmethod
    def round_rating(cls, v: float) -> float:
        # Runs after the ge/le range check; normalizes to one decimal place
        return round(v, 1)
parser = PydanticOutputParser(pydantic_object=ProductReview)
prompt = ChatPromptTemplate.from_messages([
("system", "You are an expert at extracting structured data from product reviews."),
("human", """
Analyze this review and extract structured data:
{review}
{format_instructions}
""")
])
chain = prompt | llm | parser
review_text = """
Just got the Sony WH-1000XM5 headphones and wow. The noise cancellation is the best I've ever used -
I can't hear anything on my commute anymore. Sound quality is fantastic, very clear highs and punchy bass.
Battery life is incredible, easily 30+ hours. The only downsides are the price ($350 is steep) and they're
a bit uncomfortable after 3+ hours. Would definitely buy again though. 9/10.
"""
result = chain.invoke({
"review": review_text,
"format_instructions": parser.get_format_instructions()
})
print(type(result)) # <class 'ProductReview'>
print(result.sentiment) # Sentiment.POSITIVE
print(result.rating) # 4.5
print(result.pros) # ['Excellent noise cancellation', 'Great sound quality', ...]
print(result.would_recommend) # True
# Full Pydantic model: serialize to dict/JSON easily (Pydantic v2 method names)
print(result.model_dump())
print(result.model_dump_json(indent=2))
Multiple items with nested models:
class JobPosting(BaseModel):
    title: str
    company: str
    location: str
    salary_min: Optional[int] = None
    salary_max: Optional[int] = None
    required_skills: List[str]
    experience_years: int
    remote_allowed: bool
class JobPostingList(BaseModel):
    postings: List[JobPosting]
    total_count: int
parser = PydanticOutputParser(pydantic_object=JobPostingList)
prompt = ChatPromptTemplate.from_messages([
("system", "Extract all job postings from the provided text."),
("human", "{text}\n\n{format_instructions}")
])
chain = prompt | llm | parser
result = chain.invoke({
"text": "Software Engineer at Google, NYC, $150k-$200k, requires Python, 3+ years exp, remote OK...",
"format_instructions": parser.get_format_instructions()
})
print(f"Found {result.total_count} job posting(s)")
for job in result.postings:
    print(f"- {job.title} at {job.company}")
Parser 3: CommaSeparatedListOutputParser
When you just need a list and don't need a full JSON structure, this parser is the simplest option.
from langchain_core.output_parsers import CommaSeparatedListOutputParser
list_parser = CommaSeparatedListOutputParser()
prompt = ChatPromptTemplate.from_messages([
("human", """List the top 10 Python libraries for data science.
Return only library names as a comma-separated list, nothing else.
{format_instructions}""")
])
chain = prompt | llm | list_parser
result = chain.invoke({
"format_instructions": list_parser.get_format_instructions()
})
print(type(result)) # <class 'list'>
print(result) # ['numpy', 'pandas', 'matplotlib', 'scikit-learn', ...]
print(len(result)) # 10
# Practical use: keyword extraction
keyword_prompt = ChatPromptTemplate.from_messages([
("human", "Extract the main keywords from this article. Return as comma-separated list.\n\nArticle: {article}\n\n{format_instructions}")
])
keyword_chain = keyword_prompt | llm | list_parser
keywords = keyword_chain.invoke({
"article": "LangChain has released LCEL as the primary way to build chains...",
"format_instructions": list_parser.get_format_instructions()
})
print("Keywords:", keywords)
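Conceptually, comma-separated parsing is a split that respects quoting. A rough stdlib equivalent, assuming the model's reply is plain text, might look like the sketch below. The function name `parse_comma_separated` is hypothetical and this is not the library's code.

```python
import csv
from io import StringIO


def parse_comma_separated(text: str) -> list[str]:
    """Split on commas, honoring double-quoted items that contain commas."""
    reader = csv.reader(
        StringIO(text.strip()),
        delimiter=",",
        quotechar='"',
        skipinitialspace=True,  # lets a quoted item follow ", " cleanly
    )
    # Flatten rows so a reply spread over several lines still yields one list
    return [item.strip() for row in reader for item in row if item.strip()]


simple = parse_comma_separated("numpy, pandas, matplotlib")
quoted = parse_comma_separated('numpy, "scikit-learn, the library", pandas')
multiline = parse_comma_separated("numpy, pandas\nmatplotlib")

print(simple)
print(quoted)
print(multiline)
```

A plain `text.split(",")` would break the quoted item in two, which is the main reason to prefer a CSV-aware split.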
Parser 4: DatetimeOutputParser
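Extracting dates from unstructured text is a surprisingly common requirement. DatetimeOutputParser does not interpret natural-language dates itself: its format instructions ask the model to write the date in a fixed strftime-style pattern, and the parser then converts that string into a Python datetime object. The model does the language understanding; the parser does the strict conversion.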
from langchain.output_parsers import DatetimeOutputParser
datetime_parser = DatetimeOutputParser()
prompt = ChatPromptTemplate.from_messages([
("human", """When did this event occur? Extract the exact date.
Event description: {event}
{format_instructions}
Return only the datetime in the specified format.""")
])
chain = prompt | llm | datetime_parser
result = chain.invoke({
"event": "The first iPhone was unveiled by Steve Jobs on January 9, 2007 at the Macworld Conference & Expo.",
"format_instructions": datetime_parser.get_format_instructions()
})
print(type(result)) # <class 'datetime.datetime'>
print(result) # 2007-01-09 00:00:00
print(result.year) # 2007
# Practical: extract multiple dates
from typing import List
from langchain_core.output_parsers import JsonOutputParser
def extract_dates_from_text(text: str) -> List[str]:
    """Extract all dates from text as ISO-format strings via a JSON-array prompt."""
    list_prompt = ChatPromptTemplate.from_messages([
        ("human", "Extract all dates mentioned in this text as a JSON array of ISO format dates (YYYY-MM-DD).\n\nText: {text}\n\nReturn only the JSON array.")
    ])
    # JsonOutputParser turns the model's JSON array into a Python list of strings
    list_chain = list_prompt | llm | JsonOutputParser()
    return list_chain.invoke({"text": text})
dates = extract_dates_from_text("The project started on March 15, 2024 and ended December 1, 2024.")
print(dates) # ['2024-03-15', '2024-12-01']
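Two details are worth seeing in isolation: strict pattern matching rejects natural-language dates, and ISO strings like the ones printed above still need converting before you can do date arithmetic. This stdlib sketch assumes a pattern of `%Y-%m-%dT%H:%M:%S.%fZ`, which I believe matches the parser's default but is worth checking against your installed version; `strict_parse` is a hypothetical helper.

```python
from datetime import date, datetime

# Assumption: mirrors the parser's default pattern; verify in your LangChain version
ASSUMED_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def strict_parse(text: str) -> datetime:
    """Convert a string to datetime only if it matches the pattern exactly."""
    return datetime.strptime(text.strip(), ASSUMED_FORMAT)


parsed = strict_parse("2007-01-09T00:00:00.000000Z")
print(parsed.year, parsed.month, parsed.day)

# A natural-language date does not match the pattern and raises ValueError
try:
    strict_parse("January 9, 2007")
    natural_language_ok = True
except ValueError:
    natural_language_ok = False
print("Natural language accepted:", natural_language_ok)

# Turning ISO date strings into date objects for arithmetic
iso_strings = ["2024-03-15", "2024-12-01"]
as_dates = [date.fromisoformat(s) for s in iso_strings]
duration_days = (as_dates[1] - as_dates[0]).days
print("Project length in days:", duration_days)
```

If the model ignores the requested pattern, the failure surfaces as a parse error rather than a silently wrong date.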
Parser 5: StructuredOutputParser (Multi-Field Without Full Pydantic)
StructuredOutputParser sits between CommaSeparatedListOutputParser and PydanticOutputParser — it handles multiple fields with descriptions but without Pydantic's full validation.
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
# Define the schema
response_schemas = [
ResponseSchema(name="answer", description="The direct answer to the question"),
ResponseSchema(name="confidence", description="Confidence level: high, medium, or low"),
ResponseSchema(name="sources_needed", description="Whether external sources would be needed to verify: true or false"),
ResponseSchema(name="follow_up_questions", description="Three follow-up questions as a JSON array of strings")
]
structured_parser = StructuredOutputParser.from_response_schemas(response_schemas)
prompt = ChatPromptTemplate.from_messages([
("system", "You are a knowledgeable assistant. Answer questions clearly."),
("human", """Question: {question}
{format_instructions}""")
])
chain = prompt | llm | structured_parser
result = chain.invoke({
"question": "What is the capital of France?",
"format_instructions": structured_parser.get_format_instructions()
})
print(type(result)) # <class 'dict'>
print(result["answer"]) # Paris
print(result["confidence"]) # high
print(result["follow_up_questions"]) # string (need to json.loads this)
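Since every field tends to come back as a string, a small normalization step is often needed before the dict is useful. The sketch below is one way to do it with the stdlib; `normalize_structured_result` and the sample `raw` dict are hypothetical, and the key names simply reuse the schema defined above.

```python
import json


def normalize_structured_result(raw: dict) -> dict:
    """Coerce string fields from a loosely typed result into Python types."""
    result = dict(raw)  # copy so the caller's dict is left untouched

    # "true"/"false" text becomes a real boolean
    flag = result.get("sources_needed")
    if isinstance(flag, str):
        result["sources_needed"] = flag.strip().lower() == "true"

    # A JSON array delivered as text becomes a list; fall back to a one-item list
    questions = result.get("follow_up_questions")
    if isinstance(questions, str):
        try:
            result["follow_up_questions"] = json.loads(questions)
        except json.JSONDecodeError:
            result["follow_up_questions"] = [questions]
    return result


# Made-up parser output for illustration
raw = {
    "answer": "Paris",
    "confidence": "high",
    "sources_needed": "false",
    "follow_up_questions": '["What is its population?", "When was it founded?", "What river runs through it?"]',
}

clean = normalize_structured_result(raw)
print(clean["sources_needed"])
print(clean["follow_up_questions"])
```

If you find yourself writing much of this glue, that is usually the signal to move to PydanticOutputParser.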
OutputFixingParser: Automatic Recovery
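Any parser can fail. The LLM might format things slightly wrong, use the wrong quote style, or add an extra explanation. OutputFixingParser wraps another parser: when parsing fails, it sends the malformed output, the format instructions, and the error message to an LLM, asks for a corrected version, and parses that instead. Each repair attempt is an extra model call, so it adds latency and cost.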
from langchain.output_parsers import OutputFixingParser
# Wrap any parser to make it self-healing
robust_parser = OutputFixingParser.from_llm(
parser=PydanticOutputParser(pydantic_object=ProductReview),
llm=llm,
max_retries=3
)
prompt = ChatPromptTemplate.from_messages([
("human", "{review}\n\n{format_instructions}")
])
# This chain will retry up to 3 times if the LLM output is malformed
robust_chain = prompt | llm | robust_parser
# Simulate what happens with badly formatted output
from langchain_core.output_parsers import PydanticOutputParser
from langchain.output_parsers import OutputFixingParser
base_parser = PydanticOutputParser(pydantic_object=ProductReview)
fixing_parser = OutputFixingParser.from_llm(parser=base_parser, llm=llm)
# The fixing parser can handle cases like:
# - Missing required fields
# - Wrong data types
# - Extra text before/after JSON
# - Single quotes instead of double quotes
bad_output = "Here's the review data: {'product_name': 'Headphones', rating: 4.5}"
try:
    result = fixing_parser.parse(bad_output)
    print("Fixed successfully:", result)
except Exception as e:
    print(f"Even fixing failed: {e}")
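The control flow behind this kind of recovery is a parse, repair, retry loop. The sketch below reproduces that loop offline with the stdlib; `parse_with_fixing` and `fake_fixer` are hypothetical names, and the fixer is a hard-coded string cleanup standing in for the LLM call a real fixing parser would make. The sample input quotes every key so the simple cleanup is enough.

```python
import json
from typing import Callable


def parse_with_fixing(
    text: str,
    parse: Callable[[str], dict],
    fix: Callable[[str, str], str],
    max_retries: int = 1,
) -> dict:
    """Parse text; on failure ask `fix` for a corrected version and try again."""
    attempts = 0
    while True:
        try:
            return parse(text)
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            if attempts >= max_retries:
                raise  # give up and surface the last error
            text = fix(text, str(err))
            attempts += 1


fix_calls = []  # records each error message the fixer was shown


def fake_fixer(bad_text: str, error: str) -> str:
    """Stand-in for the LLM: drop leading prose and normalize quote style."""
    fix_calls.append(error)
    return bad_text[bad_text.index("{"):].replace("'", '"')


bad_output = "Here's the review data: {'product_name': 'Headphones', 'rating': 4.5}"
fixed = parse_with_fixing(bad_output, json.loads, fake_fixer, max_retries=3)

print(fixed)
print("Repair attempts:", len(fix_calls))
```

Keeping `max_retries` small bounds the extra cost when the output cannot be repaired.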
Comparison Table: When to Use Each Parser
| Parser | Output Type | Schema Required | Streaming | Best For |
|---|---|---|---|---|
| JsonOutputParser | dict | Optional | Yes | Dynamic schemas, prototyping |
| PydanticOutputParser | Pydantic model | Yes | No | Production APIs, validated data |
| CommaSeparatedListOutputParser | list[str] | No | No | Simple lists, keywords |
| DatetimeOutputParser | datetime | No | No | Date extraction |
| StructuredOutputParser | dict | Yes (loose) | No | Multi-field without full Pydantic |
| OutputFixingParser | Any | Depends | No | Wrapping any parser for resilience |
OpenAI's native structured outputs take a different route to the same goal: in strict mode the schema is enforced during generation rather than requested in the prompt, which generally makes parse failures rarer than with prompt-based JSON extraction. LangChain's parsers narrow the gap by embedding format instructions directly into the prompt and validating what comes back.
Practical Pattern: Hierarchical Data Extraction
Here's a real-world example — extracting structured data from a long text document:
from pydantic import BaseModel, Field
from typing import List, Optional
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
class Person(BaseModel):
    name: str
    role: Optional[str] = None
    organization: Optional[str] = None
class Event(BaseModel):
    title: str
    date: Optional[str] = None
    location: Optional[str] = None
    participants: List[Person] = []
    outcome: Optional[str] = None
    key_decisions: List[str] = []
class DocumentAnalysis(BaseModel):
    document_type: str
    main_topic: str
    events: List[Event]
    mentioned_organizations: List[str]
    action_items: List[str]
    sentiment_overall: str
parser = PydanticOutputParser(pydantic_object=DocumentAnalysis)
analysis_prompt = ChatPromptTemplate.from_messages([
("system", "You are an expert document analyst. Extract all structured information precisely."),
("human", """Analyze this document and extract all structured information.
Document:
{document}
{format_instructions}""")
])
chain = analysis_prompt | llm | parser
document = """
Meeting Notes - Q4 Planning Session
Date: November 15, 2025
Location: Conference Room B, Chicago HQ
Attendees: Sarah Chen (CEO, Acme Corp), Marcus Williams (CTO), Jennifer Park (Head of Product)
The team reviewed Q3 results. Revenue was up 23% YoY.
Key decisions:
1. Launch new mobile app by December 15
2. Hire 5 additional engineers in Q1
3. Partner with DataFlow Inc for analytics integration
Action items:
- Marcus to create technical spec by Nov 22
- Jennifer to finalize feature list by Nov 20
- Sarah to approve budget by Nov 25
"""
result = chain.invoke({
"document": document,
"format_instructions": parser.get_format_instructions()
})
print(f"Document type: {result.document_type}")
print(f"Events found: {len(result.events)}")
print(f"Action items: {result.action_items}")
print(f"Organizations: {result.mentioned_organizations}")
For a complete picture of how output parsers fit into end-to-end agent systems, see AI research agent build and RAG system tutorial. If you're building the full pipeline including vector search, Vector database guide shows how structured output can feed into vector stores.
The OpenAI Assistants API guide covers OpenAI's native structured output, which is an alternative to LangChain parsers worth knowing about.
Conclusion
Output parsers are the translation layer between the LLM's natural language and your application's data model. PydanticOutputParser is the right choice for production — it gives you validated, typed objects that fail loudly if the model outputs something unexpected. JsonOutputParser is faster to set up during development. OutputFixingParser wraps either one and makes them resilient to occasional model misbehavior.
The format instructions included in each parser's prompt do more work than most people realize. They're not just hints — they're precise formatting contracts that dramatically reduce parse errors. Always call parser.get_format_instructions() and include it in your prompt.
Check out Build AI chatbot Python for a full application that uses these parsers to extract structured conversation data, and Deploy AI model to production for production deployment patterns.
Frequently Asked Questions
Which LangChain output parser should I use for production APIs?
PydanticOutputParser is the best choice for production APIs because it gives you validated, typed Python objects with automatic error messages. If the LLM returns malformed output, Pydantic's validation catches it immediately. Pair it with OutputFixingParser for automatic retry on validation failures.
What happens when a LangChain output parser fails?
By default, parsers raise an OutputParserException. Wrap your parser with OutputFixingParser to automatically retry with the error message, or use RetryWithErrorOutputParser for more control. Always handle parser exceptions in production code.
Can I use LangChain output parsers with streaming responses?
JsonOutputParser supports streaming — it yields partial JSON objects as tokens arrive. PydanticOutputParser does not support streaming because validation only works on complete objects. For streaming with structure, use JsonOutputParser and validate the complete result at the end.
AiTechWorlds Team