7 LangChain Output Fixer and Retry Strategies (2026)
Stop losing data to malformed LLM outputs. Learn 7 LangChain error recovery strategies including OutputFixingParser, RetryOutputParser, fallbacks, and exponential backoff.
LLMs generate text. Structured applications need structured data. That gap — between raw model output and the JSON, Pydantic models, or lists your code expects — is where production AI systems fail silently or crash loudly.
LangChain provides seven distinct strategies for handling malformed output, from simple auto-fix to sophisticated fallback chains. This guide covers all of them with real code and tells you which to use in each situation.
Before diving in, the LangChain tutorial 2025 covers the basic output parser types if you need a refresher on JsonOutputParser and PydanticOutputParser.
Why Output Parsing Fails
Before fixing failures, understand what causes them:
# The model was asked for JSON but returned this:
broken_output_1 = """
Sure! Here's the JSON you requested:
# [json block start]
{
"name": "Alice",
"age": 30,
"skills": ["Python", "ML",] // trailing comma
}
# [block end]
"""
broken_output_2 = '{"name": "Bob", "age": 25, "skills": ["Python"' # truncated
broken_output_3 = """
{
"name": "Charlie"
"age": 28, // missing comma after "Charlie"
"skills": ["Go"]
}
"""
The three root causes are:
- Markdown wrapping: the model adds ```json fences (often with a sentence of preamble) around the output
- Syntax errors: trailing commas, missing quotes, JavaScript-style comments
- Truncation: the output is cut off near the context limit
All three are fixable with the right strategy.
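To make the three root causes concrete, here is a minimal standard-library sketch that labels a raw response by its likely failure mode. The helper name `classify_failure` and its labels are hypothetical, and the bracket counting is a rough heuristic (it ignores braces inside string values), not something LangChain provides.

```python
import json

FENCE = "`" * 3  # built programmatically so this example contains no literal fence


def classify_failure(raw: str) -> str:
    """Return a rough label for why `raw` would fail json.loads (hypothetical helper)."""
    text = raw.strip()
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError:
        pass
    # A code fence or leading prose means the JSON is wrapped in something else.
    if FENCE in text or not text.startswith(("{", "[")):
        return "markdown_or_prose_wrapping"
    # More openers than closers suggests the output was cut off.
    if text.count("{") > text.count("}") or text.count("[") > text.count("]"):
        return "truncation"
    return "syntax_error"


wrapped = "Sure!\n" + FENCE + 'json\n{"a": 1}\n' + FENCE
truncated = '{"name": "Bob", "skills": ["Python"'
trailing_comma = '{"a": 1,}'
labels = [classify_failure(s) for s in (wrapped, truncated, trailing_comma)]
print(labels)
```

Logging a label like this next to each parse failure makes it easier to see which of the strategies below is worth adding first.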
Strategy 1: OutputFixingParser
OutputFixingParser wraps any existing parser. When the inner parser fails, it sends the bad output to an LLM with a request to fix it:
# Note: on LangChain 1.x releases this legacy parser may instead be importable
# from langchain_classic.output_parsers; check your installed version.
from langchain.output_parsers import OutputFixingParser
from langchain_core.output_parsers import JsonOutputParser
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import List
class UserProfile(BaseModel):
    name: str = Field(description="User's full name")
    age: int = Field(description="User's age")
    skills: List[str] = Field(description="List of technical skills")
# Base parser (the Pydantic model only drives the format instructions)
json_parser = JsonOutputParser(pydantic_object=UserProfile)
# Wrapping parser with auto-fix capability
fixing_parser = OutputFixingParser.from_llm(
    parser=json_parser,
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0)
)
# Test with broken JSON
broken_json = """
Sure! Here's the data:
# [json block start]
{
"name": "Alice Johnson",
"age": 30,
"skills": ["Python", "Machine Learning",]
}
# [block end]
"""
try:
    result = fixing_parser.parse(broken_json)
    print(f"Fixed successfully: {result}")
    # JsonOutputParser returns a dict, not a UserProfile instance, e.g.
    # {'name': 'Alice Johnson', 'age': 30, 'skills': ['Python', 'Machine Learning']}
except Exception as e:
    print(f"Still failed after fixing: {e}")
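How it works: When the inner parser raises, the fixing parser sends the parser's format instructions, the failed output, and the error message to the repair LLM and asks it to try again. In effect the request reads: "The following output didn't parse correctly: <output>. Error: <error>. Please return only the valid JSON." The repair LLM typically fixes the issue in one call.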
Cost consideration: Each fix attempt makes one extra LLM call. With gpt-4o-mini, this costs roughly $0.0001 per fix — negligible for most applications.
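The control flow described above can be sketched without any LLM at all. This is an illustration of the parse, fix, re-parse loop under stated assumptions, not LangChain's implementation: `parse_with_fixer` and `fake_fixer` are hypothetical names, and the stub fixer stands in for the repair model.

```python
import json
from typing import Callable


def parse_with_fixer(
    raw: str,
    fix_fn: Callable[[str, str], str],
    max_retries: int = 1,
) -> dict:
    """Parse `raw` as JSON; on failure ask `fix_fn(bad_text, error)` for a repair."""
    text = raw
    for attempt in range(max_retries + 1):
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            if attempt == max_retries:
                raise  # out of fix attempts: surface the last parse error
            # In the real parser this step is an LLM call; here it is a stub.
            text = fix_fn(text, str(exc))
    raise AssertionError("unreachable")


fix_calls = []


def fake_fixer(bad_text: str, error: str) -> str:
    """Stand-in for the repair LLM: only knows how to drop a trailing comma."""
    fix_calls.append(error)
    return bad_text.replace(",]", "]").replace(",}", "}")


result = parse_with_fixer('{"skills": ["Python", "Go",]}', fake_fixer)
print(result, len(fix_calls))
```

The fixer is only called when the first parse fails, which is why the extra cost scales with the failure rate rather than with total traffic.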
Strategy 2: RetryOutputParser
RetryOutputParser is different — it includes the original prompt in the retry, not just the failed output. This is more powerful for cases where the model needs context to generate the right structure:
from langchain.output_parsers import RetryOutputParser
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import List
# Define expected output
class ProductReview(BaseModel):
    product_name: str
    rating: int = Field(ge=1, le=5, description="Rating from 1 to 5")
    # Pydantic v2 uses min_length for lists (min_items is the deprecated v1 name)
    pros: List[str] = Field(min_length=1, description="List of pros")
    cons: List[str] = Field(description="List of cons")
    summary: str
parser = JsonOutputParser(pydantic_object=ProductReview)
# Build the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a product reviewer. Extract structured review data."),
    ("human", "{review_text}\n\n{format_instructions}")
])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Retry parser wraps the base parser and re-asks the LLM with the original prompt
retry_parser = RetryOutputParser.from_llm(
    parser=parser,
    llm=llm,
    max_retries=3
)
# Manual retry usage
def parse_with_retry(review_text: str) -> dict:
    prompt_value = prompt.invoke({
        "review_text": review_text,
        "format_instructions": parser.get_format_instructions()
    })
    output = llm.invoke(prompt_value)
    try:
        # JsonOutputParser returns a dict, not a ProductReview instance
        return parser.parse(output.content)
    except Exception:
        # Retry with full context
        return retry_parser.parse_with_prompt(
            completion=output.content,
            prompt_value=prompt_value
        )
result = parse_with_retry("This laptop is amazing! Great battery life and speed, but a bit heavy.")
print(result)
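The practical difference between the two parsers is what the repair request contains. The sketch below builds both kinds of request as plain strings; the function names and the wording of the templates are illustrative assumptions, not LangChain's actual prompt text.

```python
def build_fix_request(completion: str, error: str) -> str:
    """Fix-style request: the model sees only the failed output and the error."""
    return (
        f"Completion:\n{completion}\n\n"
        f"Error:\n{error}\n\n"
        "Please return a corrected completion."
    )


def build_retry_request(prompt: str, completion: str) -> str:
    """Retry-style request: the model also sees the original prompt."""
    return (
        f"Prompt:\n{prompt}\n\n"
        f"Completion:\n{completion}\n\n"
        "The completion did not satisfy the prompt. Please try again."
    )


review = "Great battery life and speed, but a bit heavy."
incomplete = '{"product_name": "Laptop"}'  # valid JSON, but most fields are missing

fix_request = build_fix_request(incomplete, "missing field: rating")
retry_request = build_retry_request(f"Extract review data from: {review}", incomplete)

# Only the retry-style request carries the source text needed to fill in
# the missing fields.
print(review in fix_request, review in retry_request)
```

When the failure is purely syntactic, the fix-style request is enough. When fields are missing or truncated, the model needs the original input back, which is the case RetryOutputParser targets.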
Strategy 3: with_retry in LCEL
For LCEL chains, use the built-in .with_retry() method on any runnable:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import JsonOutputParser
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
parser = JsonOutputParser()
# Simple retry on the whole chain: any exception triggers another attempt
chain_with_retry = (
    ChatPromptTemplate.from_template(
        "Extract a JSON object with 'title' and 'year' from: {text}\n\nReturn only JSON."
    )
    | llm
    | parser
).with_retry(
    retry_if_exception_type=(Exception,),
    stop_after_attempt=3,
    wait_exponential_jitter=True  # adds random jitter to exponential backoff
)
# Or retry only on parsing errors (OutputParserException subclasses ValueError),
# letting API errors propagate immediately
chain_with_advanced_retry = (
    ChatPromptTemplate.from_template("Extract JSON: {text}")
    | llm
    | parser
).with_retry(
    retry_if_exception_type=(ValueError,),
    stop_after_attempt=3,
)
try:
    result = chain_with_retry.invoke({"text": "The Matrix was released in 1999"})
    print(result)
except Exception as e:
    print(f"Failed after 3 attempts: {e}")
.with_retry() is the simplest way to add retry logic to any LCEL chain. It wraps the entire chain, not just the parser, so transient API errors and rate limits are also handled.
Strategy 4: Fallback Chains
Use .with_fallbacks() to define backup chains when the primary chain fails:
from langchain.output_parsers import OutputFixingParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
# Primary chain: structured JSON output (UserProfile is defined in Strategy 1)
primary_llm = ChatOpenAI(model="gpt-4o", temperature=0)
primary_parser = JsonOutputParser(pydantic_object=UserProfile)
primary_chain = (
    ChatPromptTemplate.from_template(
        "Extract user profile JSON from: {text}\n\n{format_instructions}"
    ).partial(format_instructions=primary_parser.get_format_instructions())
    | primary_llm
    | primary_parser
)
# Fallback 1: cheaper model with the same parser, wrapped in a fixing parser
fallback_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
fallback_chain_1 = (
    ChatPromptTemplate.from_template(
        "Extract user profile JSON from: {text}\n\n{format_instructions}"
    ).partial(format_instructions=primary_parser.get_format_instructions())
    | fallback_llm
    | OutputFixingParser.from_llm(parser=primary_parser, llm=fallback_llm)
)
# Fallback 2: unstructured text extraction (StrOutputParser never raises a parse error)
fallback_chain_2 = (
    ChatPromptTemplate.from_template(
        "Extract the person's name, age, and skills from this text as plain text: {text}"
    )
    | fallback_llm
    | StrOutputParser()
)
# Chain them together
resilient_chain = primary_chain.with_fallbacks(
    fallbacks=[fallback_chain_1, fallback_chain_2],
    # Only fall back on parsing errors; the default is (Exception,)
    exceptions_to_handle=(ValueError,)
)
# The chain tries primary, then fallback_1, then fallback_2
result = resilient_chain.invoke({
    "text": "Sarah, 28, works as a backend engineer specializing in Rust and distributed systems"
})
print(result)
This pattern is valuable when you need graceful degradation. A document processing pipeline might fall back from structured extraction to unstructured text rather than failing the entire job.
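The semantics of a fallback chain, including the effect of restricting which exceptions trigger a fallback, can be sketched in plain Python. `invoke_with_fallbacks` and the three step functions are hypothetical stand-ins for chains, written only to show the ordering and the exception filter.

```python
from typing import Any, Callable, Sequence, Tuple, Type


def invoke_with_fallbacks(
    steps: Sequence[Callable[[Any], Any]],
    value: Any,
    exceptions_to_handle: Tuple[Type[BaseException], ...] = (Exception,),
) -> Any:
    """Call each step in order and return the first result that does not raise."""
    if not steps:
        raise ValueError("steps must not be empty")
    last_error = None
    for step in steps:
        try:
            return step(value)
        except exceptions_to_handle as exc:
            last_error = exc  # handled: move on to the next step
    raise last_error


def strict_parse(text: str) -> dict:
    raise ValueError("not valid JSON")  # simulates a parsing failure


def plain_text(text: str) -> str:
    return text.strip()  # degraded but usable result


def broken_network(text: str) -> dict:
    raise ConnectionError("API unreachable")  # simulates a non-parsing failure


result = invoke_with_fallbacks([strict_parse, plain_text], "  Sarah, 28  ", (ValueError,))
print(repr(result))
```

With the filter set to `(ValueError,)`, a `ConnectionError` from the first step is not handled and propagates immediately instead of reaching the fallback. Widen the tuple if API failures should also trigger degradation.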
Strategy 5: Custom Error Handler with LCEL
Build a custom error-handling wrapper that logs failures and attempts recovery:
from langchain.output_parsers import OutputFixingParser
from langchain_core.runnables import RunnableLambda
from langchain_core.output_parsers import JsonOutputParser
from langchain_openai import ChatOpenAI
import logging
import json
import re
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def create_resilient_parser(schema_class, llm=None, max_attempts=3):
    """Create a parser with logging, fixing, and fallback logic."""
    base_parser = JsonOutputParser(pydantic_object=schema_class)
    fix_llm = llm or ChatOpenAI(model="gpt-4o-mini", temperature=0)
    fixing_parser = OutputFixingParser.from_llm(parser=base_parser, llm=fix_llm)
    def parse_with_recovery(text: str):
        attempt = 0
        last_error = None
        while attempt < max_attempts:
            try:
                if attempt == 0:
                    # First attempt: direct parse
                    result = base_parser.parse(text)
                    logger.info(f"Parsed successfully on attempt {attempt + 1}")
                    return result
                elif attempt == 1:
                    # Second attempt: auto-fix (one extra LLM call)
                    result = fixing_parser.parse(text)
                    logger.info(f"Fixed and parsed on attempt {attempt + 1}")
                    return result
                else:
                    # Third attempt onward: aggressive JSON extraction
                    extracted = extract_json_aggressive(text)
                    result = base_parser.parse(extracted)
                    logger.info(f"Aggressively extracted on attempt {attempt + 1}")
                    return result
            except Exception as e:
                last_error = e
                logger.warning(f"Attempt {attempt + 1} failed: {e}")
                attempt += 1
        logger.error(f"All {max_attempts} attempts failed. Last error: {last_error}")
        raise last_error
    return RunnableLambda(parse_with_recovery)
def extract_json_aggressive(text: str) -> str:
    """Try multiple strategies to extract valid JSON from messy text."""
    candidate = text.strip()
    # Strategy 1: use the contents of a fenced code block if present
    code_block_match = re.search(r'```(?:json)?\s*([\s\S]*?)```', candidate)
    if code_block_match:
        candidate = code_block_match.group(1).strip()
    # Strategy 2: narrow to the outermost {...} span
    brace_match = re.search(r'\{[\s\S]*\}', candidate)
    if brace_match:
        candidate = brace_match.group(0)
    try:
        json.loads(candidate)
        return candidate
    except json.JSONDecodeError:
        pass
    # Strategy 3: fix common issues. Comments are stripped first so that a comma
    # left dangling before a comment is caught by the trailing-comma pass.
    # Caveat: the // pattern also matches inside string values such as URLs.
    cleaned = re.sub(r'/\*.*?\*/', '', candidate, flags=re.DOTALL)  # Remove block comments
    cleaned = re.sub(r'//.*$', '', cleaned, flags=re.MULTILINE)  # Remove JS comments
    cleaned = re.sub(r',\s*([}\]])', r'\1', cleaned)  # Remove trailing commas
    return cleaned
# Usage (UserProfile is defined in Strategy 1)
resilient = create_resilient_parser(UserProfile, max_attempts=3)
result = resilient.invoke('{"name": "Dave", "age": 35, "skills": ["Python", "Go",]}')
Strategy 6: Pydantic Validators for Pre-Parse Correction
Add Pydantic validators that clean data before validation, reducing parse failures:
from pydantic import BaseModel, Field, field_validator, model_validator
from langchain_core.output_parsers import JsonOutputParser
from typing import List, Optional
import json
import re
class RobustUserProfile(BaseModel):
    name: str
    age: int  # the "before" validator below also accepts strings like "30 years old"
    skills: List[str]  # the "before" validator below also accepts a comma-separated string
    email: Optional[str] = None
    @field_validator("age", mode="before")
    @classmethod
    def coerce_age(cls, v):
        """Accept '30 years old', '30', or 30."""
        if isinstance(v, int):
            return v
        # Extract digits
        digits = re.findall(r'\d+', str(v))
        if digits:
            return int(digits[0])
        raise ValueError(f"Cannot parse age from: {v}")
    @field_validator("skills", mode="before")
    @classmethod
    def coerce_skills(cls, v):
        """Accept list, comma-separated string, or JSON string."""
        if isinstance(v, list):
            return v
        if isinstance(v, str):
            # Try JSON parsing first
            try:
                parsed = json.loads(v)
                if isinstance(parsed, list):
                    return parsed
            except json.JSONDecodeError:
                pass
            # Fall back to comma splitting
            return [s.strip() for s in v.split(",") if s.strip()]
        return v
    @field_validator("email", mode="before")
    @classmethod
    def validate_email(cls, v):
        """Accept None, empty string, or valid email."""
        if not v:
            return None
        # Basic email check: normalize, and drop values without "@" instead of rejecting
        v = str(v).strip().lower()
        return v if "@" in v else None
    @model_validator(mode="after")
    def check_age_range(self):
        """Post-parse validation with auto-correction."""
        if self.age < 0:
            self.age = 0
        if self.age > 150:
            self.age = 150
        return self
# Test with messy inputs
parser = JsonOutputParser(pydantic_object=RobustUserProfile)  # for format instructions in a chain
messy_json = '{"name": "Eve", "age": "28 years old", "skills": "Python, ML, LangChain", "email": ""}'
result = RobustUserProfile.model_validate_json(messy_json)
print(repr(result))
# → RobustUserProfile(name='Eve', age=28, skills=['Python', 'ML', 'LangChain'], email=None)
Pydantic validators that coerce types reduce the surface area for parsing failures significantly. The model can output "age": "28 years old" and your application still gets age=28.
Strategy 7: Exponential Backoff for Rate Limits
API rate limits cause a different category of failures. Handle them with proper exponential backoff:
import logging
import time
import random
from functools import wraps
from typing import Callable, TypeVar
import openai
logger = logging.getLogger(__name__)
T = TypeVar("T")
def exponential_backoff(
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    jitter: bool = True,
    retryable_exceptions=(openai.RateLimitError, openai.APIConnectionError, openai.APITimeoutError)
):
    """Decorator for exponential backoff with jitter."""
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> T:
            delay = base_delay
            last_exception = None
            # One initial call plus up to max_retries retries
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retryable_exceptions as e:
                    last_exception = e
                    if attempt == max_retries:
                        break
                    actual_delay = min(delay, max_delay)
                    if jitter:
                        # Scale by a random factor in [0.5, 1.5)
                        actual_delay *= (0.5 + random.random())
                    logger.warning(
                        f"Attempt {attempt + 1}/{max_retries + 1} failed: {e}. "
                        f"Retrying in {actual_delay:.1f}s..."
                    )
                    time.sleep(actual_delay)
                    delay *= 2
                # Any other exception type is not caught here and fails immediately
            raise last_exception
        return wrapper
    return decorator
# Apply to LangChain chain invocation
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
@exponential_backoff(max_retries=5, base_delay=1.0)
def invoke_with_backoff(chain, input_data):
    return chain.invoke(input_data)
# Usage (UserProfile is defined in Strategy 1)
llm = ChatOpenAI(model="gpt-4o")
parser = JsonOutputParser(pydantic_object=UserProfile)
chain = ChatPromptTemplate.from_template("Extract JSON: {text}") | llm | parser
result = invoke_with_backoff(chain, {"text": "Alice is 30 and knows Python"})
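It helps to see how much waiting a given configuration can add. The sketch below reproduces the delay arithmetic of the decorator above with jitter switched off; `backoff_schedule` is a hypothetical helper written for this illustration.

```python
def backoff_schedule(max_retries: int = 5, base_delay: float = 1.0, max_delay: float = 60.0):
    """Return the delay in seconds slept before each retry, with jitter disabled."""
    delays = []
    delay = base_delay
    for _ in range(max_retries):
        delays.append(min(delay, max_delay))  # cap each sleep at max_delay
        delay *= 2  # double the uncapped delay for the next retry
    return delays


schedule = backoff_schedule()
total = sum(schedule)
print(schedule, total)

# With more retries the cap starts to matter.
capped = backoff_schedule(max_retries=8)
print(capped)
```

With the defaults the sleeps are 1, 2, 4, 8 and 16 seconds, so a call that exhausts all five retries waits 31 seconds in total before raising. Because jitter multiplies each sleep by a factor between 0.5 and 1.5, the real total lands somewhere between 15.5 and 46.5 seconds.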
LCEL Async Retry with Rate Limit Handling
For async production code:
import asyncio
import logging
import random
from langchain.output_parsers import OutputFixingParser
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
logger = logging.getLogger(__name__)
async def robust_chain_invoke(chain, inputs: list[dict], max_concurrent: int = 5) -> list:
    """Run multiple chain invocations with concurrency limiting and retry."""
    semaphore = asyncio.Semaphore(max_concurrent)
    async def invoke_one(input_data: dict):
        async with semaphore:
            for attempt in range(3):
                try:
                    return await chain.ainvoke(input_data)
                except Exception as e:
                    if attempt == 2:
                        # Give up on this input but keep the batch going
                        logger.error(f"Failed after 3 attempts: {e}")
                        return {"error": str(e), "input": input_data}
                    # Exponential backoff (1s, then 2s) plus up to 1s of jitter
                    wait = 2 ** attempt + random.uniform(0, 1)
                    await asyncio.sleep(wait)
    tasks = [invoke_one(inp) for inp in inputs]
    return await asyncio.gather(*tasks)
# Process 50 documents concurrently (max 5 at a time); UserProfile is defined in Strategy 1
chain_with_retry = (
    ChatPromptTemplate.from_template("Extract JSON from: {text}")
    | ChatOpenAI(model="gpt-4o-mini")
    | OutputFixingParser.from_llm(
        parser=JsonOutputParser(pydantic_object=UserProfile),
        llm=ChatOpenAI(model="gpt-4o-mini")
    )
)
documents = [{"text": f"User {i}: age {20+i}, skills Python"} for i in range(50)]
results = asyncio.run(robust_chain_invoke(chain_with_retry, documents, max_concurrent=5))
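To check that the semaphore really bounds concurrency without spending API calls, the same pattern can be exercised against a fake chain. `FakeChain` and `run_bounded` are hypothetical names for this sketch; the only assumption about a real chain is that it exposes an async `ainvoke` method.

```python
import asyncio


class FakeChain:
    """Hypothetical stand-in for an LCEL chain that tracks peak concurrency."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0

    async def ainvoke(self, input_data: dict) -> dict:
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0.01)  # simulate network latency
        self.active -= 1
        return {"echo": input_data["text"]}


async def run_bounded(chain, inputs: list, max_concurrent: int) -> list:
    """Invoke the chain for every input, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def invoke_one(item: dict) -> dict:
        async with semaphore:
            return await chain.ainvoke(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(invoke_one(item) for item in inputs))


fake = FakeChain()
docs = [{"text": f"doc {i}"} for i in range(20)]
outputs = asyncio.run(run_bounded(fake, docs, max_concurrent=5))
print(len(outputs), fake.peak)
```

The peak never exceeds the limit passed to the semaphore, and results come back in input order, so they can be zipped with the original documents.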
Comparison of All 7 Strategies
| Strategy | Best For | Extra Cost | Complexity |
|---|---|---|---|
| OutputFixingParser | JSON syntax errors, markdown wrapping | 1 LLM call per failure | Low |
| RetryOutputParser | Schema mismatch, wrong field names | 1 LLM call per failure | Low |
| with_retry LCEL | Transient errors, rate limits | 1 full chain re-run per retry | Low |
| Fallback chains | Graceful degradation | Varies by fallback | Medium |
| Custom error handler | Logging + staged recovery | Up to 1 LLM call (fix stage) | Medium |
| Pydantic validators | Type coercion, data normalization | Zero | Low |
| Exponential backoff | Rate limits, API errors | Zero (timing only) | Low |
Recommended production stack:
- Pydantic validators: zero cost, handles type coercion automatically
- OutputFixingParser: catches the majority of LLM formatting mistakes
- with_retry: handles transient API failures
- Exponential backoff: handles rate limits gracefully
Layer these from cheapest to most expensive. Type and value problems in otherwise valid JSON resolve at the Pydantic validator level at no extra cost. Transient API errors resolve at the with_retry level. Output that still fails to parse is what needs OutputFixingParser, the layer that spends an additional LLM call on repair.
Monitoring Parse Failures in Production
Track your failure rate to know when prompts need improvement:
from dataclasses import dataclass, field
from datetime import datetime
@dataclass
class ParseMetrics:
    total: int = 0
    successes: int = 0
    fixed: int = 0
    failed: int = 0
    errors: list = field(default_factory=list)
    @property
    def success_rate(self):
        return self.successes / self.total if self.total > 0 else 0
    @property
    def fix_rate(self):
        return self.fixed / self.total if self.total > 0 else 0
    @property
    def failure_rate(self):
        # Guarded like the other rates so an empty run does not divide by zero
        return self.failed / self.total if self.total > 0 else 0
metrics = ParseMetrics()
def tracked_parse(text: str, parser, fixing_parser):
    metrics.total += 1
    try:
        result = parser.parse(text)
        metrics.successes += 1
        return result
    except Exception:
        try:
            result = fixing_parser.parse(text)
            metrics.fixed += 1
            return result
        except Exception as e2:
            metrics.failed += 1
            metrics.errors.append({"text": text[:100], "error": str(e2), "time": datetime.now().isoformat()})
            raise e2
# After processing 1000 documents:
print(f"Success rate: {metrics.success_rate:.1%}")
print(f"Fix rate: {metrics.fix_rate:.1%}")
print(f"Failure rate: {metrics.failure_rate:.1%}")
If your fix rate exceeds 20%, your prompts need work. The most common fix is adding explicit instructions like "Return ONLY the JSON object, no explanation, no markdown" to the prompt.
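The 20% rule of thumb is easy to turn into an automated check. The helper below is a hypothetical sketch with made-up counts, meant only to show the arithmetic; the threshold is the one suggested above, not a LangChain setting.

```python
def needs_prompt_work(total: int, fixed: int, threshold: float = 0.20) -> bool:
    """Return True when the share of outputs that needed fixing exceeds the threshold."""
    if total == 0:
        return False  # nothing processed yet, so nothing to conclude
    return fixed / total > threshold


# Illustrative counts: 230 of 1000 outputs needed a fix (23%), then 150 of 1000 (15%).
flag_high = needs_prompt_work(total=1000, fixed=230)
flag_low = needs_prompt_work(total=1000, fixed=150)
print(flag_high, flag_low)
```

A check like this can run at the end of each batch and raise an alert, so prompt regressions are noticed before repair calls start to dominate cost.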
For a complete view of output handling in LangChain, see the LangChain tutorial 2025 and Build AI agent with LangChain. For production deployment patterns, the Deploy AI model to production guide covers monitoring and observability.
Frequently Asked Questions
What causes LLM output parsing failures? The most common causes are: the model wrapping JSON in markdown code blocks, missing or extra commas in JSON, truncated responses when output is near the context limit, and the model generating explanatory text before or after the structured output. OutputFixingParser handles most of these cases.
How many retry attempts should I configure? For most production applications, 3 attempts with exponential backoff (delays doubling from 1s) is the right balance. More than 5 attempts adds significant latency. If failures persist beyond 3 attempts, the issue is usually a systematic prompt problem rather than a transient parsing error.
Does RetryOutputParser send my original prompt again? Yes. RetryOutputParser (Strategy 2) resends the original prompt together with the failed output. Its sibling, RetryWithErrorOutputParser, also includes the parse error in the retry request, which gives the model full context to fix its mistake. Both are more expensive than OutputFixingParser in tokens because the whole prompt is sent again, but they are more effective for complex schema failures.
AiTechWorlds Team