LangChain Expression Language (LCEL): The Complete Guide
Master LangChain Expression Language (LCEL) with complete examples of pipe syntax, RunnableSequence, RunnableParallel, streaming, batching, and async invocation.
When LangChain introduced LCEL, my first reaction was "great, another abstraction layer." My second reaction, after actually using it, was that it solved three problems I'd been fighting with in every LangChain project: parallel execution, streaming, and async support. The old LLMChain approach felt like driving through traffic — you could get where you were going, but everything was sequential and slow. LCEL changed how chains feel to build and use.
This guide is comprehensive. We'll cover the | pipe syntax from first principles, work through every major Runnable type, and compare LCEL directly against the legacy chain approach you might still see in older tutorials. By the end you'll understand not just how to write LCEL chains but why they're designed the way they are.
If you're new to LangChain entirely, LangChain tutorial 2025 will get you grounded first. For the agent-specific use cases that build on LCEL, Build AI agent with LangChain is the next stop.
What LCEL Actually Is
LCEL is an interface standard. Every component in LangChain — prompts, models, output parsers, retrievers, tools — implements the Runnable interface, which means they all have the same methods: invoke, stream, batch, ainvoke, astream, abatch. Because everything speaks the same language, you can compose them with the | pipe operator just like Unix pipes.
# This is LCEL
chain = prompt | llm | output_parser
# Calling it is the same regardless of what's in the chain
result = chain.invoke({"topic": "LCEL"})
The | operator creates a RunnableSequence under the hood. The output of each step becomes the input to the next. That's really all there is to the core concept — the power comes from the Runnable types you compose.
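To make the mechanism concrete, here is a toy model of pipe composition in plain Python. It is a sketch, not LangChain's implementation: the `Step` class and its `invoke` method are hypothetical stand-ins for a Runnable, and the only point is to show how overloading `|` (Python's `__or__`) can feed the output of one step into the next.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a Runnable: wraps a single function."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then passes its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Three toy steps playing the roles of prompt, model, and parser.
make_prompt = Step(lambda inputs: f"Explain {inputs['topic']} briefly.")
fake_model = Step(lambda prompt_text: {"content": prompt_text.upper()})
parse = Step(lambda message: message["content"])

toy_chain = make_prompt | fake_model | parse
toy_result = toy_chain.invoke({"topic": "LCEL"})
print(toy_result)  # EXPLAIN LCEL BRIEFLY.
```

Because every step exposes the same `invoke` method, the composed chain is itself a step and can be piped again.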
Installation and Setup
pip install langchain langchain-openai langchain-core langchain-community faiss-cpu
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
os.environ["OPENAI_API_KEY"] = "your-key-here"
# The simplest possible LCEL chain
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Explain {concept} in one sentence.")
parser = StrOutputParser()
chain = prompt | llm | parser
result = chain.invoke({"concept": "neural networks"})
print(result)
Compare that to the legacy approach:
# Legacy LLMChain — still works but deprecated
from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
llm = OpenAI(temperature=0)
prompt = PromptTemplate(input_variables=["concept"], template="Explain {concept} in one sentence.")
chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run("neural networks")
print(result)
The legacy version works. It's just that it doesn't naturally support streaming, async, or parallel execution without significant extra work.
RunnableSequence: Chaining in Order
RunnableSequence is what the | operator creates. Each component runs in sequence, with output feeding into the next input.
from langchain_core.runnables import RunnableSequence
# Explicit construction (same as using |)
chain = RunnableSequence(
    first=prompt,
    middle=[llm],
    last=parser
)
# Equivalent pipe syntax
chain = prompt | llm | parser
# Both work identically
print(chain.invoke({"concept": "backpropagation"}))
A more realistic sequence — a RAG chain:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
# Setup retriever (abbreviated)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    ["LangChain is a framework for building LLM applications.",
     "LCEL stands for LangChain Expression Language.",
     "Runnables are the building blocks of LCEL chains."],
    embedding=embeddings
)
retriever = vectorstore.as_retriever()
# RAG prompt
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the context below.
Context: {context}
Question: {question}
""")
llm = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()
# Full RAG chain as a RunnableSequence
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | parser
)
result = rag_chain.invoke("What is LCEL?")
print(result)
The {"context": retriever, "question": RunnablePassthrough()} part creates a RunnableParallel implicitly — the input gets routed to both the retriever and passed through unchanged.
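The shape of the data at that first step is easy to lose track of, so here is a plain-Python sketch of the fan-out. The `fake_retriever` and `fan_out` names are hypothetical and no LangChain code is involved; the sketch only shows that every value in the dict receives the same input and that the results come back under the same keys.

```python
from typing import Any, Callable, Dict, List


def fake_retriever(query: str) -> List[str]:
    """Hypothetical retriever: keeps texts that mention a longer word from the query."""
    corpus = [
        "LCEL stands for LangChain Expression Language.",
        "Runnables are the building blocks of LCEL chains.",
        "Vector stores hold embeddings.",
    ]
    terms = [word.strip("?.,!").lower() for word in query.split()]
    terms = [term for term in terms if len(term) > 3]
    return [text for text in corpus if any(term in text.lower() for term in terms)]


def fan_out(steps: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    # Each step gets the SAME input; outputs are collected under the step's key.
    # This toy version runs the steps one after another, not concurrently.
    return {key: step(value) for key, step in steps.items()}


prompt_inputs = fan_out(
    {"context": fake_retriever, "question": lambda question: question},
    "What is LCEL?",
)
print(prompt_inputs["question"])
print(prompt_inputs["context"])
```

The resulting dict has exactly the two keys the RAG prompt template expects, `context` and `question`.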
RunnableParallel: Running Multiple Chains at Once
RunnableParallel is where LCEL starts earning its keep. Instead of running chains sequentially, you run multiple chains on the same input simultaneously and get all results back as a dictionary.
from langchain_core.runnables import RunnableParallel
# Two different analysis chains
summarize_prompt = ChatPromptTemplate.from_template("Summarize this text in 2 sentences: {text}")
sentiment_prompt = ChatPromptTemplate.from_template("What is the sentiment of this text? Text: {text}")
keywords_prompt = ChatPromptTemplate.from_template("Extract 5 keywords from: {text}")
summarize_chain = summarize_prompt | llm | StrOutputParser()
sentiment_chain = sentiment_prompt | llm | StrOutputParser()
keywords_chain = keywords_prompt | llm | StrOutputParser()
# Run all three in parallel
analysis_chain = RunnableParallel(
    summary=summarize_chain,
    sentiment=sentiment_chain,
    keywords=keywords_chain
)
text = """
LangChain has released LCEL as the new standard for building chains.
The expression language allows developers to compose prompts, models, and parsers
using a clean pipe syntax that mirrors functional programming patterns.
"""
results = analysis_chain.invoke({"text": text})
print("Summary:", results["summary"])
print("Sentiment:", results["sentiment"])
print("Keywords:", results["keywords"])
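Because the three LLM calls run concurrently, the wall-clock time is roughly that of the slowest call rather than the sum of all three. When the calls take similar time, that is close to a 3x reduction in latency, provided your API rate limits allow three requests in flight.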
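The gain comes from overlapping the time spent waiting on network I/O. The sketch below simulates that with `time.sleep` and a thread pool; `slow_call` is a hypothetical stand-in for an LLM request, and the timings are illustrative rather than a benchmark of LangChain.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_call(label: str, seconds: float) -> str:
    """Hypothetical stand-in for a network-bound LLM request."""
    time.sleep(seconds)
    return f"{label} done"


# Simulated latency per task, in seconds.
tasks = {"summary": 0.2, "sentiment": 0.1, "keywords": 0.15}

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    futures = {label: pool.submit(slow_call, label, secs) for label, secs in tasks.items()}
    parallel_results = {label: future.result() for label, future in futures.items()}
parallel_elapsed = time.perf_counter() - start

sequential_estimate = sum(tasks.values())
print(parallel_results)
print(f"parallel: {parallel_elapsed:.2f}s, sequential would be about {sequential_estimate:.2f}s")
```

The parallel run finishes in about the time of the slowest task, which is why uneven call durations yield less than a full 3x improvement.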
Dict shorthand works too, with one caveat: a plain dict has no .invoke() method. LangChain only converts a dict into a RunnableParallel when the dict is composed with another Runnable through |, as in the RAG chain above. A standalone dict has to be wrapped explicitly:
from langchain_core.runnables import RunnableParallel
# A bare dict cannot be invoked, so wrap it in RunnableParallel
analysis_chain = RunnableParallel({
    "summary": summarize_chain,
    "sentiment": sentiment_chain,
    "keywords": keywords_chain
})
results = analysis_chain.invoke({"text": text})
RunnableLambda: Wrapping Arbitrary Functions
Sometimes you need a Python function in the middle of a chain. RunnableLambda wraps any callable so it behaves like a Runnable.
from langchain_core.runnables import RunnableLambda
# Transform the output before passing to the next step
def clean_text(text: str) -> str:
    """Remove extra whitespace and normalize."""
    import re
    text = re.sub(r'\s+', ' ', text)
    return text.strip().lower()
def count_words(text: str) -> dict:
    """Add word count metadata."""
    words = text.split()
    return {
        "text": text,
        "word_count": len(words),
        "char_count": len(text)
    }
# Use in a chain
processing_chain = (
    RunnableLambda(clean_text)
    | RunnableLambda(count_words)
)
result = processing_chain.invoke(" This is some messy text ")
print(result)
# {'text': 'this is some messy text', 'word_count': 5, 'char_count': 23}
# Shorter syntax: when the other side of | is already a Runnable,
# plain functions are wrapped in RunnableLambda automatically
full_chain = prompt | llm | StrOutputParser() | clean_text | count_words
result = full_chain.invoke({"concept": "machine learning"})
print(result)
Async lambdas work too:
import asyncio
async def async_transform(text: str) -> str:
    await asyncio.sleep(0) # yield control
    return text.upper()
async_chain = prompt | llm | StrOutputParser() | RunnableLambda(async_transform)
result = asyncio.run(async_chain.ainvoke({"concept": "AI"}))
print(result)
RunnablePassthrough: Forwarding Inputs Unchanged
RunnablePassthrough is a utility that passes input through without modification. It's most useful when you need to preserve part of the input while transforming another part.
from langchain_core.runnables import RunnablePassthrough
# Pattern: preserve original input while adding context
setup_chain = RunnableParallel(
    original_question=RunnablePassthrough(),
    context=retriever
)
# Now both are available downstream
full_rag = (
    setup_chain
    | RunnableLambda(lambda x: {
        "question": x["original_question"],
        "context": "\n".join([doc.page_content for doc in x["context"]])
    })
    | rag_prompt
    | llm
    | StrOutputParser()
)
result = full_rag.invoke("How does LCEL work?")
print(result)
# RunnablePassthrough.assign: add new keys while keeping existing ones.
# assign expects a dict as input, so wrap the parsed string in a dict first.
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
chain_with_metadata = (
    prompt
    | llm
    | StrOutputParser()
    | RunnableLambda(lambda text: {"text": text})
    | RunnablePassthrough.assign(
        word_count=lambda x: len(x["text"].split()),
        char_count=lambda x: len(x["text"])
    )
)
result = chain_with_metadata.invoke({"concept": "transformers"})
print(result)
# {'text': '...model output...', 'word_count': ..., 'char_count': ...}
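RunnablePassthrough.assign merges newly computed keys into a dict input, and each function it is given receives the whole input dict. A plain-Python sketch of those semantics follows; the `assign` function here is a hypothetical helper written for illustration, not the library code.

```python
from typing import Any, Callable, Dict


def assign(inputs: Dict[str, Any], **new_keys: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Hypothetical helper: return the input dict plus one computed value per keyword."""
    if not isinstance(inputs, dict):
        raise TypeError("assign expects a dict input")
    # Every function sees the full original dict, not the other computed keys.
    computed = {key: func(inputs) for key, func in new_keys.items()}
    return {**inputs, **computed}


enriched = assign(
    {"text": "attention is all you need"},
    word_count=lambda data: len(data["text"].split()),
    char_count=lambda data: len(data["text"]),
)
print(enriched)
# {'text': 'attention is all you need', 'word_count': 5, 'char_count': 25}
```

Passing a bare string instead of a dict fails in this sketch, which mirrors why a string output needs to be wrapped in a dict before it reaches assign.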
Streaming with .stream() and .astream()
This is one of my favorite LCEL features. Any LCEL chain automatically supports streaming — you get tokens as they're generated rather than waiting for the full response.
# Synchronous streaming
chain = prompt | llm | StrOutputParser()
print("Streaming response: ", end="", flush=True)
for chunk in chain.stream({"concept": "quantum computing"}):
    print(chunk, end="", flush=True)
print() # newline at end
# Async streaming
import asyncio
async def stream_async():
    print("Async streaming: ", end="", flush=True)
    async for chunk in chain.astream({"concept": "neural networks"}):
        print(chunk, end="", flush=True)
    print()
asyncio.run(stream_async())
# Callback-based streaming: the handler prints each token to stdout as it arrives,
# even though the chain is called with .invoke()
from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
streaming_llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)
chain_with_streaming = prompt | streaming_llm | StrOutputParser()
# Tokens are printed by the callback; result still holds the complete string
result = chain_with_streaming.invoke({"concept": "backpropagation"})
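One way to picture why streaming composes is as a pipeline of Python generators, where each stage forwards a chunk as soon as it receives one. The sketch below uses hypothetical `fake_token_stream` and `uppercase_chunks` stages and makes no claim about LangChain's internals.

```python
from typing import Iterable, Iterator, List


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical model: yields one word-sized chunk at a time."""
    for word in text.split(" "):
        yield word + " "


def uppercase_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """A downstream stage that transforms each chunk without buffering."""
    for chunk in chunks:
        yield chunk.upper()


streamed: List[str] = []
for piece in uppercase_chunks(fake_token_stream("tokens arrive one by one")):
    streamed.append(piece)  # a real app would print or send each piece here

print(streamed)
```

A stage that needs the complete input before it can run would hold back every chunk until the upstream finishes, so the position of such stages in a chain matters for perceived latency.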
For web applications (FastAPI, Flask), streaming is how you get that real-time response feel:
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
app = FastAPI()
chain = prompt | ChatOpenAI(model="gpt-4o-mini", streaming=True) | StrOutputParser()
@app.get("/stream")
async def stream_endpoint(concept: str):
    async def generate():
        async for chunk in chain.astream({"concept": concept}):
            yield f"data: {chunk}\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")
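One detail to watch with server-sent events: a `data:` field ends at the first newline, so a chunk that itself contains a newline needs one `data:` line per line of text. The helper below is a small sketch of that framing; `format_sse` is a hypothetical name, not part of FastAPI or LangChain.

```python
def format_sse(chunk: str) -> str:
    """Frame one chunk as a server-sent event, one data line per text line."""
    lines = chunk.split("\n")
    # Each line gets its own "data:" prefix; a blank line terminates the event.
    return "".join(f"data: {line}\n" for line in lines) + "\n"


single = format_sse("Hello")
multi = format_sse("line one\nline two")
print(repr(single))  # 'data: Hello\n\n'
print(repr(multi))   # 'data: line one\ndata: line two\n\n'
```

In the endpoint above, `yield format_sse(chunk)` could stand in for the f-string if the model is likely to emit newlines.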
Batch Processing with .batch()
When you have multiple inputs to process, .batch() runs them in parallel (up to a configurable concurrency limit).
# Batch processing
concepts = [
    {"concept": "transformers"},
    {"concept": "attention mechanism"},
    {"concept": "RLHF"},
    {"concept": "RAG"},
    {"concept": "function calling"}
]
# Process all 5 in parallel
results = chain.batch(concepts, config={"max_concurrency": 5})
for concept, result in zip(concepts, results):
    print(f"{concept['concept']}: {result[:80]}...")
# Async batch
async def batch_async():
    results = await chain.abatch(concepts)
    return results
results = asyncio.run(batch_async())
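The performance difference between sequential and batch processing is significant. As an illustration, five sequential requests at about 3 seconds each take around 15 seconds; batched with enough concurrency, the total is closer to the slowest single request, say 3-4 seconds. Actual numbers depend on the model and your API rate limits.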
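The idea behind a concurrency limit can be sketched with a bounded thread pool. This is an assumption-laden model rather than LangChain's code: `fake_invoke` and `run_batch` are hypothetical names, and the sleep stands in for a network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_invoke(concept: str) -> str:
    """Hypothetical stand-in for chain.invoke on one input."""
    time.sleep(0.05)
    return f"{concept} explained"


def run_batch(inputs: List[str], max_concurrency: int) -> List[str]:
    # At most max_concurrency calls are in flight at any moment.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Executor.map returns results in input order, even if later calls finish first.
        return list(pool.map(fake_invoke, inputs))


batch_inputs = ["transformers", "attention mechanism", "RLHF", "RAG", "function calling"]
batch_outputs = run_batch(batch_inputs, max_concurrency=2)
print(batch_outputs)
```

A lower limit trades total time for gentler load on the API, which is usually the right call when rate limits are tight.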
LCEL vs Legacy Chains Comparison
| Feature | LCEL | Legacy Chains |
|---|---|---|
| Streaming support | Built-in | Manual, limited |
| Async support | Native | Patchy |
| Parallel execution | RunnableParallel | Sequential only |
| Batch processing | .batch() | Manual loops |
| Composability | Any Runnable with any Runnable | Chain-specific |
| Debugging | LangSmith traces | Console logs |
| Type safety | Pydantic input/output schemas | Partial |
| Learning curve | Medium | Low |
| Code verbosity | Less | More |
| Custom components | RunnableLambda | Custom Chain subclass |
According to LangChain's migration guide, all legacy chains have LCEL equivalents and the team recommends migrating existing projects.
Advanced Pattern: Conditional Branching
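A function wrapped in RunnableLambda can return another Runnable, which LCEL then invokes with the same input. That gives you routing with nothing more than a plain Python function:
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough
# Route based on input classification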
classify_prompt = ChatPromptTemplate.from_template(
    "Classify this question as either 'technical' or 'general': {question}. Reply with one word only."
)
classifier = classify_prompt | llm | StrOutputParser()
technical_prompt = ChatPromptTemplate.from_template(
    "You are a technical expert. Answer this precisely: {question}"
)
general_prompt = ChatPromptTemplate.from_template(
    "You are a friendly assistant. Answer this conversationally: {question}"
)
technical_chain = technical_prompt | llm | StrOutputParser()
general_chain = general_prompt | llm | StrOutputParser()
# Branch based on classification
def route(info):
    if "technical" in info["topic"].lower():
        return technical_chain
    return general_chain
# Full routing chain
routing_chain = (
    RunnableParallel(
        question=RunnablePassthrough(),
        topic=RunnableLambda(lambda x: classifier.invoke({"question": x}))
    )
    | RunnableLambda(route)
)
result = routing_chain.invoke("How does backpropagation work in neural networks?")
print(result)
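Stripped of the LLM calls, routing is two steps: pick a branch from the classification, then run that branch on the same input. The sketch below walks through those steps in plain Python with hypothetical handlers and a keyword-based `fake_classifier`; it models the control flow only.

```python
from typing import Callable, Dict

Info = Dict[str, str]


def technical_handler(info: Info) -> str:
    return f"[technical] {info['question']}"


def general_handler(info: Info) -> str:
    return f"[general] {info['question']}"


def fake_classifier(question: str) -> str:
    """Hypothetical keyword classifier standing in for the LLM call."""
    technical_terms = ("backpropagation", "gradient", "neural")
    is_technical = any(term in question.lower() for term in technical_terms)
    return "technical" if is_technical else "general"


def route(info: Info) -> Callable[[Info], str]:
    if "technical" in info["topic"].lower():
        return technical_handler
    return general_handler


def answer(question: str) -> str:
    info = {"question": question, "topic": fake_classifier(question)}
    handler = route(info)  # step 1: pick a branch
    return handler(info)   # step 2: run it on the same input


routed_technical = answer("How does backpropagation work?")
routed_general = answer("What should I cook tonight?")
print(routed_technical)
print(routed_general)
```

Matching with `in` rather than `==` tolerates a classifier that answers "Technical." or adds stray whitespace, which real models often do.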
Error Handling and Fallbacks
LCEL chains support .with_fallbacks() for graceful degradation:
from langchain_openai import ChatOpenAI
primary_llm = ChatOpenAI(model="gpt-4o", temperature=0)
fallback_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain_with_fallback = (
    prompt
    | primary_llm.with_fallbacks([fallback_llm])
    | StrOutputParser()
)
# If gpt-4o fails (rate limit, outage), falls back to gpt-4o-mini
result = chain_with_fallback.invoke({"concept": "embeddings"})
print(result)
# Retry logic: with_retry is a method on every Runnable, so no extra import is needed
reliable_chain = (
    prompt
    | primary_llm.with_retry(
        retry_if_exception_type=(Exception,),
        stop_after_attempt=3,
        wait_exponential_jitter=True
    )
    | StrOutputParser()
)
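For intuition about what retries and fallbacks do, here is a stdlib-only sketch of both strategies. The helper names (`call_with_retry`, `call_with_fallbacks`) and the simulated failures are hypothetical; the delays are kept tiny so the example runs quickly, and none of this is LangChain's implementation.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_retry(func: Callable[[], T], attempts: int = 3, base_delay: float = 0.01) -> T:
    """Retry func up to `attempts` times with exponential backoff plus jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay + random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")


def call_with_fallbacks(candidates: Sequence[Callable[[], T]]) -> T:
    """Try each candidate in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for candidate in candidates:
        try:
            return candidate()
        except Exception as error:
            last_error = error
    if last_error is None:
        raise ValueError("no candidates given")
    raise last_error


calls = {"count": 0}


def flaky_primary() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return "primary answer"


def always_down() -> str:
    raise TimeoutError("simulated outage")


def backup() -> str:
    return "fallback answer"


retried = call_with_retry(flaky_primary, attempts=3)
fell_back = call_with_fallbacks([always_down, backup])
print(retried, "|", fell_back)  # primary answer | fallback answer
```

Retries suit transient errors on the same model, while fallbacks suit outages where repeating the same call is unlikely to help; the two can be combined.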
For complete agent patterns built on LCEL, see AI agent memory and planning and AI research agent build. If you're pairing LCEL with a RAG system, RAG system tutorial covers the retrieval integration in depth.
The OpenAI API integration guide also covers OpenAI-specific patterns that work well with LCEL's ainvoke and streaming features.
Conclusion
LCEL is LangChain's most important architectural decision in recent years. The | pipe syntax isn't just syntactic sugar — it's a unification of the interface across every component, which is what enables streaming, batching, and parallel execution to just work without special code.
Start with prompt | llm | parser and add complexity only when you need it. RunnableParallel for concurrent chains, RunnableLambda for custom logic, RunnablePassthrough for preserving context, and .stream() whenever you're building a user-facing interface. In my experience, these four patterns cover the large majority of real-world LCEL use cases.
Check out the LangChain output parsers guide next — output parsers are where LCEL chains get their structured output capabilities, and they compose cleanly with everything covered here.
Frequently Asked Questions
Is LCEL required in modern LangChain or can I still use legacy chains?
LCEL is the recommended approach as of LangChain v0.2+. Legacy chains like LLMChain and ConversationalRetrievalChain still work but are deprecated. New projects should use LCEL. The LangChain team has stated that legacy chains will eventually be removed.
What is the difference between RunnableSequence and RunnableParallel?
RunnableSequence runs components one after another, passing output of each step as input to the next. RunnableParallel runs multiple runnables simultaneously with the same input, combining all outputs into a dictionary. Use Parallel when you want to get multiple perspectives on the same input at once.
Does LCEL support streaming?
Yes, LCEL has first-class streaming support. Any chain built with LCEL automatically supports .stream() and .astream() for token-by-token output. You don't need to do anything special — as long as the underlying model supports streaming, LCEL passes it through.
AiTechWorlds Team