Build a LangChain Agent for Financial Data Analysis (2026)
Build a LangChain stock analysis agent using Yahoo Finance, SEC EDGAR, and custom financial ratio tools — with a full comparison of financial data sources.
Financial data analysis is one of the most compelling applications of LangChain agents. The combination of structured financial data, natural language questions, and tool-calling LLMs creates an analyst assistant that can answer questions like "Is Apple overvalued compared to its sector?" or "What was Tesla's free cash flow trend over the last four quarters?" — questions that previously required an Excel model and thirty minutes of work.
This guide builds a complete stock analysis agent from the ground up. You will get a Yahoo Finance tool, an SEC EDGAR filing tool, a custom financial ratio calculator, and a comparison table of financial data sources. The final agent handles multi-step research questions with citations.
If you are new to LangChain agents, start with Build AI agent with LangChain. For a research-focused agent, see AI research agent build.
Setup
pip install langchain langchain-openai langchain-community yfinance pandas requests beautifulsoup4
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
Building the Yahoo Finance Tool
The yfinance library provides comprehensive data: price history, financials, balance sheets, earnings, and more.
import yfinance as yf
import pandas as pd
from langchain_core.tools import tool
from typing import Optional
import json
@tool
def get_stock_price(ticker: str, period: str = "1mo") -> str:
    """
    Get recent stock price data for a ticker symbol.

    Args:
        ticker: Stock ticker symbol (e.g., 'AAPL', 'MSFT', 'TSLA')
        period: Time period - '1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y'

    Returns:
        JSON string with price data including open, high, low, close, volume
    """
    try:
        stock = yf.Ticker(ticker)
        hist = stock.history(period=period)
        if hist.empty:
            return f"No price data found for {ticker}"
        latest = hist.iloc[-1]
        first = hist.iloc[0]
        price_change = ((latest["Close"] - first["Close"]) / first["Close"]) * 100
        result = {
            "ticker": ticker.upper(),
            "current_price": round(latest["Close"], 2),
            "period_start_price": round(first["Close"], 2),
            "price_change_pct": round(price_change, 2),
            "period_high": round(hist["High"].max(), 2),
            "period_low": round(hist["Low"].min(), 2),
            "avg_volume": int(hist["Volume"].mean()),
            "period": period,
            "data_points": len(hist)
        }
        return json.dumps(result, indent=2)
    except Exception as e:
        return f"Error fetching price data for {ticker}: {str(e)}"
@tool
def get_company_financials(ticker: str) -> str:
    """
    Get key financial metrics for a company: revenue, earnings, margins, and growth rates.

    Args:
        ticker: Stock ticker symbol

    Returns:
        JSON string with income statement highlights
    """
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        financials = stock.financials
        # Extract key metrics from info
        result = {
            "ticker": ticker.upper(),
            "company_name": info.get("longName", "Unknown"),
            "sector": info.get("sector", "Unknown"),
            "industry": info.get("industry", "Unknown"),
            "market_cap": info.get("marketCap"),
            "enterprise_value": info.get("enterpriseValue"),
            "trailing_pe": info.get("trailingPE"),
            "forward_pe": info.get("forwardPE"),
            "price_to_book": info.get("priceToBook"),
            "price_to_sales": info.get("priceToSalesTrailing12Months"),
            "ev_to_ebitda": info.get("enterpriseToEbitda"),
            "revenue_ttm": info.get("totalRevenue"),
            "gross_margins": info.get("grossMargins"),
            "operating_margins": info.get("operatingMargins"),
            "profit_margins": info.get("profitMargins"),
            "revenue_growth": info.get("revenueGrowth"),
            "earnings_growth": info.get("earningsGrowth"),
            "return_on_equity": info.get("returnOnEquity"),
            "return_on_assets": info.get("returnOnAssets"),
            "debt_to_equity": info.get("debtToEquity"),
            "current_ratio": info.get("currentRatio"),
            "free_cashflow": info.get("freeCashflow"),
            "dividend_yield": info.get("dividendYield"),
        }
        # Add annual revenue from financials if available
        if not financials.empty and "Total Revenue" in financials.index:
            revenue_history = financials.loc["Total Revenue"].dropna()
            result["annual_revenue_history"] = {
                str(date.year): int(val)
                for date, val in revenue_history.items()
            }
        return json.dumps(result, indent=2, default=str)
    except Exception as e:
        return f"Error fetching financials for {ticker}: {str(e)}"
@tool
def get_analyst_recommendations(ticker: str) -> str:
    """
    Get analyst price targets and recommendations for a stock.

    Args:
        ticker: Stock ticker symbol

    Returns:
        JSON string with analyst consensus, price target, and recent recommendation changes
    """
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        result = {
            "ticker": ticker.upper(),
            "recommendation_mean": info.get("recommendationMean"),
            "recommendation_key": info.get("recommendationKey"),
            "number_of_analyst_opinions": info.get("numberOfAnalystOpinions"),
            "target_high_price": info.get("targetHighPrice"),
            "target_low_price": info.get("targetLowPrice"),
            "target_mean_price": info.get("targetMeanPrice"),
            "target_median_price": info.get("targetMedianPrice"),
            "current_price": info.get("currentPrice"),
        }
        # Calculate upside/downside
        if result["target_mean_price"] and result["current_price"]:
            upside = (
                (result["target_mean_price"] - result["current_price"])
                / result["current_price"]
            ) * 100
            result["implied_upside_pct"] = round(upside, 2)
        return json.dumps(result, indent=2, default=str)
    except Exception as e:
        return f"Error fetching recommendations for {ticker}: {str(e)}"
Building the SEC EDGAR Tool
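SEC filings are the official record of a company's financials: the annual statements in a 10-K are audited, while quarterly 10-Q statements are unaudited. EDGAR is free and requires no API key, but the SEC asks automated clients to identify themselves in the User-Agent header and to stay within its fair-access limit (currently 10 requests per second):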
import json
import requests
from langchain_core.tools import tool

@tool
def search_sec_filings(company_name: str, filing_type: str = "10-K") -> str:
    """
    Search SEC EDGAR for company filings.

    Args:
        company_name: Company name to search for
        filing_type: Type of filing - '10-K' (annual), '10-Q' (quarterly), '8-K' (current events)

    Returns:
        JSON string with the search status and a link to the EDGAR filing list
    """
    try:
        # The SEC asks automated clients to identify themselves
        headers = {"User-Agent": "FinancialResearchAgent research@example.com"}
        # Build the EDGAR company-browse URL once, with both inputs URL-encoded
        edgar_url = (
            "https://www.sec.gov/cgi-bin/browse-edgar"
            f"?company={requests.utils.quote(company_name)}"
            f"&CIK=&type={requests.utils.quote(filing_type)}&dateb=&owner=include"
            "&count=5&search_text=&action=getcompany"
        )
        response = requests.get(edgar_url, headers=headers, timeout=10)
        if response.status_code != 200:
            return f"SEC EDGAR returned status {response.status_code}"
        # This endpoint returns HTML, which is not parsed here: the tool only
        # confirms that the search succeeded and returns the listing URL.
        # For a production implementation, use the EDGAR JSON API
        result = {
            "company": company_name,
            "filing_type": filing_type,
            "edgar_url": edgar_url,
            "status": "Search completed",
            "note": (
                "Visit the EDGAR URL above to access full filing documents. "
                "Key sections in 10-K filings: Item 1A (Risk Factors), "
                "Item 7 (MD&A), Item 8 (Financial Statements)."
            )
        }
        return json.dumps(result, indent=2)
    except Exception as e:
        return f"Error searching SEC EDGAR: {str(e)}"
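The search tool hands back a link to an HTML page, which is awkward for an agent to read. As a minimal sketch of the JSON route that the code comment recommends, the helpers below build a `data.sec.gov` submissions URL from a company's CIK number and pick recent filings of one form type out of the response. The helper names (`submissions_url`, `recent_filings`) are illustrative assumptions, and the sample payload only mimics the column layout of `filings.recent`; it is not real filing data.

```python
from typing import Any


def submissions_url(cik: int) -> str:
    """URL of the EDGAR submissions JSON for a company (CIK zero-padded to 10 digits)."""
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"


def recent_filings(
    submissions: dict[str, Any], form: str = "10-K", limit: int = 5
) -> list[dict[str, str]]:
    """Filter the parallel arrays under filings.recent down to one form type."""
    recent = submissions["filings"]["recent"]
    cik = int(submissions["cik"])  # int() drops any zero padding
    rows = zip(
        recent["form"],
        recent["filingDate"],
        recent["accessionNumber"],
        recent["primaryDocument"],
    )
    results = []
    for form_type, filed, accession, document in rows:
        if form_type != form:
            continue
        folder = accession.replace("-", "")  # archive folders omit the dashes
        results.append({
            "form": form_type,
            "filed": filed,
            "url": f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{document}",
        })
        if len(results) == limit:
            break
    return results


# Hypothetical payload shaped like a submissions response (not real data)
sample = {
    "cik": "0000000123",
    "filings": {"recent": {
        "form": ["8-K", "10-K", "10-Q", "10-K"],
        "filingDate": ["2025-02-01", "2025-01-15", "2024-10-30", "2024-01-12"],
        "accessionNumber": [
            "0000000123-25-000002", "0000000123-25-000001",
            "0000000123-24-000009", "0000000123-24-000001",
        ],
        "primaryDocument": ["event.htm", "annual-2024.htm", "q3.htm", "annual-2023.htm"],
    }},
}

for filing in recent_filings(sample, form="10-K"):
    print(filing["filed"], filing["url"])
```

In a live tool, the payload would come from `requests.get(submissions_url(cik), headers=headers, timeout=10).json()`, using the same identifying User-Agent header as the search tool.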
Building the Financial Ratio Calculator
@tool
def calculate_financial_ratios(
    revenue: float,
    gross_profit: float,
    operating_income: float,
    net_income: float,
    total_assets: float,
    total_equity: float,
    total_debt: float,
    free_cash_flow: float,
    market_cap: float
) -> str:
    """
    Calculate comprehensive financial ratios from raw financial data.

    Args:
        revenue: Annual revenue in dollars
        gross_profit: Annual gross profit
        operating_income: Annual operating income (EBIT)
        net_income: Annual net income
        total_assets: Total assets from balance sheet
        total_equity: Total shareholders equity
        total_debt: Total debt (short + long term)
        free_cash_flow: Annual free cash flow
        market_cap: Current market capitalization

    Returns:
        JSON string with calculated financial ratios
    """
    def safe_divide(a, b, default=None):
        # Return the default instead of raising when the denominator is zero or missing
        return round(a / b, 4) if b else default

    ratios = {
        "profitability": {
            "gross_margin": safe_divide(gross_profit, revenue),
            "operating_margin": safe_divide(operating_income, revenue),
            "net_margin": safe_divide(net_income, revenue),
            "return_on_assets": safe_divide(net_income, total_assets),
            "return_on_equity": safe_divide(net_income, total_equity),
        },
        "leverage": {
            "debt_to_equity": safe_divide(total_debt, total_equity),
            "debt_to_assets": safe_divide(total_debt, total_assets),
        },
        "valuation": {
            "price_to_earnings": safe_divide(market_cap, net_income),
            "price_to_sales": safe_divide(market_cap, revenue),
            "price_to_free_cash_flow": safe_divide(market_cap, free_cash_flow),
            "ev_approximate": market_cap + total_debt,
        },
        "efficiency": {
            "asset_turnover": safe_divide(revenue, total_assets),
            "free_cash_flow_margin": safe_divide(free_cash_flow, revenue),
        }
    }
    return json.dumps(ratios, indent=2)
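The `ev_approximate` field adds debt to market cap but does not subtract cash, so it overstates enterprise value for cash-rich companies. Here is a minimal sketch of the fuller calculation, assuming a cash figure is available. The `cash_and_equivalents` parameter, both function names, and the sample numbers are illustrative; they are not part of the tool above and not real company data.

```python
from typing import Optional


def enterprise_value(
    market_cap: float, total_debt: float, cash_and_equivalents: float
) -> float:
    """EV = market capitalization + total debt - cash and equivalents."""
    return market_cap + total_debt - cash_and_equivalents


def ev_to_ebit(
    market_cap: float,
    total_debt: float,
    cash_and_equivalents: float,
    operating_income: float,
) -> Optional[float]:
    """EV/EBIT multiple, or None when operating income is zero or negative."""
    if operating_income <= 0:
        return None  # the multiple is not meaningful for loss-making companies
    ev = enterprise_value(market_cap, total_debt, cash_and_equivalents)
    return round(ev / operating_income, 2)


# Hypothetical figures, in dollars
ev = enterprise_value(market_cap=500e9, total_debt=60e9, cash_and_equivalents=40e9)
multiple = ev_to_ebit(500e9, 60e9, 40e9, operating_income=26e9)
print(f"EV: {ev:,.0f}  EV/EBIT: {multiple}")
```

Returning `None` for non-positive operating income mirrors the `safe_divide` approach: the agent sees a missing value rather than a misleading negative multiple.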
Assembling the Financial Agent
from langchain_openai import ChatOpenAI
# These imports work on LangChain 0.x. On LangChain 1.x, AgentExecutor and
# create_tool_calling_agent are expected to live in the separate
# langchain-classic package (langchain_classic.agents) instead.
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o", temperature=0)

tools = [
    get_stock_price,
    get_company_financials,
    get_analyst_recommendations,
    search_sec_filings,
    calculate_financial_ratios,
]

system_prompt = """You are a professional financial analyst assistant with access to market data tools.

Your responsibilities:
- Retrieve current stock prices and historical performance
- Analyze financial statements: revenue, margins, growth rates
- Calculate and interpret financial ratios
- Review analyst consensus and price targets
- Search SEC EDGAR for official filings
- Provide balanced, data-driven assessments

Important rules:
- NEVER quote specific financial numbers from memory; always use tools
- Cite the source of every data point
- Include both positive and negative factors in your analysis
- Clarify if data is trailing (historical) or forward-looking (projected)
- Do not provide investment advice; provide analysis only

Format your responses clearly with:
1. Key metrics summary
2. Strengths and concerns
3. Analyst consensus
4. Relevant comparisons when asked
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=10,
    return_intermediate_steps=True
)
Running the Agent
def analyze_stock(query: str) -> dict:
    result = agent_executor.invoke({"input": query})
    return {
        "answer": result["output"],
        "steps": len(result["intermediate_steps"])
    }

# Single stock analysis
result = analyze_stock(
    "Analyze Apple (AAPL): give me the P/E ratio, revenue growth, "
    "and what analysts think about the stock."
)
print(result["answer"])
print(f"\nAgent took {result['steps']} reasoning steps")

# Comparative analysis
result = analyze_stock(
    "Compare Microsoft and Google on revenue growth and profit margins. "
    "Which has better fundamentals right now?"
)
print(result["answer"])

# Deep dive with calculations
result = analyze_stock(
    "Calculate the financial ratios for Tesla using their latest annual data. "
    "Include profitability, leverage, and valuation ratios."
)
print(result["answer"])
Financial Data Source Comparison
| Source | Cost | Coverage | Update Frequency | Best For |
|---|---|---|---|---|
| Yahoo Finance (yfinance) | Free | Global equities, ETFs | 15-min delayed | Research, education |
| Alpha Vantage | Free tier + paid | Equities, forex, crypto | Real-time (paid) | Solo developers |
| Polygon.io | Free tier + paid | US equities, options | Real-time (paid) | Production apps |
| SEC EDGAR | Free | US public companies | As filed | Fundamental analysis |
| Bloomberg Terminal | ~$24,000/yr | Global, all asset classes | Real-time | Institutional |
| Refinitiv Eikon | ~$22,000/yr | Global, alternative data | Real-time | Institutional |
For the AI research agent build pattern, Yahoo Finance plus SEC EDGAR covers most retail and educational use cases at zero cost.
Adding Memory for Multi-Turn Analysis
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# In-memory session store: one chat history per session ID
store = {}

def get_session_history(session_id: str) -> ChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

prompt_with_history = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent_with_history = create_tool_calling_agent(
    llm, tools, prompt_with_history
)
executor_with_history = AgentExecutor(
    agent=agent_with_history,
    tools=tools,
    verbose=False
)
agent_with_memory = RunnableWithMessageHistory(
    executor_with_history,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

# Multi-turn conversation
session = {"configurable": {"session_id": "analyst_session_001"}}
r1 = agent_with_memory.invoke(
    {"input": "What is Apple's current P/E ratio?"},
    config=session
)
print(r1["output"])

r2 = agent_with_memory.invoke(
    {"input": "How does that compare to the S&P 500 average?"},
    config=session
)
print(r2["output"])
# The agent remembers Apple's P/E from the previous turn
For the memory architecture powering this, see AI agent memory and planning.
Error Handling and Rate Limiting
Yahoo Finance rate-limits aggressive scrapers. Add retry logic for production:
import time
from functools import wraps

def retry_on_error(max_retries: int = 3, delay: float = 1.0):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of attempts: surface the last error to the caller
                    if attempt == max_retries - 1:
                        raise
                    # Exponential backoff: delay, 2*delay, 4*delay, ...
                    time.sleep(delay * (2 ** attempt))
        return wrapper
    return decorator

@retry_on_error(max_retries=3, delay=1.0)
def fetch_ticker_data(ticker: str):
    stock = yf.Ticker(ticker)
    return stock.info
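One wrinkle: the tools in this guide catch every exception and return an error string, so a retry decorator wrapped around a tool function would never see a failure. Below is a sketch of one way to combine the two, assuming the retry sits on an inner fetch helper and the tool-level function turns the final failure into a message. The names `with_retries`, `fetch_quote`, and `quote_tool` are hypothetical, and the flaky fetch only simulates a rate-limit error.

```python
import time
from functools import wraps


def with_retries(max_retries: int = 3, delay: float = 1.0):
    """Retry with exponential backoff; re-raise after the last attempt."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(delay * (2 ** attempt))
        return wrapper
    return decorator


calls = {"count": 0}


@with_retries(max_retries=3, delay=0.0)  # zero delay keeps the demo instant
def fetch_quote(ticker: str) -> dict:
    """Simulated data fetch: fails on the first two calls, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return {"ticker": ticker.upper(), "attempts": calls["count"]}


def quote_tool(ticker: str) -> str:
    """Tool-level wrapper: retries happen inside, errors become strings here."""
    try:
        return str(fetch_quote(ticker))
    except Exception as exc:
        return f"Error fetching {ticker}: {exc}"


print(quote_tool("aapl"))
```

Keeping the `try`/`except` at the outermost layer preserves the behavior the agent relies on (a readable error string) while still giving transient failures a few chances to recover.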
Streaming the Agent's Analysis
For a production web interface, stream the agent's output using async:
# Requires: pip install fastapi uvicorn
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import json

app = FastAPI()

@app.post("/analyze")
async def analyze_stream(request: dict):
    async def event_stream():
        async for event in agent_executor.astream_events(
            {"input": request["query"]},
            version="v2"
        ):
            if (
                event["event"] == "on_chat_model_stream"
                and event["data"]["chunk"].content
            ):
                # Forward each generated token as a Server-Sent Events frame
                token = event["data"]["chunk"].content
                yield f"data: {json.dumps({'token': token})}\n\n"
            elif event["event"] == "on_tool_start":
                # Tell the client which tool the agent is calling
                tool_name = event["name"]
                msg = f"Calling {tool_name}..."
                yield f"data: {json.dumps({'status': msg})}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
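Each chunk the endpoint yields is a Server-Sent Events frame: a `data:` line holding a JSON object, terminated by a blank line. As a rough sketch of the client side, the function below decodes a buffer of such frames back into the `token` and `status` payloads. `parse_sse` and `sse_frame` are hypothetical helpers, and the parser handles only the single-line `data:` frames this endpoint produces, not the full SSE format.

```python
import json


def sse_frame(payload: dict) -> str:
    """Encode one payload the same way the endpoint does."""
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse(buffer: str) -> list[dict]:
    """Decode single-line `data:` frames separated by blank lines."""
    events = []
    for frame in buffer.split("\n\n"):
        frame = frame.strip()
        if not frame.startswith("data:"):
            continue  # skip empty trailing pieces and unknown fields
        events.append(json.loads(frame[len("data:"):].strip()))
    return events


# Simulated stream: one tool-status frame followed by two token frames
raw = (
    sse_frame({"status": "Calling get_stock_price..."})
    + sse_frame({"token": "Apple"})
    + sse_frame({"token": " trades"})
)
events = parse_sse(raw)
answer = "".join(e["token"] for e in events if "token" in e)
print(answer)
```

A browser client would typically read the response body incrementally and apply the same decoding to each frame as it arrives.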
For a full deployment pattern, see Deploy AI model to production and the OpenAI API integration guide.
Frequently Asked Questions
Is Yahoo Finance data reliable enough for a production financial agent?
yfinance is an unofficial, community-maintained client for Yahoo Finance's public endpoints, so it comes with no uptime or accuracy guarantees. That is fine for retail-grade applications such as portfolio trackers, personal finance tools, and educational platforms. For production trading systems that handle real money, supplement or replace it with a paid data provider such as Polygon.io, Alpha Vantage, or Bloomberg for contractual uptime and data-accuracy commitments.
How do I prevent the LangChain financial agent from hallucinating numbers?
Instruct the agent to call a tool before answering any financial question; the system prompt in this guide does so with the rule "NEVER quote specific financial numbers from memory". A prompt alone cannot strictly enforce this, so you can also add a validation layer that checks whether each number in the final answer appears in the tool output and rejects responses containing numbers that no tool call returned.
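Such a validation layer can be sketched as a post-processing check, assuming you have the final answer text and the raw tool output strings (for example, collected from `intermediate_steps`). The function names, the regular expression, and the tolerance are assumptions for illustration. A real checker would also need to handle unit changes (0.061 versus "6.1%"), list numbering in formatted answers, and figures the model legitimately derived itself.

```python
import math
import re

# Matches unsigned numbers such as 31.25 or 2,450,000 (signs are ignored)
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_numbers(text: str) -> list[float]:
    """Pull every numeric literal out of a piece of text."""
    return [float(m.replace(",", "")) for m in NUMBER.findall(text)]


def unsupported_numbers(
    answer: str, tool_outputs: list[str], rel_tol: float = 0.005
) -> list[float]:
    """Return numbers in the answer that match no number in any tool output."""
    known = [n for output in tool_outputs for n in extract_numbers(output)]
    return [
        value
        for value in extract_numbers(answer)
        if not any(math.isclose(value, k, rel_tol=rel_tol) for k in known)
    ]


# Hypothetical tool output and answers (not real market data)
tool_outputs = ['{"ticker": "XYZ", "trailing_pe": 31.25, "revenue_growth": 0.061}']
answer_ok = "XYZ trades at a trailing P/E of 31.25."
answer_bad = "XYZ trades at a trailing P/E of 31.25 and a forward P/E of 27.4."

print(unsupported_numbers(answer_ok, tool_outputs))
print(unsupported_numbers(answer_bad, tool_outputs))
```

An empty list means every number in the answer is traceable to a tool call; a non-empty list is a signal to reject the response or ask the agent to retry.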
Can this agent analyze multiple stocks simultaneously?
Yes. You can pass multiple tickers in a single prompt ("Compare Apple, Microsoft, and Google on P/E ratio and revenue growth"), and the agent will call the stock data tools once per ticker. To analyze many tickers in parallel, run one agent invocation per ticker concurrently with asyncio.
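Here is a minimal sketch of the concurrent approach, assuming an async callable that analyzes one ticker (with the agent from this guide, that could be a thin wrapper around `agent_executor.ainvoke`). `analyze_many` and the stub `fake_analyze` are hypothetical names; the semaphore caps the number of in-flight requests so the data source is not flooded.

```python
import asyncio
from typing import Awaitable, Callable


async def analyze_many(
    tickers: list[str],
    analyze_one: Callable[[str], Awaitable[str]],
    max_concurrency: int = 3,
) -> dict[str, str]:
    """Run analyze_one for every ticker, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(ticker: str) -> tuple[str, str]:
        async with semaphore:
            try:
                return ticker, await analyze_one(ticker)
            except Exception as exc:
                # One failing ticker should not cancel the whole batch
                return ticker, f"Error: {exc}"

    pairs = await asyncio.gather(*(run(t) for t in tickers))
    return dict(pairs)


async def fake_analyze(ticker: str) -> str:
    """Stand-in for a real agent call, used here so the sketch runs offline."""
    await asyncio.sleep(0.01)
    if ticker == "BAD":
        raise ValueError("unknown ticker")
    return f"analysis for {ticker}"


results = asyncio.run(analyze_many(["AAPL", "MSFT", "BAD"], fake_analyze))
print(results)
```

Catching exceptions per ticker keeps partial results usable, which matters when a long batch hits a rate limit midway.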
AiTechWorlds Team