Build an AI Chatbot with Python: Complete Guide from Scratch to Deployment
Build an AI chatbot with Python — complete tutorial from OpenAI API integration to conversation memory, streaming responses, and deploying a production-ready chatbot application.
Building my first AI chatbot took about two hours. Building one that actually worked well in production took two months of iteration — conversation memory, streaming, rate limiting, error handling, cost controls.
This guide covers the full path: from the minimal 30-line chatbot that actually works, to the production system that handles real users. By the end, you'll have a working chatbot with conversation memory, streaming responses, custom knowledge, and a web interface.
Part 1: The Minimal Working Chatbot
```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY environment variable


def chatbot():
    """Minimal chatbot with conversation memory."""
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant. Be concise and clear."
        }
    ]
    print("Chatbot ready. Type 'quit' to exit.\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ["quit", "exit", "bye"]:
            print("Goodbye!")
            break
        if not user_input:
            continue

        # Add user message to history
        messages.append({"role": "user", "content": user_input})

        # Call API with full conversation history
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # Fast and cheap for development
            messages=messages,
            temperature=0.7,
            max_tokens=500
        )
        assistant_message = response.choices[0].message.content

        # Add response to history (this IS the memory)
        messages.append({"role": "assistant", "content": assistant_message})
        print(f"\nAssistant: {assistant_message}\n")


if __name__ == "__main__":
    chatbot()
```
This roughly 30-line script already has real memory: the messages list is the entire context the model sees, and the whole list is sent again with every call.
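To make "memory is the list" concrete, here is an offline sketch with no API call. The record_turn helper is hypothetical and the canned replies stand in for model output; the point is that the payload for the next request carries every earlier turn.

```python
def record_turn(messages: list[dict], user_text: str, assistant_text: str) -> None:
    """Append one user/assistant exchange to the history (hypothetical helper)."""
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})


messages = [{"role": "system", "content": "You are a helpful assistant."}]
record_turn(messages, "My name is Ana.", "Nice to meet you, Ana!")
record_turn(messages, "What's 2 + 2?", "4.")

# The next request sends the whole history plus the new user message.
next_payload = messages + [{"role": "user", "content": "What's my name?"}]
for m in next_payload:
    print(f"{m['role']:>9}: {m['content']}")
print(f"{len(next_payload)} messages sent")  # 6 messages sent
```

The model can answer "What's my name?" only because the first turn is still in the payload. The flip side is that every request grows with the conversation, which is what context management has to deal with.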
Part 2: Streaming Responses
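Streaming doesn't shorten the total generation time, but it prints tokens as they arrive, so the first words appear almost immediately and the chatbot feels much faster: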
```python
# Reuses the `client` created in Part 1
def streaming_chatbot():
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    print("Streaming chatbot. Type 'quit' to exit.\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "quit":
            break
        if not user_input:
            continue
        messages.append({"role": "user", "content": user_input})
        print("\nAssistant: ", end="", flush=True)

        # Stream the response
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            stream=True  # Key parameter
        )

        full_response = ""
        for chunk in stream:
            if not chunk.choices:  # Some chunks (e.g. usage-only ones) carry no choices
                continue
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content

        print("\n")  # Newline after response
        messages.append({"role": "assistant", "content": full_response})


if __name__ == "__main__":
    streaming_chatbot()
```
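The accumulation logic can be exercised offline. This sketch uses SimpleNamespace stand-ins shaped like chunk.choices[0].delta.content (they are not the SDK's real chunk type) and skips the two cases a robust loop should ignore: chunks whose content is None or empty, such as the first role-only chunk, and chunks with an empty choices list, such as the usage chunk sent when stream_options={"include_usage": True} is set.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def make_chunk(content):
    """Stand-in for an SDK stream chunk: chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def print_inline(text: str) -> None:
    print(text, end="", flush=True)


def collect_stream(chunks: Iterable, on_text: Callable[[str], None] = print_inline) -> str:
    """Join streamed text deltas, calling on_text for each piece as it arrives."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        content = chunk.choices[0].delta.content
        if content:  # role-only chunks carry None or ""
            on_text(content)
            parts.append(content)
    return "".join(parts)


# Simulated stream: role-only chunk, two text chunks, then an empty-choices chunk.
stream = [make_chunk(None), make_chunk("Hello"), make_chunk(", world"),
          SimpleNamespace(choices=[])]
text = collect_stream(stream)
print()
```

Collecting pieces in a list and joining once is equivalent to the += in the chatbot loop; for chat-length replies either is fine.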
Part 3: Context Management
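Long conversations eventually exceed the model's context window, and because the full history is resent every turn, cost grows with them well before that point. Capping the history at a token budget and dropping the oldest messages handles both: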
```python
# pip install openai tiktoken
import tiktoken
from openai import OpenAI

client = OpenAI()


class ChatbotWithMemory:
    def __init__(
        self,
        model: str = "gpt-4o-mini",
        system_prompt: str = "You are a helpful assistant.",
        max_tokens: int = 8000  # Token budget for the history sent each turn (not the reply length)
    ):
        self.model = model
        self.max_tokens = max_tokens
        # gpt-4o and gpt-4o-mini share the same tokenizer (o200k_base)
        self.enc = tiktoken.encoding_for_model("gpt-4o")
        self.messages = [{"role": "system", "content": system_prompt}]
        self.system_tokens = len(self.enc.encode(system_prompt))

    def count_tokens(self) -> int:
        """Approximate prompt size: message text plus a few tokens of overhead per message."""
        total = self.system_tokens
        for msg in self.messages[1:]:  # Skip system message (already counted)
            total += len(self.enc.encode(msg["content"])) + 4  # Overhead per message
        return total

    def trim_history(self):
        """Remove the oldest messages while the history is over budget."""
        # Always keep the system message (index 0) and the newest message
        while self.count_tokens() > self.max_tokens and len(self.messages) > 2:
            self.messages.pop(1)  # Remove oldest non-system message

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        self.trim_history()

        response = client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=0.7,
            max_tokens=500  # Cap on the reply length
        )
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

    def get_stats(self) -> dict:
        return {
            "message_count": len(self.messages),
            "token_count": self.count_tokens(),
            "token_limit": self.max_tokens
        }


# Usage
bot = ChatbotWithMemory(
    system_prompt="You are a Python tutor. Explain concepts clearly with code examples."
)
print(bot.chat("What is a decorator in Python?"))
print(bot.chat("Can you show me a practical example?"))
print(bot.chat("How does it differ from a class-based decorator?"))
print(f"Stats: {bot.get_stats()}")
```
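trim_history drops one message at a time, which can leave an assistant reply at the front of the history after its question is gone. A variant that drops whole user/assistant pairs keeps the history aligned. This is a hypothetical sketch; it uses a whitespace word count as a stand-in for tiktoken, so the budget numbers are illustrative only.

```python
from typing import Callable


def trim_pairs(messages: list[dict], budget: int,
               count: Callable[[str], int] = lambda s: len(s.split())) -> list[dict]:
    """Return a trimmed copy of messages that fits `budget`, dropping the oldest exchanges.

    Assumes messages[0] is the system prompt, followed by alternating
    user/assistant turns and ending with the newest user message.
    """
    system, history = messages[0], list(messages[1:])

    def total() -> int:
        return count(system["content"]) + sum(count(m["content"]) for m in history)

    # Drop the oldest user/assistant pair while over budget; never drop the newest message.
    while total() > budget and len(history) >= 3:
        del history[:2]
    return [system] + history


msgs = [
    {"role": "system", "content": "Be brief."},                # 2 "tokens"
    {"role": "user", "content": "one two three four"},         # 4
    {"role": "assistant", "content": "five six seven eight"},  # 4
    {"role": "user", "content": "nine ten"},                   # 2
]
trimmed = trim_pairs(msgs, budget=5)
print([m["content"] for m in trimmed])  # ['Be brief.', 'nine ten']
```

Inside the class, the same logic could use the real tokenizer by passing count=lambda s: len(self.enc.encode(s)) and assigning the result back to self.messages.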
Part 4: FastAPI Web Backend
```python
# pip install fastapi uvicorn openai
# Save as chatbot_api.py
import json
import uuid
from typing import Optional

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from pydantic import BaseModel

app = FastAPI(title="AI Chatbot API")
client = AsyncOpenAI()  # Async client so API calls don't block the event loop

# In-memory session storage (use Redis for production)
sessions: dict = {}


class ChatRequest(BaseModel):
    message: str
    session_id: Optional[str] = None


class ChatResponse(BaseModel):
    response: str
    session_id: str


@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    session_id = request.session_id or str(uuid.uuid4())
    if session_id not in sessions:
        sessions[session_id] = [
            {"role": "system", "content": "You are a helpful assistant."}
        ]
    sessions[session_id].append({"role": "user", "content": request.message})

    try:
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=sessions[session_id],
            max_tokens=500
        )
    except Exception as e:
        sessions[session_id].pop()  # Drop the unanswered user message
        raise HTTPException(status_code=500, detail=str(e))

    assistant_message = response.choices[0].message.content
    sessions[session_id].append({"role": "assistant", "content": assistant_message})
    return ChatResponse(response=assistant_message, session_id=session_id)


@app.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    session_id = request.session_id or str(uuid.uuid4())
    if session_id not in sessions:
        sessions[session_id] = [
            {"role": "system", "content": "You are a helpful assistant."}
        ]
    sessions[session_id].append({"role": "user", "content": request.message})

    async def generate():
        full_response = ""
        # Send session_id first so the client can reuse it
        yield f"data: {json.dumps({'session_id': session_id})}\n\n"

        stream = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=sessions[session_id],
            stream=True
        )
        async for chunk in stream:
            if not chunk.choices:
                continue
            content = chunk.choices[0].delta.content
            if content:
                full_response += content
                yield f"data: {json.dumps({'content': content})}\n\n"

        sessions[session_id].append({"role": "assistant", "content": full_response})
        yield f"data: {json.dumps({'done': True})}\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")


@app.delete("/chat/{session_id}")
async def clear_session(session_id: str):
    sessions.pop(session_id, None)
    return {"message": "Session cleared"}

# Run: uvicorn chatbot_api:app --reload
```
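On the client side, each event from /chat/stream arrives as a "data: {...}" line followed by a blank line. A minimal parser for those lines might look like this; the sample lines mirror what generate() yields. It is a simplification that ignores other SSE fields (event:, id:) and multi-line data, and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line, skipping blanks and other fields."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def read_reply(lines: Iterable[str]) -> Tuple[Optional[str], str]:
    """Return (session_id, full_text) from the /chat/stream events."""
    session_id, parts = None, []
    for event in parse_sse(lines):
        if "session_id" in event:
            session_id = event["session_id"]
        elif "content" in event:
            parts.append(event["content"])
        elif event.get("done"):
            break
    return session_id, "".join(parts)


sample = [
    'data: {"session_id": "abc123"}', "",
    'data: {"content": "Hel"}', "",
    'data: {"content": "lo!"}', "",
    'data: {"done": true}', "",
]
print(read_reply(sample))  # ('abc123', 'Hello!')
```

Against a running server, the same function could consume a live response, for example inside with httpx.stream("POST", "http://localhost:8000/chat/stream", json={"message": "Hi"}) as r: by calling read_reply(r.iter_lines()), assuming httpx is installed and the API runs on the default port.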
Part 5: Simple Streamlit UI
```python
# pip install streamlit
# Run: streamlit run chatbot_ui.py
import streamlit as st
from openai import OpenAI

client = OpenAI()

st.title("AI Chatbot")
st.caption("Powered by GPT-4o mini")

# System prompt customization
with st.sidebar:
    st.header("Settings")
    system_prompt = st.text_area(
        "System Prompt",
        value="You are a helpful assistant. Be concise and clear.",
        height=100
    )
    model = st.selectbox("Model", ["gpt-4o-mini", "gpt-4o", "gpt-3.5-turbo"])
    temperature = st.slider("Temperature", 0.0, 2.0, 0.7, 0.1)
    if st.button("Clear Conversation"):
        st.session_state.messages = []
        st.rerun()

# Initialize conversation state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display conversation history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

# Chat input
if prompt := st.chat_input("Type your message..."):
    # Add user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

    # Build messages for API call
    api_messages = [{"role": "system", "content": system_prompt}]
    api_messages.extend(st.session_state.messages)

    # Stream response
    with st.chat_message("assistant"):
        response_placeholder = st.empty()
        full_response = ""
        stream = client.chat.completions.create(
            model=model,
            messages=api_messages,
            temperature=temperature,
            stream=True
        )
        for chunk in stream:
            content = chunk.choices[0].delta.content or ""
            full_response += content
            response_placeholder.write(full_response + "▌")  # Typing cursor
        response_placeholder.write(full_response)

    st.session_state.messages.append({"role": "assistant", "content": full_response})
```
Part 6: Adding Document Knowledge (RAG)
```python
# pip install openai chromadb sentence-transformers langchain-community langchain-text-splitters
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from openai import OpenAI

client = OpenAI()


class RAGChatbot:
    def __init__(self, documents_dir: str = "./docs"):
        self.embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
        self.vectorstore = None
        self._load_documents(documents_dir)
        self.messages = [{"role": "system", "content": "Answer based on the provided context."}]

    def _load_documents(self, docs_dir: str):
        # Load every .txt file under docs_dir, recursively
        loader = DirectoryLoader(docs_dir, glob="**/*.txt", loader_cls=TextLoader)
        docs = loader.load()
        # Chunks of up to 500 characters, with up to 50 characters shared between neighbors
        splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        chunks = splitter.split_documents(docs)
        # Embed the chunks and index them in an in-memory Chroma collection
        self.vectorstore = Chroma.from_documents(chunks, self.embeddings)
        print(f"Loaded {len(chunks)} document chunks")

    def chat(self, question: str) -> str:
        # Retrieve relevant context
        relevant_docs = self.vectorstore.similarity_search(question, k=3)
        context = "\n\n".join([doc.page_content for doc in relevant_docs])

        # Build message with context
        augmented_message = f"Context:\n{context}\n\nQuestion: {question}"
        self.messages.append({"role": "user", "content": augmented_message})

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=self.messages
        )
        answer = response.choices[0].message.content

        # Store clean question in history (not the augmented one)
        self.messages[-1] = {"role": "user", "content": question}
        self.messages.append({"role": "assistant", "content": answer})
        return answer


bot = RAGChatbot("./company_docs")
print(bot.chat("What is our return policy?"))
```
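chunk_size=500 and chunk_overlap=50 mean each chunk holds at most 500 characters and neighboring chunks share up to 50, so text cut at a boundary still appears intact in one of them. RecursiveCharacterTextSplitter also tries to break on paragraph, line, and word boundaries first; the sketch below is a deliberately simplified fixed-window splitter (not LangChain's algorithm) that only shows the overlap arithmetic.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Fixed-window character splitter: each window starts chunk_size - chunk_overlap after the last."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # This window already reached the end
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(text, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

With a 10-character window and 3 characters of overlap, each chunk starts 7 characters after the previous one, so consecutive chunks share "hij", "opq", and "vwx".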
Conclusion
A functional AI chatbot really is about 30 lines of Python. A production-ready one also needs context management, streaming, rate limiting, persistent storage, and a proper web interface, but each of those pieces is approachable on its own.
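Rate limiting is the one item in that list that the parts above don't show. As a starting point, here is a hypothetical per-session sliding-window limiter kept in process memory; like the sessions dict in Part 4, it would need a shared store such as Redis once more than one server process is running. The clock is injectable so the logic can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the last `window` seconds."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # key -> timestamps of recent allowed requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self.hits[key]
        # Forget timestamps that have slid out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Simulated clock: 3 requests allowed per 60 seconds
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=3, window=60, clock=lambda: fake_now[0])
print([limiter.allow("session-1") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 61.0
print(limiter.allow("session-1"))  # True
```

In the FastAPI app, one option is to call limiter.allow(session_id) at the top of each endpoint and raise HTTPException(status_code=429) when it returns False.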
Start with the minimal version, add features as you need them, and resist the temptation to over-engineer from day one. Many production chatbots start out exactly this way.
For building more complex AI applications, see our LangChain tutorial. For deploying to production with proper infrastructure, see our deploy AI model guide.
Frequently Asked Questions
What do I need to build an AI chatbot with Python?
Python 3.9+, an API key (OpenAI, Anthropic, or Google), and the respective Python SDK. The core is ~30 lines: maintain a messages list, append user/assistant turns, send the list to the API each time. Advanced features (streaming, RAG, web interface) build on this foundation.
How do I add memory to my AI chatbot?
Memory is the messages list — the entire conversation history sent with each API call. Short-term: append messages to a list. Long-term: store in a database and load on session start. When conversations exceed the context window, either trim oldest messages or summarize them.
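For long-term memory, here is a minimal sketch using Python's built-in sqlite3 module: each message is stored as a row keyed by session ID and reloaded in order at session start. The table, column, and function names are illustrative, not from any library.

```python
import sqlite3


def open_store(path: str = "chat_history.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def save_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    with conn:  # Commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def load_history(conn: sqlite3.Connection, session_id: str) -> list[dict]:
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [{"role": role, "content": content} for role, content in rows]


conn = open_store(":memory:")  # Use a file path so history survives restarts
save_message(conn, "s1", "user", "Remember that my favorite color is teal.")
save_message(conn, "s1", "assistant", "Noted: teal.")
history = load_history(conn, "s1")
print(history)
```

Prepend the system prompt to the loaded rows before calling the API, and apply the same trimming or summarizing once the reloaded history gets long.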
How do I stream chatbot responses in real-time?
Pass stream=True in the API call and iterate over the returned stream, printing each chunk's text as it arrives. For web apps, send the chunks as Server-Sent Events through FastAPI's StreamingResponse (as in Part 4), or push them over a WebSocket. Total generation time is unchanged, but responses feel much faster because text appears immediately.
How do I add knowledge from my own documents?
RAG (Retrieval-Augmented Generation): embed your documents, store them in a vector database, retrieve the most relevant chunks when a user asks a question, and include those chunks in the prompt, as Part 6 does. LangChain's ConversationalRetrievalChain used to package this pattern but is now deprecated in favor of create_history_aware_retriever and create_retrieval_chain. See our RAG guide for a full implementation.
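The retrieval step boils down to ranking stored chunk vectors by how close they are to the question's vector. Vector stores such as Chroma use an index and a configurable distance metric; this brute-force sketch uses cosine similarity, with made-up 3-dimensional vectors standing in for real embeddings (all-MiniLM-L6-v2, for instance, produces 384-dimensional ones).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the texts of the k chunks most similar to query_vec."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings" (hypothetical values, not produced by a real model)
chunks = [
    ("Returns are accepted within 30 days.", [0.9, 0.1, 0.0]),
    ("Shipping takes 3-5 business days.", [0.1, 0.9, 0.0]),
    ("Refunds go to the original card.", [0.8, 0.0, 0.3]),
]
question_vec = [1.0, 0.0, 0.1]  # Pretend embedding of "What is the return policy?"
context = "\n\n".join(top_k(question_vec, chunks, k=2))
prompt = f"Context:\n{context}\n\nQuestion: What is the return policy?"
print(prompt)
```

The prompt layout matches the augmented_message built in Part 6: retrieved context first, then the question.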
How do I deploy my Python chatbot to production?
Streamlit Community Cloud is the easiest option for prototypes, and Railway or Render work well for simple PaaS deployment. For production, use a FastAPI backend, Redis for sessions, a proper database for conversation history, Docker for containerization, and environment variables for API keys.
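For the environment-variable point, one common pattern is to read and validate configuration once at startup, so a missing key fails immediately instead of on the first user request. The Settings class and the CHATBOT_MODEL and CHATBOT_MAX_TOKENS variable names are hypothetical; OPENAI_API_KEY is the variable the OpenAI SDK reads by default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    model: str
    max_tokens: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, failing fast if the API key is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=key,
        model=env.get("CHATBOT_MODEL", "gpt-4o-mini"),         # Hypothetical variable name
        max_tokens=int(env.get("CHATBOT_MAX_TOKENS", "500")),  # Hypothetical variable name
    )


settings = load_settings({"OPENAI_API_KEY": "sk-test", "CHATBOT_MODEL": "gpt-4o"})
print(settings.model, settings.max_tokens)  # gpt-4o 500
```

Passing a mapping explicitly keeps the function testable; in a container or on a PaaS, calling load_settings() with no argument reads the values configured in the platform's environment settings.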
AiTechWorlds Team
The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.