How do I add memory to my AI chatbot?

Chatbot memory works by maintaining a messages list and sending the full conversation history with each API call. Short-term memory: append every user message and assistant response to the list. Long-term memory: store conversations in a database (SQLite, PostgreSQL) and load recent history on session start. When conversations get too long (exceed context window), summarize older messages. For cross-session memory, use embeddings + vector search to retrieve relevant past conversations. The challenge is balancing context length (cost/performance) with conversation continuity.
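As a rough sketch of the long-term memory idea, the snippet below stores messages in SQLite using Python's standard sqlite3 module. The table layout and the helper names (open_store, save_message, load_recent) are assumptions made for illustration, not part of any library.

```python
import sqlite3
from typing import Dict, List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history database. ':memory:' keeps it in RAM for demos."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def save_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one message to a session's stored history."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def load_recent(conn: sqlite3.Connection, session_id: str, limit: int = 20) -> List[Dict[str, str]]:
    """Return the newest `limit` messages for a session, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return [{"role": role, "content": content} for role, content in reversed(rows)]
```

On session start, the result of load_recent can be appended after the system message to rebuild the messages list before the first API call.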

How do I stream chatbot responses in real-time?

Use the stream=True parameter in the OpenAI API call. Instead of waiting for the full response, you receive a stream of chunks — each containing a small piece of the response. Iterate over the stream with a for loop, printing each chunk as it arrives. For web apps, use Server-Sent Events (SSE) or WebSockets to stream chunks to the browser in real-time. Flask-SSE or FastAPI's StreamingResponse work well. This dramatically improves perceived response time since users see text appearing immediately rather than waiting for the full response.
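To make the SSE part concrete, here is a minimal sketch of how text chunks can be framed as Server-Sent Events. The helper name sse_events and the JSON payload keys ("content", "done") are assumptions chosen for illustration. The only fixed part is the wire format: each event is a "data:" line followed by a blank line.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap text chunks as Server-Sent Events: 'data: <json>' plus a blank line."""
    for text in chunks:
        if text:  # skip empty deltas so the browser gets no no-op events
            yield f"data: {json.dumps({'content': text})}\n\n"
    # Final marker event so the client knows the response is complete
    yield f"data: {json.dumps({'done': True})}\n\n"


if __name__ == "__main__":
    for event in sse_events(["Hel", "", "lo"]):
        print(repr(event))
```

A generator like this can be handed to a streaming HTTP response with the media type text/event-stream.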

How do I add knowledge from my own documents to a chatbot?

Use Retrieval-Augmented Generation (RAG): 1) Embed your documents and store them in a vector database (Chroma, Pinecone). 2) When a user asks a question, embed the question and retrieve similar document chunks. 3) Include the retrieved chunks in the prompt as context, either in the system message or alongside the user's question. 4) The chatbot answers based on your documents. This gives the chatbot knowledge of your specific data without fine-tuning. For simple cases, LangChain's ConversationalRetrievalChain handles the entire RAG+chat pipeline. See our RAG guide for the full implementation.
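The retrieve-then-prompt steps can be sketched without any external service. The example below is a toy: it uses word counts as a stand-in "embedding" and cosine similarity for ranking, where a real system would use a learned embedding model and a vector database. All function names here are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question, best first."""
    query = embed(question)
    return sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)[:k]


def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """Place the retrieved chunks ahead of the question as context."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The string returned by build_prompt is what would be sent to the model as the user message.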

How do I deploy my Python chatbot to production?

Deployment options by complexity: 1) Streamlit Community Cloud: easiest, deploys from GitHub, free tier available, great for prototypes. 2) Railway or Render: simple PaaS, Docker or git deploy, ~$5-20/month. 3) AWS/GCP/Azure: full control, scalable, more complex setup. For production chatbots, use FastAPI for the backend API, Redis for session storage, and a proper database for conversation history, and implement rate limiting to control API costs. Containerize with Docker for consistent deployment. Store API keys in environment variables and never hardcode them. See our deployment guide for the complete production setup.
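Rate limiting is the piece of that list most often left vague, so here is one possible sketch: a per-client sliding-window limiter kept in process memory. The class name, parameters, and the injectable clock are assumptions for illustration. A multi-instance deployment would need shared state (for example in Redis) instead of a local dict.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

An API handler could call allow(session_id) before contacting the model and return HTTP 429 when it comes back False.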


AiTechWorlds


Ai Development

Build an AI Chatbot with Python: Complete Guide from Scratch to Deployment

⚡ Quick Answer

Build an AI chatbot with Python — complete tutorial from OpenAI API integration to conversation memory, streaming responses, and deploying a production-ready chatbot application.

AiTechWorlds Team May 27, 2026 7 min read

#build-ai-chatbot-python #python-chatbot-tutorial #openai-chatbot #ai-development



Building my first AI chatbot took about two hours. Building one that actually worked well in production took two months of iteration — conversation memory, streaming, rate limiting, error handling, cost controls.

This guide covers the full path: from the minimal 30-line chatbot that actually works, to the production system that handles real users. By the end, you'll have a working chatbot with conversation memory, streaming responses, custom knowledge, and a web interface.

Part 1: The Minimal Working Chatbot

# pip install openai

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY environment variable

def chatbot():
    """Minimal chatbot with conversation memory."""
    
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant. Be concise and clear."
        }
    ]
    
    print("Chatbot ready. Type 'quit' to exit.\n")
    
    while True:
        user_input = input("You: ").strip()
        
        if user_input.lower() in ["quit", "exit", "bye"]:
            print("Goodbye!")
            break
        
        if not user_input:
            continue
        
        # Add user message to history
        messages.append({"role": "user", "content": user_input})
        
        # Call API with full conversation history
        response = client.chat.completions.create(
            model="gpt-4o-mini",      # Fast and cheap for development
            messages=messages,
            temperature=0.7,
            max_tokens=500
        )
        
        assistant_message = response.choices[0].message.content
        
        # Add response to history (this IS the memory)
        messages.append({"role": "assistant", "content": assistant_message})
        
        print(f"\nAssistant: {assistant_message}\n")

if __name__ == "__main__":
    chatbot()

This roughly 30-line script has real memory: the messages list is the entire context the model sees, so the model remembers exactly what the list contains and nothing else.
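The loop above has no error handling, so a single failed API call ends the session. One hedged way to soften that is a small retry helper with exponential backoff and jitter. The helper name with_retries and its parameters are assumptions for illustration. It catches every Exception for brevity, whereas real code would usually retry only transient failures such as rate-limit or connection errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> T:
    """Run call(), retrying failures with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise RuntimeError("unreachable")
```

In the chatbot loop, the API call could then be wrapped as with_retries(lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages)).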

Part 2: Streaming Responses

Streaming makes your chatbot feel much faster:

def streaming_chatbot():
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    
    print("Streaming chatbot. Type 'quit' to exit.\n")
    
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ["quit", "exit", "bye"]:
            break

        if not user_input:
            continue  # Skip empty input instead of sending a blank message
        
        messages.append({"role": "user", "content": user_input})
        
        print("\nAssistant: ", end="", flush=True)
        
        # Stream the response
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            stream=True         # Key parameter
        )
        
        full_response = ""
        for chunk in stream:
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content
        
        print("\n")  # Newline after response
        messages.append({"role": "assistant", "content": full_response})

Part 3: Context Management

Conversations eventually exceed context limits. Handle this gracefully:

import tiktoken

class ChatbotWithMemory:
    def __init__(
        self,
        model: str = "gpt-4o-mini",
        system_prompt: str = "You are a helpful assistant.",
        max_tokens: int = 8000  # Leave room for response
    ):
        self.model = model
        self.max_tokens = max_tokens
        self.enc = tiktoken.encoding_for_model("gpt-4o")
        self.messages = [{"role": "system", "content": system_prompt}]
        self.system_tokens = len(self.enc.encode(system_prompt))
    
    def count_tokens(self) -> int:
        total = self.system_tokens
        for msg in self.messages[1:]:  # Skip system message
            total += len(self.enc.encode(msg["content"])) + 4  # Overhead per message
        return total
    
    def trim_history(self):
        """Remove oldest messages when approaching limit."""
        # Always keep the system message plus the two most recent messages;
        # drop the oldest non-system message until the history fits.
        while self.count_tokens() > self.max_tokens and len(self.messages) > 3:
            self.messages.pop(1)  # Index 0 is the system message
    
    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        self.trim_history()
        
        response = client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=0.7,
            max_tokens=500
        )
        
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        
        return assistant_message
    
    def get_stats(self) -> dict:
        return {
            "message_count": len(self.messages),
            "token_count": self.count_tokens(),
            "token_limit": self.max_tokens
        }

# Usage
bot = ChatbotWithMemory(
    system_prompt="You are a Python tutor. Explain concepts clearly with code examples."
)

print(bot.chat("What is a decorator in Python?"))
print(bot.chat("Can you show me a practical example?"))
print(bot.chat("How does it differ from a class-based decorator?"))
print(f"Stats: {bot.get_stats()}")
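Trimming discards old messages entirely. An alternative is to replace them with a summary. The sketch below shows only the list manipulation. The summarize callable is a placeholder assumption: in practice it would be another model call that condenses the old messages. It also assumes messages[0] is the system message, as in the class above.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def compact_history(
    messages: List[Message],
    summarize: Callable[[List[Message]], str],  # hypothetical: e.g. a model call
    keep_recent: int = 4,
) -> List[Message]:
    """Replace all but the last keep_recent messages with one summary message."""
    system, rest = messages[:1], messages[1:]
    if len(rest) <= keep_recent:
        return list(messages)  # nothing old enough to summarize
    split = len(rest) - keep_recent
    old, recent = rest[:split], rest[split:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(old),
    }
    return system + [summary] + recent
```

Calling this when count_tokens() exceeds the limit keeps the gist of the early conversation at the cost of one extra summarization request.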

Part 4: FastAPI Web Backend

# pip install fastapi uvicorn openai

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from openai import OpenAI
from pydantic import BaseModel
from typing import Optional
import json
import uuid

client = OpenAI()  # Uses OPENAI_API_KEY environment variable

app = FastAPI(title="AI Chatbot API")

# In-memory session storage (use Redis for production)
sessions: dict = {}

class ChatRequest(BaseModel):
    message: str
    session_id: Optional[str] = None

class ChatResponse(BaseModel):
    response: str
    session_id: str

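# Declared with plain `def` so FastAPI runs it in a threadpool; the synchronous
# OpenAI call would otherwise block the event loop inside an `async def`.
@app.post("/chat", response_model=ChatResponse)
def chat(request: ChatRequest):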
    session_id = request.session_id or str(uuid.uuid4())
    
    if session_id not in sessions:
        sessions[session_id] = [
            {"role": "system", "content": "You are a helpful assistant."}
        ]
    
    sessions[session_id].append({"role": "user", "content": request.message})
    
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=sessions[session_id],
            max_tokens=500
        )
        
        assistant_message = response.choices[0].message.content
        sessions[session_id].append({"role": "assistant", "content": assistant_message})
        
        return ChatResponse(response=assistant_message, session_id=session_id)
    
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    session_id = request.session_id or str(uuid.uuid4())
    
    if session_id not in sessions:
        sessions[session_id] = [
            {"role": "system", "content": "You are a helpful assistant."}
        ]
    
    sessions[session_id].append({"role": "user", "content": request.message})
    
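    # Plain generator: StreamingResponse iterates it in a threadpool, so the
    # blocking OpenAI stream does not stall the event loop.
    def generate():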
        full_response = ""
        
        # Send session_id first
        yield f"data: {json.dumps({'session_id': session_id})}\n\n"
        
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=sessions[session_id],
            stream=True
        )
        
        for chunk in stream:
            content = chunk.choices[0].delta.content
            if content:
                full_response += content
                yield f"data: {json.dumps({'content': content})}\n\n"
        
        sessions[session_id].append({"role": "assistant", "content": full_response})
        yield f"data: {json.dumps({'done': True})}\n\n"
    
    return StreamingResponse(generate(), media_type="text/event-stream")

@app.delete("/chat/{session_id}")
async def clear_session(session_id: str):
    if session_id in sessions:
        del sessions[session_id]
    return {"message": "Session cleared"}

# Run: uvicorn chatbot_api:app --reload
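The sessions dict above grows without bound, because nothing removes a session unless the client calls the delete endpoint. As a stopgap before moving to Redis, idle sessions can be expired in process. The SessionStore class below is a hypothetical sketch, and its name, TTL default, and injectable clock are assumptions for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class SessionStore:
    """In-memory session storage that forgets sessions idle longer than ttl_seconds."""

    def __init__(self, ttl_seconds: float = 1800, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable for testing
        self._data: Dict[str, Tuple[float, List[dict]]] = {}

    def get(self, session_id: str) -> List[dict]:
        """Return the message list for a session, creating it if missing or expired."""
        self.evict_expired()
        now = self.clock()
        _, messages = self._data.get(session_id, (now, []))
        self._data[session_id] = (now, messages)  # refresh last-used time
        return messages

    def evict_expired(self) -> int:
        """Delete idle sessions and return how many were removed."""
        now = self.clock()
        stale = [
            sid for sid, (last_used, _) in self._data.items()
            if now - last_used > self.ttl_seconds
        ]
        for sid in stale:
            del self._data[sid]
        return len(stale)

    def __len__(self) -> int:
        return len(self._data)
```

The returned list is the stored one, so appending to it persists the message, much like sessions[session_id].append(...) in the handlers. A new session still needs its system message added when the returned list is empty.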

Part 5: Simple Streamlit UI

# pip install streamlit
# Run: streamlit run chatbot_ui.py

import streamlit as st
from openai import OpenAI

client = OpenAI()

st.title("AI Chatbot")
st.caption("Powered by GPT-4o mini")

# System prompt customization
with st.sidebar:
    st.header("Settings")
    system_prompt = st.text_area(
        "System Prompt",
        value="You are a helpful assistant. Be concise and clear.",
        height=100
    )
    model = st.selectbox("Model", ["gpt-4o-mini", "gpt-4o", "gpt-3.5-turbo"])
    temperature = st.slider("Temperature", 0.0, 2.0, 0.7, 0.1)
    
    if st.button("Clear Conversation"):
        st.session_state.messages = []
        st.rerun()

# Initialize conversation state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display conversation history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

# Chat input
if prompt := st.chat_input("Type your message..."):
    # Add user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    
    # Build messages for API call
    api_messages = [{"role": "system", "content": system_prompt}]
    api_messages.extend(st.session_state.messages)
    
    # Stream response
    with st.chat_message("assistant"):
        response_placeholder = st.empty()
        full_response = ""
        
        stream = client.chat.completions.create(
            model=model,
            messages=api_messages,
            temperature=temperature,
            stream=True
        )
        
        for chunk in stream:
            content = chunk.choices[0].delta.content or ""
            full_response += content
            response_placeholder.write(full_response + "▌")
        
        response_placeholder.write(full_response)
    
    st.session_state.messages.append({"role": "assistant", "content": full_response})

Part 6: Adding Document Knowledge (RAG)

# pip install openai chromadb sentence-transformers langchain langchain-community

from openai import OpenAI
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

client = OpenAI()  # Uses OPENAI_API_KEY environment variable

class RAGChatbot:
    def __init__(self, documents_dir: str = "./docs"):
        self.embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
        self.vectorstore = None
        self._load_documents(documents_dir)
        self.messages = [{"role": "system", "content": "Answer based on the provided context."}]
    
    def _load_documents(self, docs_dir: str):
        from langchain_community.document_loaders import DirectoryLoader, TextLoader
        
        loader = DirectoryLoader(docs_dir, glob="**/*.txt", loader_cls=TextLoader)
        docs = loader.load()
        
        splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        chunks = splitter.split_documents(docs)
        
        self.vectorstore = Chroma.from_documents(chunks, self.embeddings)
        print(f"Loaded {len(chunks)} document chunks")
    
    def chat(self, question: str) -> str:
        # Retrieve relevant context
        relevant_docs = self.vectorstore.similarity_search(question, k=3)
        context = "\n\n".join([doc.page_content for doc in relevant_docs])
        
        # Build message with context
        augmented_message = f"Context:\n{context}\n\nQuestion: {question}"
        self.messages.append({"role": "user", "content": augmented_message})
        
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=self.messages
        )
        
        answer = response.choices[0].message.content
        # Store clean question in history (not the augmented one)
        self.messages[-1] = {"role": "user", "content": question}
        self.messages.append({"role": "assistant", "content": answer})
        
        return answer

bot = RAGChatbot("./company_docs")
print(bot.chat("What is our return policy?"))
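To show what chunk_size and chunk_overlap mean, here is a deliberately simplified fixed-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that class also tries to break on separators such as paragraphs and sentences). The function name split_with_overlap is an assumption for illustration.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.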

Conclusion

A functional AI chatbot is genuinely 30 lines of Python. A production-ready one requires conversation management, streaming, rate limiting, persistent storage, and a proper web interface — but each component is approachable.

Start with the minimal version, add features as you need them, and resist the temptation to over-engineer from day one. Most chatbots in production started exactly this way.

For building more complex AI applications, see our LangChain tutorial. For deploying to production with proper infrastructure, see our deploy AI model guide.

Frequently Asked Questions

What do I need to build an AI chatbot with Python?

To build a basic AI chatbot with Python you need: Python 3.9+, an OpenAI API key (or Anthropic/Google for Claude/Gemini), the openai Python package, and optionally Flask or FastAPI for a web interface. The core is simple: maintain a messages list, append user messages and model responses, and send the full list to the API on each turn. More advanced features (persistent memory, RAG, streaming) build on this foundation. The entire minimal working chatbot is about 30 lines of Python.

