Build an AI Chatbot with Python: Complete Guide from Scratch to Deployment

Build an AI chatbot with Python — complete tutorial from OpenAI API integration to conversation memory, streaming responses, and deploying a production-ready chatbot application.

AiTechWorlds Team
May 27, 2026 · 8 min read

Building my first AI chatbot took about two hours. Building one that actually worked well in production took two months of iteration — conversation memory, streaming, rate limiting, error handling, cost controls.

This guide covers the full path: from the minimal 30-line chatbot that actually works, to the production system that handles real users. By the end, you'll have a working chatbot with conversation memory, streaming responses, custom knowledge, and a web interface.


Part 1: The Minimal Working Chatbot

# pip install openai

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY environment variable

def chatbot():
    """Minimal chatbot with conversation memory."""
    
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant. Be concise and clear."
        }
    ]
    
    print("Chatbot ready. Type 'quit' to exit.\n")
    
    while True:
        user_input = input("You: ").strip()
        
        if user_input.lower() in ["quit", "exit", "bye"]:
            print("Goodbye!")
            break
        
        if not user_input:
            continue
        
        # Add user message to history
        messages.append({"role": "user", "content": user_input})
        
        # Call API with full conversation history
        response = client.chat.completions.create(
            model="gpt-4o-mini",      # Fast and cheap for development
            messages=messages,
            temperature=0.7,
            max_tokens=500
        )
        
        assistant_message = response.choices[0].message.content
        
        # Add response to history (this IS the memory)
        messages.append({"role": "assistant", "content": assistant_message})
        
        print(f"\nAssistant: {assistant_message}\n")

if __name__ == "__main__":
    chatbot()

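This short script already has real memory: the messages list is the entire context the model sees, and because every turn is appended to it and resent on each call, the model can refer back to earlier turns.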


Part 2: Streaming Responses

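Streaming doesn't make generation itself faster, but it prints tokens as they arrive, so the reply starts appearing almost immediately. This version reuses the client from Part 1: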

def streaming_chatbot():
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    
    print("Streaming chatbot. Type 'quit' to exit.\n")
    
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "quit":
            break
        if not user_input:
            continue  # Don't send empty messages to the API
        
        messages.append({"role": "user", "content": user_input})
        
        print("\nAssistant: ", end="", flush=True)
        
        # Stream the response
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            stream=True         # Key parameter
        )
        
        full_response = ""
        for chunk in stream:
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)
                full_response += content
        
        print("\n")  # Newline after response
        messages.append({"role": "assistant", "content": full_response})

Part 3: Context Management

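Every turn resends the whole history, so long conversations cost more per call and can eventually exceed the model's context window. This class counts tokens with tiktoken (pip install tiktoken) and drops the oldest turns once the history passes a budget. It reuses the client from Part 1: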

import tiktoken

class ChatbotWithMemory:
    def __init__(
        self,
        model: str = "gpt-4o-mini",
        system_prompt: str = "You are a helpful assistant.",
        max_tokens: int = 8000  # History budget in tokens: keeps cost down and leaves room for the reply
    ):
        self.model = model
        self.max_tokens = max_tokens
        self.enc = tiktoken.encoding_for_model("gpt-4o")
        self.messages = [{"role": "system", "content": system_prompt}]
        self.system_tokens = len(self.enc.encode(system_prompt))
    
    def count_tokens(self) -> int:
        total = self.system_tokens
        for msg in self.messages[1:]:  # Skip system message
            total += len(self.enc.encode(msg["content"])) + 4  # Overhead per message
        return total
    
    def trim_history(self):
        """Remove the oldest turns when history exceeds the token budget."""
        # Always keep the system prompt (index 0) and the newest user message (last item)
        while self.count_tokens() > self.max_tokens and len(self.messages) > 2:
            self.messages.pop(1)  # Remove the oldest non-system message
    
    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        self.trim_history()
        
        response = client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=0.7,
            max_tokens=500
        )
        
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        
        return assistant_message
    
    def get_stats(self) -> dict:
        return {
            "message_count": len(self.messages),
            "token_count": self.count_tokens(),
            "token_limit": self.max_tokens
        }

# Usage
bot = ChatbotWithMemory(
    system_prompt="You are a Python tutor. Explain concepts clearly with code examples."
)

print(bot.chat("What is a decorator in Python?"))
print(bot.chat("Can you show me a practical example?"))
print(bot.chat("How does it differ from a class-based decorator?"))
print(f"Stats: {bot.get_stats()}")
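Trimming discards early turns for good. An alternative is to replace them with a short summary. The sketch below is a hypothetical helper, not part of `ChatbotWithMemory`: `compact_history`, `make_llm_summarizer`, the `keep_recent` default, and the summary prompt are all illustrative assumptions. Like the class above, it assumes `messages[0]` is the system prompt.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, str]


def compact_history(
    messages: List[Message],
    summarize: Callable[[List[Message]], str],
    keep_recent: int = 4,
) -> List[Message]:
    """Replace older turns with a single summary message.

    Assumes messages[0] is the system prompt. The newest `keep_recent`
    messages are kept verbatim; everything between them and the system
    prompt is passed to `summarize` and replaced by one summary message.
    """
    system, rest = messages[0], messages[1:]
    if len(rest) <= keep_recent:
        return list(messages)  # Nothing old enough to compact yet
    split = len(rest) - keep_recent
    old, recent = rest[:split], rest[split:]
    summary_msg = {
        "role": "system",
        "content": f"Summary of earlier conversation: {summarize(old)}",
    }
    return [system, summary_msg, *recent]


def make_llm_summarizer(client: Any, model: str = "gpt-4o-mini") -> Callable[[List[Message]], str]:
    """Build a summarize() function that asks the chat model for a short recap."""
    def summarize(old: List[Message]) -> str:
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "Summarize this conversation in a few sentences. Keep names, facts, and decisions."},
                {"role": "user", "content": transcript},
            ],
            max_tokens=200,
        )
        return response.choices[0].message.content
    return summarize
```

For example, `bot.messages = compact_history(bot.messages, make_llm_summarizer(client))` could run instead of `trim_history()` when `count_tokens()` exceeds the budget. Summarizing costs one extra API call, so it fits best when early details such as names or decisions matter.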

Part 4: FastAPI Web Backend

# pip install fastapi uvicorn openai
# Save as chatbot_api.py

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from pydantic import BaseModel
from typing import Optional
import json
import uuid

app = FastAPI(title="AI Chatbot API")

# Async client so API calls don't block FastAPI's event loop
client = AsyncOpenAI()  # Uses OPENAI_API_KEY environment variable

# In-memory session storage: lost on restart, not shared across workers (use Redis for production)
sessions: dict = {}

SYSTEM_PROMPT = "You are a helpful assistant."

class ChatRequest(BaseModel):
    message: str
    session_id: Optional[str] = None

class ChatResponse(BaseModel):
    response: str
    session_id: str

def get_session(session_id: Optional[str]) -> str:
    """Return the given session ID, creating a new session if needed."""
    session_id = session_id or str(uuid.uuid4())
    if session_id not in sessions:
        sessions[session_id] = [{"role": "system", "content": SYSTEM_PROMPT}]
    return session_id

@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    session_id = get_session(request.session_id)
    sessions[session_id].append({"role": "user", "content": request.message})
    
    try:
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=sessions[session_id],
            max_tokens=500
        )
    except Exception as e:
        sessions[session_id].pop()  # Roll back the unanswered user message
        raise HTTPException(status_code=500, detail=str(e)) from e
    
    assistant_message = response.choices[0].message.content
    sessions[session_id].append({"role": "assistant", "content": assistant_message})
    return ChatResponse(response=assistant_message, session_id=session_id)

@app.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    session_id = get_session(request.session_id)
    sessions[session_id].append({"role": "user", "content": request.message})
    
    async def generate():
        full_response = ""
        
        # Send session_id first so the client can continue the conversation
        yield f"data: {json.dumps({'session_id': session_id})}\n\n"
        
        stream = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=sessions[session_id],
            stream=True
        )
        
        async for chunk in stream:
            content = chunk.choices[0].delta.content if chunk.choices else None
            if content:
                full_response += content
                # Server-Sent Events format: "data: <payload>" followed by a blank line
                yield f"data: {json.dumps({'content': content})}\n\n"
        
        sessions[session_id].append({"role": "assistant", "content": full_response})
        yield f"data: {json.dumps({'done': True})}\n\n"
    
    return StreamingResponse(generate(), media_type="text/event-stream")

@app.delete("/chat/{session_id}")
async def clear_session(session_id: str):
    sessions.pop(session_id, None)
    return {"message": "Session cleared"}

# Run: uvicorn chatbot_api:app --reload
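To consume the `/chat/stream` endpoint from Python, a client has to read the Server-Sent Events line by line and decode each `data:` payload. A minimal sketch, assuming the server above runs locally on port 8000 and that `httpx` is installed; `parse_sse` and `stream_chat` are hypothetical helper names, and the parser only handles the single-line `data:` events this endpoint sends, not the full SSE format.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of every `data:` line in a text/event-stream."""
    for line in lines:
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def stream_chat(
    message: str,
    session_id: Optional[str] = None,
    url: str = "http://127.0.0.1:8000/chat/stream",  # Assumed local dev address
) -> Tuple[Optional[str], str]:
    """Send one message, print tokens as they arrive, return (session_id, reply)."""
    import httpx  # pip install httpx; imported here so parse_sse works without it

    reply = ""
    payload = {"message": message, "session_id": session_id}
    # timeout=None so slow generations don't trip httpx's default timeout
    with httpx.stream("POST", url, json=payload, timeout=None) as response:
        response.raise_for_status()
        for event in parse_sse(response.iter_lines()):
            if "session_id" in event:
                session_id = event["session_id"]
            elif "content" in event:
                print(event["content"], end="", flush=True)
                reply += event["content"]
            elif event.get("done"):
                break
    print()
    return session_id, reply
```

Calling `session_id, _ = stream_chat("Hi, I'm Sam.")` and then `stream_chat("What's my name?", session_id)` would continue the same server-side session.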

Part 5: Simple Streamlit UI

# pip install streamlit
# Run: streamlit run chatbot_ui.py

import streamlit as st
from openai import OpenAI

client = OpenAI()

st.title("AI Chatbot")
st.caption("Powered by GPT-4o mini")

# System prompt customization
with st.sidebar:
    st.header("Settings")
    system_prompt = st.text_area(
        "System Prompt",
        value="You are a helpful assistant. Be concise and clear.",
        height=100
    )
    model = st.selectbox("Model", ["gpt-4o-mini", "gpt-4o", "gpt-3.5-turbo"])
    temperature = st.slider("Temperature", 0.0, 2.0, 0.7, 0.1)
    
    if st.button("Clear Conversation"):
        st.session_state.messages = []
        st.rerun()

# Initialize conversation state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display conversation history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

# Chat input
if prompt := st.chat_input("Type your message..."):
    # Add user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    
    # Build messages for API call
    api_messages = [{"role": "system", "content": system_prompt}]
    api_messages.extend(st.session_state.messages)
    
    # Stream response
    with st.chat_message("assistant"):
        response_placeholder = st.empty()
        full_response = ""
        
        stream = client.chat.completions.create(
            model=model,
            messages=api_messages,
            temperature=temperature,
            stream=True
        )
        
        for chunk in stream:
            content = chunk.choices[0].delta.content or ""
            full_response += content
            response_placeholder.write(full_response + "▌")
        
        response_placeholder.write(full_response)
    
    st.session_state.messages.append({"role": "assistant", "content": full_response})

Part 6: Adding Document Knowledge (RAG)

# pip install openai chromadb sentence-transformers langchain-community langchain-text-splitters

from openai import OpenAI
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

client = OpenAI()  # Uses OPENAI_API_KEY environment variable

class RAGChatbot:
    def __init__(self, documents_dir: str = "./docs"):
        self.embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
        self.vectorstore = None
        self._load_documents(documents_dir)
        self.messages = [{"role": "system", "content": "Answer based on the provided context."}]
    
    def _load_documents(self, docs_dir: str):
        from langchain_community.document_loaders import DirectoryLoader, TextLoader
        
        loader = DirectoryLoader(docs_dir, glob="**/*.txt", loader_cls=TextLoader)
        docs = loader.load()
        
        splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        chunks = splitter.split_documents(docs)
        
        self.vectorstore = Chroma.from_documents(chunks, self.embeddings)
        print(f"Loaded {len(chunks)} document chunks")
    
    def chat(self, question: str) -> str:
        # Retrieve the 3 chunks most similar to the question
        relevant_docs = self.vectorstore.similarity_search(question, k=3)
        context = "\n\n".join(doc.page_content for doc in relevant_docs)
        
        # Send the context-augmented question for this turn only
        augmented_message = f"Context:\n{context}\n\nQuestion: {question}"
        request_messages = self.messages + [{"role": "user", "content": augmented_message}]
        
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=request_messages
        )
        
        answer = response.choices[0].message.content
        # Store the clean question in history (not the augmented one) to keep history small
        self.messages.append({"role": "user", "content": question})
        self.messages.append({"role": "assistant", "content": answer})
        
        return answer

bot = RAGChatbot("./company_docs")
print(bot.chat("What is our return policy?"))

Conclusion

A functional AI chatbot is genuinely 30 lines of Python. A production-ready one requires conversation management, streaming, rate limiting, persistent storage, and a proper web interface — but each component is approachable.
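Rate limiting isn't implemented in the parts above. One minimal approach is a sliding-window counter per session: allow at most N requests in any window of T seconds. The sketch below is an illustration under stated assumptions, not a production limiter: `SlidingWindowLimiter` is a hypothetical class, the default limits are arbitrary, and state lives in process memory, so it resets on restart and isn't shared across workers.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each key (e.g. a session ID)."""

    def __init__(
        self,
        max_requests: int = 10,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # Injectable so tests can control time
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        timestamps = self.hits[key]
        # Drop request times that have fallen out of the window
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True
```

In the Part 4 endpoints, a check such as `if not limiter.allow(session_id): raise HTTPException(status_code=429, detail="Too many requests")` would reject bursts before any API call is made.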

Start with the minimal version, add features as you need them, and resist the temptation to over-engineer from day one. Many production chatbots start exactly this way.

For building more complex AI applications, see our LangChain tutorial. For deploying to production with proper infrastructure, see our deploy AI model guide.


Frequently Asked Questions

What do I need to build an AI chatbot with Python?

Python 3.9+, an API key (OpenAI, Anthropic, or Google), and the respective Python SDK. The core is ~30 lines: maintain a messages list, append user/assistant turns, send the list to the API each time. Advanced features (streaming, RAG, web interface) build on this foundation.

How do I add memory to my AI chatbot?

Memory is the messages list — the entire conversation history sent with each API call. Short-term: append messages to a list. Long-term: store in a database and load on session start. When conversations exceed the context window, either trim oldest messages or summarize them.
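For long-term memory, the same messages list just needs to outlive the process. A minimal sketch using Python's built-in `sqlite3` as a stand-in for "a database"; the `SQLiteHistory` class, the `chat_history.db` file name, and the table layout are hypothetical choices.

```python
import sqlite3
from typing import Dict, List


class SQLiteHistory:
    """Persist each session's messages in SQLite, one row per message."""

    def __init__(self, path: str = "chat_history.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # Preserves insertion order
            " session_id TEXT NOT NULL,"
            " role TEXT NOT NULL,"
            " content TEXT NOT NULL)"
        )
        self.conn.commit()

    def append(self, session_id: str, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        self.conn.commit()

    def load(self, session_id: str) -> List[Dict[str, str]]:
        """Return the session's turns in order, ready to extend a messages list."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
        return [{"role": role, "content": content} for role, content in rows]
```

On session start, rebuild the list with `[system_message] + history.load(session_id)`, then call `append` for each user and assistant turn. `sqlite3` connections are tied to the creating thread by default, so a threaded web server would need a connection per request.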

How do I stream chatbot responses in real-time?

Use stream=True in the API call. Iterate over the returned stream, printing each chunk as it arrives. For web apps, send Server-Sent Events through FastAPI's StreamingResponse (as in Part 4), or use FastAPI's separate WebSocket support. Responses feel faster because text appears as soon as the first tokens are generated.

How do I add knowledge from my own documents?

RAG (Retrieval-Augmented Generation): embed your documents, store them in a vector database, retrieve the chunks most relevant to each question, and include those chunks in the prompt. LangChain can wire these steps together; its older ConversationalRetrievalChain is deprecated in favor of create_history_aware_retriever and create_retrieval_chain. See our RAG guide for a full implementation.
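The retrieve-then-augment loop can be illustrated without any libraries. This sketch swaps real embeddings for a toy bag-of-words vector (word counts) and ranks chunks by cosine similarity; in Part 6 the `all-MiniLM-L6-v2` embedding model and Chroma play these roles. `embed`, `retrieve`, and `build_prompt` are hypothetical names, and the `Context:`/`Question:` layout mirrors the one in `RAGChatbot.chat`.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """Augment the question with the retrieved chunks."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Word counts only match exact words ("return" does not match "returns"), which is exactly the gap that learned embeddings close.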

How do I deploy my Python chatbot to production?

Streamlit Community Cloud is easiest for prototypes. Railway or Render for simple PaaS deployment. For production: FastAPI backend, Redis for sessions, proper database for history, Docker for containerization, environment variables for API keys.
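Redis fits session storage because each session can live under one key with an expiry, so abandoned sessions clean themselves up. A minimal sketch, assuming the `redis` Python package (redis-py) and a reachable Redis server; `RedisSessionStore`, the `chat:` key prefix, and the one-hour TTL are hypothetical choices. The client is passed in, so the class only relies on redis-py's `get`, `set(..., ex=...)`, and `delete` methods.

```python
import json
from typing import Any, Dict, List, Optional


class RedisSessionStore:
    """Store each session's messages list as one JSON value with an expiry."""

    def __init__(self, redis_client: Any, ttl_seconds: int = 3600, prefix: str = "chat:"):
        self.r = redis_client
        self.ttl = ttl_seconds
        self.prefix = prefix

    def load(self, session_id: str) -> Optional[List[Dict[str, str]]]:
        raw = self.r.get(self.prefix + session_id)  # bytes, str, or None
        return json.loads(raw) if raw is not None else None

    def save(self, session_id: str, messages: List[Dict[str, str]]) -> None:
        # `ex` sets the key's TTL; each save refreshes it, so active sessions stay alive
        self.r.set(self.prefix + session_id, json.dumps(messages), ex=self.ttl)

    def delete(self, session_id: str) -> None:
        self.r.delete(self.prefix + session_id)
```

For example, `store = RedisSessionStore(redis.Redis(host="localhost", port=6379))`, then `store.load(session_id)` and `store.save(session_id, messages)` in place of reads and writes to the `sessions` dict in Part 4. Storing the whole list as one JSON value keeps the sketch simple, at the cost of rewriting the full history on every turn.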
