How LLMs Process Prompts

To write great prompts, you need to understand how language models actually "think." Once you understand this, many prompting techniques become obvious rather than mysterious.

Tokens — The Building Blocks

LLMs don't read words; they process tokens. For English text, a token averages roughly 4 characters, or about ¾ of a word. Understanding tokens explains many LLM behaviors.

Illustrative splits (the exact result depends on the model's tokenizer):

"Hello, world!" = 4 tokens: ["Hello", ",", " world", "!"]
"Prompt engineering" = 3 tokens: ["Prompt", " engineer", "ing"]
"ChatGPT" = 2 tokens: ["Chat", "GPT"]
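The 4-characters-per-token rule of thumb can be turned into a quick estimator. This is only a rough sketch: the function name `estimate_tokens` is made up for illustration, and a real tokenizer will often give a different count than this heuristic.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)

# Estimates only: an actual tokenizer may split these strings differently.
for sample in ["Hello, world!", "Prompt engineering", "ChatGPT"]:
    print(f"{sample!r}: about {estimate_tokens(sample)} tokens")
```

An estimate like this is good enough for budgeting; for exact counts, use the tokenizer that matches your model.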

Why tokens matter for you:

Every model has a context window — a limit on total tokens in a conversation (prompt + response). GPT-4o supports 128K tokens. Claude 3.5 Sonnet supports 200K tokens.
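When a conversation exceeds the context limit, something has to give: depending on the product, the request is rejected or the oldest messages are dropped or summarized, so the model no longer sees earlier parts of the conversation.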
Pricing for API access is per token — understanding this helps you optimize costs.
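The points above come down to simple arithmetic. The sketch below checks whether a request fits a context window and estimates its cost. The prices are hypothetical placeholders, not real rates, and the function names are invented for this example; check your provider's pricing page for actual numbers.

```python
CONTEXT_WINDOW = 128_000           # tokens; the GPT-4o figure quoted above
PRICE_PER_MILLION_INPUT = 2.50     # USD, hypothetical placeholder
PRICE_PER_MILLION_OUTPUT = 10.00   # USD, hypothetical placeholder

def fits_context(prompt_tokens: int, max_response_tokens: int,
                 window: int = CONTEXT_WINDOW) -> bool:
    """Prompt and response share the same window, so both count toward it."""
    return prompt_tokens + max_response_tokens <= window

def estimate_cost(prompt_tokens: int, response_tokens: int) -> float:
    """Estimated cost in USD under the placeholder prices above."""
    input_cost = prompt_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT
    output_cost = response_tokens / 1_000_000 * PRICE_PER_MILLION_OUTPUT
    return input_cost + output_cost

print(fits_context(prompt_tokens=120_000, max_response_tokens=4_000))
print(f"${estimate_cost(prompt_tokens=120_000, response_tokens=4_000):.2f}")
```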

Practical rule: For very long tasks, break them into smaller chunks rather than dumping everything in one prompt.
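One way to apply this rule is to split a long text on paragraph boundaries so that each chunk stays under a token budget. This is a minimal sketch: `chunk_text` is a hypothetical helper, and it approximates token counts with the 4-characters-per-token heuristic rather than a real tokenizer.

```python
def chunk_text(text: str, max_tokens: int = 500, chars_per_token: int = 4) -> list[str]:
    """Group paragraphs into chunks that stay under an approximate token budget."""
    max_chars = max_tokens * chars_per_token
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A paragraph larger than the budget becomes its own oversized chunk.
            current = paragraph
    if current:
        chunks.append(current)
    return chunks

document = "\n\n".join(f"Paragraph {i}: " + "lorem ipsum " * 40 for i in range(6))
for number, chunk in enumerate(chunk_text(document, max_tokens=300), start=1):
    print(f"chunk {number}: {len(chunk)} characters")
```

Each chunk can then go into its own prompt, with a final prompt that combines the partial results.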

The Attention Mechanism

The core innovation behind modern LLMs is the attention mechanism. In simple terms: as the model processes your prompt, each token "looks at" the other tokens in context (in GPT-style models, the tokens that come before it) to work out context and relationships.

This has important practical implications:

Early instructions carry weight — what you put at the beginning of your prompt shapes how the model interprets everything that follows.
Specific keywords "activate" knowledge clusters — saying "You are an expert Python developer" shifts the model into a different mode than "Help me with code."
Contradictions confuse the model: if you say "be concise" at the start but "include full detail" at the end, the model may blend the two or follow only one, and you can't predict which.
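To make "looking at" concrete, here is a toy version of scaled dot-product attention with a causal mask, the pattern used by GPT-style models. It is a sketch under heavy simplification: the vectors are made up, and real models use learned projections, many attention heads, and many layers.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(queries, keys):
    """One row of weights per token; token i only attends to tokens 0..i."""
    dim = len(keys[0])
    weights = []
    for i, query in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        # Later tokens are masked out, so their weight is exactly zero.
        weights.append(softmax(scores) + [0.0] * (len(keys) - i - 1))
    return weights

vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # made-up toy token vectors
for row in causal_attention_weights(vectors, vectors):
    print([round(w, 2) for w in row])
```

Each printed row shows how much one token draws on itself and on the tokens before it. Under this masking, earlier tokens are visible to everything that follows.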

Temperature — Controlling Creativity

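Most LLM APIs expose a temperature parameter. The OpenAI API accepts values from 0 to 2 and the Anthropic API from 0 to 1; values between 0 and 1 cover most practical use: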

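from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
your_prompt = "Explain what a token is in one sentence."  # example prompt

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": your_prompt}],
    temperature=0.3,  # lower = more predictable, higher = more varied
    max_tokens=1000,  # upper bound on the response length, in tokens
)
print(response.choices[0].message.content)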

Temperature	Effect	Best For
0.0	Most predictable, nearly identical output each run	Data extraction, classification
0.3	Reliable with slight variation	Code generation, factual Q&A
0.7	Balanced creativity	General writing, analysis
1.0	More varied and creative, less predictable	Brainstorming, creative writing
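Under the hood, temperature rescales the model's scores for each candidate next token before they are turned into probabilities. The sketch below shows that effect on made-up scores; `softmax_with_temperature` is an illustrative helper, not part of any API. Temperature 0 would divide by zero here, so the sketch rejects it; APIs typically handle it as a special case that picks the top-scoring token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for temperature in (0.3, 0.7, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, which makes output more repeatable. Higher temperatures flatten the distribution, so less likely tokens get picked more often.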

In the ChatGPT and Claude web interfaces, you can't set temperature directly. Wording doesn't change the sampling temperature itself, but it can steer the output in a similar direction: phrases like "be creative and imaginative" encourage more varied responses, while "be precise and consistent" encourages more conservative ones.

System Prompts vs User Prompts

Modern LLM APIs have two key message types:

System prompt — sets the context, persona, and rules for the entire conversation. The model treats this as its "operating instructions."

User prompt — the specific request in each turn.

messages = [
    {
        "role": "system",
        "content": "You are an expert data analyst at a Fortune 500 company. Always respond with precise, actionable insights. Format data in tables when possible."
    },
    {
        "role": "user",
        "content": "Analyze this sales data and identify the top 3 growth opportunities..."
    }
]

In ChatGPT's web interface, you can't edit the system prompt directly (custom instructions are the closest built-in option), but you can begin your first message with "For this conversation, you are [persona]. Your rules are: [rules]..." to get a similar effect.
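The inline-persona workaround is easy to template. The helper below is a hypothetical sketch, and its name and argument layout are assumptions made for illustration. It folds a persona and a list of rules into the start of an ordinary user message.

```python
def with_inline_persona(persona: str, rules: list[str], request: str) -> str:
    """Prefix a request with a persona and numbered rules, for chats without a system prompt."""
    rule_text = " ".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        f"For this conversation, you are {persona}. "
        f"Your rules are: {rule_text}\n\n{request}"
    )

print(with_inline_persona(
    persona="an expert data analyst at a Fortune 500 company",
    rules=["Always respond with precise, actionable insights.",
           "Format data in tables when possible."],
    request="Analyze this sales data and identify the top 3 growth opportunities...",
))
```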

Four Rules Based on This Knowledge

Now that you understand how LLMs process prompts, here are four rules that follow directly:

Rule 1: Put the most important instruction first. The beginning of your prompt anchors the model's interpretation of everything that follows.

Rule 2: Be specific, not vague. LLMs "average" across their training data. Vague prompts get averaged, generic outputs. Specific prompts get targeted results.

Rule 3: Show examples when consistency matters. Examples (few-shot prompting — covered in a later lesson) activate specific patterns rather than relying on the model to infer them.

Rule 4: One clear task per prompt. When you pack multiple unrelated tasks into one prompt, the model splits its "attention" and often shortchanges one of them.
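The four rules can be baked into a small prompt template. This is one possible sketch rather than a standard format: `build_prompt` and the "Example input" / "Example output" labels are choices made for this illustration.

```python
from typing import Sequence

def build_prompt(instruction: str, input_text: str,
                 examples: Sequence[tuple[str, str]] = ()) -> str:
    """Assemble a single-task prompt: instruction, optional examples, then the input."""
    parts = [instruction.strip()]  # Rule 1: the key instruction comes first
    for example_input, example_output in examples:  # Rule 3: show examples
        parts.append(f"Example input: {example_input}\nExample output: {example_output}")
    parts.append(f"Input: {input_text.strip()}")  # Rule 4: one task, one input
    return "\n\n".join(parts)

print(build_prompt(
    # Rule 2: a specific instruction with an explicit output format
    instruction="Classify the sentiment of the review as exactly one word: positive or negative.",
    input_text="The battery died after two days.",
    examples=[("Arrived early and works perfectly.", "positive")],
))
```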

Practice Exercise

Open ChatGPT or Claude and test the effect of instruction placement:

Test A: "Summarize this text. Be concise and focus on key takeaways. [paste a 500-word article]"

Test B: "[paste the same 500-word article] Summarize this text. Be concise and focus on key takeaways."

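With short inputs, the first version (instruction first) often produces cleaner, more focused summaries, because the model knows the task before it reads the content. In the second version, the model reads the whole article before learning what to do with it. Results vary by model and input length, so run both and compare the outputs yourself.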
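If you want to repeat the exercise with several articles, you can generate both variants from the same pieces so that only the placement differs. A minimal sketch, with `placement_variants` as a made-up helper name:

```python
def placement_variants(instruction: str, article: str) -> dict[str, str]:
    """Build Test A (instruction first) and Test B (instruction last) from the same parts."""
    return {
        "A": f"{instruction}\n\n{article}",
        "B": f"{article}\n\n{instruction}",
    }

variants = placement_variants(
    instruction="Summarize this text. Be concise and focus on key takeaways.",
    article="[paste a 500-word article]",
)
for name, prompt in variants.items():
    print(f"--- Test {name} ---\n{prompt}\n")
```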

Next lesson: The 5 Pillars of Great Prompts — the universal framework that separates amateur prompts from professional ones.