
LLM Core Concepts Explained

Transformers, attention, embeddings, tokens, context windows — all explained in plain English.


What is an LLM?

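A Large Language Model (LLM) is a neural network trained on vast amounts of text to predict the next token in a sequence. It generates text by repeatedly predicting a token, appending it to the input, and predicting again. GPT-4, Claude, Gemini, and LLaMA are all LLMs.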
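To make "predict the next token" concrete, here is a deliberately tiny sketch. A bigram word counter stands in for the neural network. This is an assumption for illustration only: real LLMs use a Transformer over sub-word tokens and a much longer context. Generation is the same predict-append-repeat loop, using greedy selection of the most likely next token.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count which word follows which: a toy stand-in for an LLM's learned weights."""
    words = corpus.split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def next_token_probs(model: dict[str, Counter], token: str) -> dict[str, float]:
    """Turn raw follow-counts into a probability distribution over the next token."""
    counts = model.get(token, Counter())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}

def generate(model: dict[str, Counter], start: str, max_tokens: int = 5) -> list[str]:
    """Autoregressive loop: predict, append, repeat (greedy: always take the top token)."""
    out = [start]
    for _ in range(max_tokens):
        probs = next_token_probs(model, out[-1])
        if not probs:
            break  # no known continuation for this token
        out.append(max(probs, key=probs.get))
    return out

model = train_bigram("the cat sat on the mat . the cat ran .")
print(next_token_probs(model, "the"))     # cat ≈ 0.67, mat ≈ 0.33
print(generate(model, "the", max_tokens=4))  # ['the', 'cat', 'sat', 'on', 'the']
```

Real models replace the lookup table with a network that scores every token in a large vocabulary, but the outer loop stays the same.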


Key Concepts

Tokens

  • Text is broken into tokens (words, sub-words, or characters)
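  • 1 token ≈ 0.75 words in English, on average (the exact ratio depends on the tokenizer)
  • "Hello world" = 2 tokens ("Hello" and " world") in GPT-style tokenizers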
  • Token limits define how much text a model can process at once
```text
"The quick brown fox" → ["The", " quick", " brown", " fox"] = 4 tokens
```
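The leading spaces in the example above come from how GPT-style tokenizers pre-split text before applying their learned sub-word vocabulary. The sketch below reproduces only that first, word-level split, using a simplified ASCII regex. The regex is an assumption, loosely modeled on GPT-2's pattern, and is not any library's real tokenizer. The block also turns the 0.75-words-per-token rule of thumb into a quick estimator. For exact counts, use the tokenizer that ships with your model; for OpenAI models that is the `tiktoken` library.

```python
import re

# Simplified, ASCII-only pre-tokenization pattern (assumption: real tokenizers use a
# Unicode-aware regex and then apply learned BPE merges, so rare words split further).
PRETOKENIZE = re.compile(r" ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+")

def pretokenize(text: str) -> list[str]:
    """Split text into word-level chunks, keeping each word's leading space."""
    return PRETOKENIZE.findall(text)

def estimate_tokens(text: str) -> int:
    """Rough English estimate from the '1 token ≈ 0.75 words' rule of thumb."""
    words = len(text.split())
    return round(words / 0.75)

print(pretokenize("The quick brown fox"))  # ['The', ' quick', ' brown', ' fox']
print(estimate_tokens("The quick brown fox jumps over the lazy dog"))  # 12
```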

Context Window

The context window is the maximum number of tokens a model can "see" at once. It covers both your prompt (including any conversation history) and the generated response.

| Model | Context Window |
|---|---|
| GPT-3.5 Turbo | 16,385 tokens |
| GPT-4o | 128,000 tokens |
| Claude 3.5 Sonnet | 200,000 tokens |
| Gemini 1.5 Pro | 1,000,000 tokens |
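Because the prompt and the response share one window, a common pre-flight check is to reserve room for the reply before sending a request. A minimal sketch follows. The dictionary keys are illustrative labels rather than exact API model IDs, and the limits are copied from the table above, so check your provider's current documentation.

```python
# Context windows from the table above. Keys are illustrative labels, not exact
# API model IDs, and providers change these limits over time.
CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 16_385,
    "gpt-4o": 128_000,
    "claude-3.5-sonnet": 200_000,
    "gemini-1.5-pro": 1_000_000,
}

def fits_in_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """The prompt and the space reserved for the reply must share one window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

def remaining_for_response(model: str, prompt_tokens: int) -> int:
    """Tokens left for the response once the prompt occupies the window."""
    return max(CONTEXT_WINDOWS[model] - prompt_tokens, 0)

print(fits_in_context("gpt-4o", 120_000, 4_000))        # True
print(fits_in_context("gpt-3.5-turbo", 15_000, 2_000))  # False
print(remaining_for_response("gpt-3.5-turbo", 15_000))  # 1385
```

Many APIs also impose a separate, smaller cap on output tokens, so the full remainder is not always usable for the reply.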

Transformer Architecture

LLMs are built on the Transformer architecture (introduced in 2017).

Key Components

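  • Embeddings: Convert tokens into numerical vectors
  • Attention Mechanism: Lets the model weigh which parts of the input are most relevant to each token
  • Feed-Forward Layers: Transform each token's representation independently after the attention step
  • Self-Attention: The form of attention used inside the model, where tokens in the same sequence attend to each other (in GPT-style LLMs, each token attends only to itself and earlier tokens)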

Self-Attention Formula

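```text
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

Q   = Query matrix (one row per token)
K   = Key matrix
V   = Value matrix
d_k = dimension of the key vectors
```

Q, K, and V are computed by multiplying the token embeddings by three learned weight matrices. Dividing by sqrt(d_k) keeps the dot products from growing with the vector size. Without that scaling, the softmax would be pushed toward extreme, near one-hot weights.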
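The formula can be traced step by step in plain Python. This sketch makes two assumptions. First, Q, K, and V are tiny hand-picked matrices; real models compute them from embeddings with learned weights and run many attention heads in parallel. Second, the optional `causal` flag illustrates the future-token masking used by GPT-style models.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax; masked (-inf) positions get probability 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """(n x k) @ (k x m) for matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*m)]

def attention(Q, K, V, causal: bool = False):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, one row per token."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))  # scores[i][j]: how strongly token i matches token j
    weights = []
    for i, row in enumerate(scores):
        scaled = [s / math.sqrt(d_k) for s in row]
        if causal:
            # Mask "future" tokens: token i may only attend to tokens 0..i
            scaled = [s if j <= i else -math.inf for j, s in enumerate(scaled)]
        weights.append(softmax(scaled))
    return matmul(weights, V), weights

# Hypothetical 2-token example with d_k = 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
output, weights = attention(Q, K, V, causal=True)
print(weights[0])  # [1.0, 0.0]: the first token can only attend to itself
print(output[0])   # [1.0, 2.0]
```

Each output row is a weighted average of the rows of V. The weights come from how well that token's query matches every key.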

Key Parameters

Temperature

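Controls the randomness of outputs. Before sampling, the model's raw scores (logits) are divided by the temperature. Values below 1 sharpen the distribution toward the most likely tokens, and values above 1 flatten it.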

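| Value | Effect |
|---|---|
| 0.0 | Near-deterministic: almost always picks the most likely token, so repeated runs give the same or very similar answers |
| 0.5 | Balanced: mostly focused, with some variety |
| 1.0 | Creative, more varied |
| 2.0 | Very random, often incoherent (not every provider accepts values above 1.0) |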
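A minimal sketch of temperature scaling shows how the values in the table change the probabilities. The three logits are hypothetical. Treating T = 0 as pure greedy (argmax) selection is an assumption that mirrors the usual convention, since dividing by zero is undefined.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by T, then softmax. Lower T concentrates probability on the top token."""
    if temperature <= 0:
        # Treat T = 0 as greedy decoding: all probability on the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As T rises, the top token's share shrinks and unlikely tokens get sampled more often. That is where the extra variety (and eventual incoherence) comes from.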

Top-P (Nucleus Sampling)

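Samples only from the smallest set of most-likely tokens whose combined probability reaches p. All other tokens are excluded, and the remaining probabilities are renormalized.

  • top_p = 0.9: keep the most likely tokens until they add up to 90% of the probability mass, then sample only from those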
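The nucleus rule can be written as a short filter followed by ordinary weighted sampling. The next-token distribution below is hypothetical, and the seeded `random.Random` is only there to make the run reproducible.

```python
import random

def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most-likely tokens with cumulative prob >= top_p, renormalized."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

def sample(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical next-token distribution
probs = {"cat": 0.5, "dog": 0.3, "car": 0.15, "banana": 0.05}
nucleus = top_p_filter(probs, 0.9)  # "banana" falls outside the top 90% and is dropped
print(nucleus)
print(sample(nucleus, random.Random(0)))
```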

Max Tokens

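Maximum number of tokens the model may generate in its response. Generation stops when this limit is reached, even mid-sentence, and the prompt plus this limit generally has to fit within the context window.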


Embeddings

Embeddings are numerical vectors that represent text in a high-dimensional space. Texts with similar meanings end up close together, which is usually measured with cosine similarity.

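```python
# Example: creating an embedding with the OpenAI Python SDK
# (requires `pip install openai` and the OPENAI_API_KEY environment variable)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world",
)
vector = response.data[0].embedding  # list of 1536 floats (this model's default size)
```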
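"Close together" is typically measured with cosine similarity, the cosine of the angle between two vectors. The sketch below uses hypothetical, hand-picked 3-dimensional vectors so it runs offline. Real embeddings, like the 1536-dimensional ones above, are compared the same way.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" chosen by hand for illustration
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
invoice = [0.1, 0.05, 0.95]

print(cosine_similarity(cat, kitten))   # close to 1: similar meaning
print(cosine_similarity(cat, invoice))  # much lower: unrelated meaning
```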

Training Stages

1. Pre-training

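  • Train on a massive text corpus (internet, books, code) by repeatedly predicting the next token
  • Learn language patterns, facts, and reasoning skills as a side effect of that objective
  • Very expensive: millions of dollars in compute, and often far more for the largest models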
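The pre-training objective can be made concrete as next-token cross-entropy: at each position, the loss is -log of the probability the model gave to the token that actually came next. This sketch uses hypothetical predicted distributions. A real model outputs a probability for every token in its vocabulary, which is why missing keys are not handled here.

```python
import math

def next_token_loss(predicted: list[dict[str, float]], targets: list[str]) -> float:
    """Average cross-entropy: mean of -log p(actual next token) over all positions."""
    # Assumes every target appears in its distribution (real models cover the whole vocabulary).
    losses = [-math.log(dist[target]) for dist, target in zip(predicted, targets)]
    return sum(losses) / len(losses)

# Hypothetical model outputs while reading "the cat sat":
predicted = [
    {"cat": 0.6, "dog": 0.3, "mat": 0.1},  # prediction after "the"
    {"sat": 0.7, "ran": 0.2, "is": 0.1},   # prediction after "the cat"
]
targets = ["cat", "sat"]
print(next_token_loss(predicted, targets))
```

Training nudges the weights so the actual next tokens receive higher probability, which drives this average down.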

2. Fine-tuning

  • Continue training the pre-trained model on smaller, task- or domain-specific data (e.g., instruction-response pairs)
  • Adjust model for particular tasks
  • Much cheaper than pre-training

3. RLHF (Reinforcement Learning from Human Feedback)

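  • Human raters compare or rank several model outputs for the same prompt
  • A reward model is trained on these preferences, and the LLM is then optimized to produce answers the reward model scores highly
  • Aims to make models more helpful, honest, and harmless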
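In the widely used RLHF recipe, human comparisons first train a reward model with a pairwise loss, -log σ(r_chosen − r_rejected), which is small when the human-preferred answer gets the higher score. This sketch assumes that Bradley-Terry-style formulation and uses hypothetical reward values. The later step that optimizes the LLM against the reward model (commonly with PPO) is not shown.

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): low when the preferred answer scores higher."""
    margin = reward_chosen - reward_rejected
    # log(1 + exp(-margin)) is the same quantity as -log(sigmoid(margin))
    return math.log1p(math.exp(-margin))

print(pairwise_preference_loss(2.0, -1.0))  # reward model agrees with the rater: low loss
print(pairwise_preference_loss(-1.0, 2.0))  # reward model disagrees: high loss
```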

Hallucination

When an LLM confidently states false information, it's called hallucination.

Causes:

  • Errors or gaps in the training data
  • Model doesn't "know" what it doesn't know
  • Extrapolates beyond actual knowledge

Mitigation:

  • Use RAG (Retrieval Augmented Generation)
  • Ask the model to cite sources, then check that the sources exist and support the claim
  • Use lower temperature for factual tasks
  • Verify outputs with tools/search

RAG (Retrieval Augmented Generation)

```text
User Query
    ↓
[Vector Database Search]
    ↓
Relevant Documents Retrieved
    ↓
LLM receives: Query + Retrieved Context
    ↓
Response grounded in the retrieved documents
```
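The retrieval and prompt-assembly steps of the pipeline can be sketched without external services. One substitution is made openly: a bag-of-words cosine similarity stands in for a neural embedding model plus vector database. The documents and the prompt wording are hypothetical, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; in practice these chunks live in a vector database.
DOCUMENTS = [
    "The refund window is 30 days from the date of purchase.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Premium plans include priority email support.",
]

def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, k: int = 1) -> str:
    """Combine the retrieved context with the user's question for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("How many days is the refund window?"))
```

Telling the model to answer only from the supplied context is what ties the response to the retrieved documents.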

Popular LLM APIs

| Provider | Models | Best For |
|---|---|---|
| OpenAI | GPT-4o, o1 | General purpose, coding |
| Anthropic | Claude 3.5, Claude 4 | Long context, analysis |
| Google | Gemini 1.5, 2.0 | Multimodal, long context |
| Meta | LLaMA 3.1, 3.3 | Open weights, self-hosting |
| Mistral AI | Mistral Large, Codestral | European provider, code |