AiTechWorlds

LLM Core Concepts Explained

Transformers, attention, embeddings, tokens, context windows — all explained in plain English.

#llm #ai #machine-learning #transformers

What is an LLM?

A Large Language Model (LLM) is an AI trained on vast amounts of text to predict the next token in a sequence. Models like GPT-4, Claude, Gemini, and LLaMA are LLMs.

Key Concepts

Tokens

Text is broken into tokens (words, sub-words, or characters)
1 token ≈ 0.75 words in English (a rule of thumb, not an exact ratio)
"Hello world" ≈ 2 tokens (exact counts depend on the model's tokenizer)
Token limits define how much text a model can process at once

text

"The quick brown fox" → ["The", " quick", " brown", " fox"] = 4 tokens

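The 0.75 words-per-token rule of thumb can be turned into a quick estimate. This is a rough sketch only: `estimate_tokens` is a hypothetical helper, and real counts depend on the model's tokenizer, so treat the result as a ballpark figure.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English text, not an exact ratio


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count from the whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


essay = "word " * 1500  # stand-in for a 1,500-word article
print(estimate_tokens(essay))  # 2000
```

For exact counts, use the tokenizer that ships with the model you are calling.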
Context Window

The context window is the maximum number of tokens a model can "see" at once — including your prompt AND the response.

Model	Context Window
GPT-3.5	16,385 tokens
GPT-4o	128,000 tokens
Claude 3.5 Sonnet	200,000 tokens
Gemini 1.5 Pro	1,000,000 tokens
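
Because the prompt and the response share the same window, it helps to check the budget before sending a request. The sketch below assumes you already know (or have estimated) the prompt's token count; `fits_in_context` is a hypothetical helper, and the window size is taken from the table above.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved response budget fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


GPT_4O_WINDOW = 128_000  # from the table above

print(fits_in_context(100_000, 4_000, GPT_4O_WINDOW))  # True
print(fits_in_context(126_000, 4_000, GPT_4O_WINDOW))  # False
```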

Transformer Architecture

LLMs are built on the Transformer architecture (introduced in 2017).

Key Components

Embeddings: Convert tokens into numerical vectors
Attention Mechanism: Lets the model weigh the most relevant parts of the input
Feed-Forward Layers: Transform each token's representation after attention
Self-Attention: Each token "attends" to other tokens in the same sequence (in decoder-only LLMs such as GPT, only to itself and earlier tokens)

Self-Attention Formula

text

Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) x V

Q = Query matrix
K = Key matrix
V = Value matrix
d_k = dimension of key vectors
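
The formula can be traced step by step in plain Python. This is a minimal, unoptimized sketch for tiny inputs: the matrices are made-up toy values, there is no masking and no multiple heads, and real models run this on GPUs with tensor libraries.

```python
import math


def softmax(row):
    """Turn a list of scores into probabilities that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention on lists of row vectors."""
    d_k = len(K[0])
    # scores[i][j] = (Q[i] . K[j]) / sqrt(d_k)
    scores = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
        for q_row in Q
    ]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the rows of V.
    output = [
        [sum(w * v_row[c] for w, v_row in zip(w_row, V)) for c in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights


# Toy 2-token example with 2-dimensional vectors (values invented for illustration).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
output, weights = attention(Q, K, V)
print(weights)  # each row sums to 1; token 0 puts most of its weight on key 0
print(output)
```

Dividing by sqrt(d_k) keeps the scores from growing with the vector size, which would otherwise push softmax toward near one-hot weights.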

Key Parameters

Temperature

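Controls randomness of outputs by scaling the model's scores (logits) before sampling: lower values sharpen the probability distribution, higher values flatten it.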

Value	Effect
0.0	Near-deterministic (greedy decoding); usually the same answer each time
0.5	Balanced creativity
1.0	Model's unscaled distribution; more varied and creative
2.0	Very random, often incoherent
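
The effect of the values in the table can be seen by dividing the logits by the temperature before applying softmax. This is a sketch with made-up logits for three candidate tokens; `softmax_with_temperature` is a hypothetical helper, and treating 0 as "pick the top token" is an assumption about how the limit case is usually handled.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature == 0:
        # Limit case: all probability goes to the highest-scoring token.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])
```

As the temperature rises, the top token's probability shrinks and the others catch up, which is why high settings produce more surprising text.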

Top-P (Nucleus Sampling)

Limits sampling to the smallest set of most-likely tokens whose cumulative probability reaches top_p; everything outside that set is ignored.

top_p = 0.9 means consider tokens that make up 90% of probability
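
A minimal sketch of that filtering step, using a toy probability distribution. `nucleus_filter` is a hypothetical helper, and the token names and probabilities are invented for illustration.

```python
def nucleus_filter(probs: dict, top_p: float) -> dict:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    kept = {}
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 again.
    return {token: p / cumulative for token, p in kept.items()}


probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "xylophone": 0.05}  # toy distribution
print(nucleus_filter(probs, top_p=0.9))  # keeps "cat", "dog", "fish"; drops "xylophone"
```

The model then samples only from the kept tokens, which trims the unlikely tail without fixing the number of candidates in advance.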

Max Tokens

Upper limit on the number of tokens the model may generate in the response; output is cut off once the limit is reached.

Embeddings

Embeddings are numerical representations of text in high-dimensional vector space. Similar concepts are close together.

python

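# Example: using OpenAI embeddings
# Requires the `openai` package; OpenAI() reads the API key from the OPENAI_API_KEY environment variable
from openai import OpenAI
client = OpenAI()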

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world"
)
vector = response.data[0].embedding  # 1536-dimensional vector
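
"Close together" is usually measured with cosine similarity. The sketch below uses tiny 3-dimensional vectors invented for illustration, not real model output, to show how similar concepts score higher than unrelated ones.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration; real embeddings have hundreds or thousands of dimensions.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.75, 0.2]
banana = [0.1, 0.2, 0.95]

print(cosine_similarity(king, queen) > cosine_similarity(king, banana))  # True
```

The same function works unchanged on the 1536-dimensional vectors returned by an embeddings API.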

Training Stages

1. Pre-training

Train on massive text corpus (internet, books, code)
Learn language patterns, facts, reasoning
Very expensive: often millions of dollars in compute for large models

2. Fine-tuning

Train on specific domain data
Adjust model for particular tasks
Much cheaper than pre-training

3. RLHF (Reinforcement Learning from Human Feedback)

Human raters compare or score model outputs
A reward model is trained on those ratings, and the LLM is optimized to produce answers it scores highly
Makes models more helpful, honest, harmless

Hallucination

When an LLM confidently states false information, it's called hallucination.

Causes:

Training data had errors
Model doesn't "know" what it doesn't know
Extrapolates beyond actual knowledge

Mitigation:

Use RAG (Retrieval Augmented Generation)
Ask model to cite sources, then check them (citations can be fabricated too)
Use lower temperature for factual tasks
Verify outputs with tools/search

RAG (Retrieval Augmented Generation)

text

User Query
    ↓
[Vector Database Search]
    ↓
Relevant Documents Retrieved
    ↓
LLM receives: Query + Retrieved Context
    ↓
Response grounded in the retrieved context
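
The flow above can be sketched end to end in a few lines. This is a toy illustration under loud assumptions: word overlap stands in for the vector database search, the three documents are invented, all function names are hypothetical, and the final LLM call is left out.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=1):
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_docs):
    """Combine the retrieved context and the query into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Invented mini knowledge base for the example.
documents = [
    "The Transformer architecture was introduced in 2017.",
    "Temperature controls the randomness of model outputs.",
    "Embeddings map text to numerical vectors.",
]
query = "When was the Transformer architecture introduced?"
prompt = build_prompt(query, retrieve(query, documents))
print(prompt)
```

In a real system, the retrieval step would compare embedding vectors rather than shared words, and the prompt would be sent to an LLM.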

Popular LLM APIs

Provider	Models	Best For
OpenAI	GPT-4o, o1	General purpose, coding
Anthropic	Claude 3.5, Claude 4	Long context, analysis
Google	Gemini 1.5, 2.0	Multimodal, long context
Meta	LLaMA 3.1, 3.3	Open weights, self-hosting
Mistral	Mistral Large, Codestral	European, code

Last reviewed on June 13, 2026 by the AiTechWorlds Notes Team. Free cheat sheet — no signup required.