AiTechWorlds

🧠 Artificial Intelligence · Report #1

How Large Language Models Actually "Think": Tokens, Embeddings & Transformers

June 18, 2026 · 9 min read

Abstract

Large Language Models (LLMs) such as ChatGPT and Claude do not reason like humans — they predict the most probable next token from patterns learned across billions of texts. This research breaks down tokens, embeddings, attention and context windows, and explains the practical implications for anyone building with or relying on AI.

Key Findings

✓ An LLM does not "think" — it predicts the most probable next token based on patterns learned from data.
✓ Tokens are words or fragments of words; embeddings turn those tokens into vectors that capture meaning.
✓ The attention mechanism decides which tokens are relevant to which — it is the heart of the Transformer.
✓ The context window is how many tokens a model can consider at once; beyond it, earlier text is forgotten.
✓ Hallucinations occur because the model optimizes for plausible output, not for verified truth.

Overview

Large Language Models (LLMs) have become the most widely used form of artificial intelligence, yet most people use them without understanding the mechanism underneath. This research explains, from first principles, what actually happens when you type a prompt into ChatGPT, Claude, or Gemini — and why that mechanism produces both remarkable results and confident mistakes.

What an LLM really does

At its core, an LLM is a next-token predictor. Given a sequence of text, it outputs a probability distribution over the next token and samples from it. It repeats this one token at a time. There is no internal "belief", "intent", or "understanding" in the human sense — only an extraordinarily rich statistical model of how language tends to continue. This single fact explains most of an LLM's strengths and weaknesses.
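The predict-sample-repeat loop described above can be sketched in a few lines. This is a toy illustration, not how any production model is implemented: `toy_model` is a hypothetical lookup table standing in for a trained network, and the vocabulary and scores are invented for the example.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

def toy_model(context):
    """Hypothetical stand-in for a trained network.

    A real model scores every token in its vocabulary using the whole
    context; this toy version only looks at the last token.
    """
    table = {
        "the": {"cat": 2.0, "dog": 1.5, "<end>": -1.0},
        "cat": {"sat": 2.5, "<end>": 0.5},
        "dog": {"sat": 2.0, "<end>": 0.5},
        "sat": {"<end>": 3.0},
    }
    return table[context[-1]]

def generate(prompt, max_tokens=10, seed=0):
    """Repeat: score candidates, sample one token, append it."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_tokens):
        probs = softmax(toy_model(tokens))
        choices, weights = zip(*probs.items())
        next_token = rng.choices(choices, weights=weights, k=1)[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["the"]))
```

Because the next token is sampled rather than looked up, different seeds can produce different continuations from the same prompt.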

Tokens: the atoms of language models

Text is not processed as words or characters but as tokens: whole words, sub-word fragments, or punctuation marks produced by a tokenizer. The word "unbelievable" might split into "un", "believ", and "able". Tokenization matters because pricing and context limits are measured in tokens, not words, and because the way text is split can affect model behavior. As a rough rule of thumb, one English word is about 1.3 tokens, though the exact ratio depends on the tokenizer and the text.
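To make the "unbelievable" example concrete, here is a minimal sketch of a greedy longest-match splitter. It is a simplification: production tokenizers typically learn their vocabulary from data (for example with byte-pair encoding), whereas `TOY_VOCAB` below is a hand-picked, hypothetical vocabulary. `estimate_tokens` simply applies the 1.3 tokens-per-word rule of thumb.

```python
def tokenize(word, vocab):
    """Greedy longest-match split of a word into sub-word tokens."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # Nothing in the vocabulary matched: emit a single character.
            tokens.append(word[start])
            start += 1
    return tokens

# Hypothetical vocabulary, chosen only to reproduce the example split.
TOY_VOCAB = {"un", "believ", "able"}

def estimate_tokens(text, tokens_per_word=1.3):
    """Rough token count from a word count, using the rule of thumb."""
    return round(len(text.split()) * tokens_per_word)

print(tokenize("unbelievable", TOY_VOCAB))  # ['un', 'believ', 'able']
print(estimate_tokens("one two three four five six seven eight nine ten"))
```

An estimate like this is only useful for budgeting; for exact counts, use the tokenizer that belongs to the model you are calling.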

Embeddings: turning tokens into meaning

Each token is mapped to a high-dimensional vector called an embedding. These vectors are learned so that tokens with similar meaning or usage sit close together in vector space. This is why a model can treat "king" and "queen", or "Paris" and "France", as related: their embeddings encode that relationship numerically. Embeddings are also the foundation of semantic search and Retrieval-Augmented Generation (RAG).
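"Close together in vector space" is usually measured with cosine similarity. The sketch below uses three-dimensional vectors with made-up values purely for illustration; real embeddings are learned during training and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy vectors; real embeddings come from a trained model.
TOY_EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

def most_similar(query, embeddings):
    """Return the other entry whose vector is closest to the query's."""
    return max(
        (word for word in embeddings if word != query),
        key=lambda word: cosine_similarity(embeddings[query], embeddings[word]),
    )

print(most_similar("king", TOY_EMBEDDINGS))  # queen
```

Semantic search and RAG apply the same idea at scale: embed the query, then retrieve the stored passages whose vectors score highest against it.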

Attention and the Transformer

The breakthrough that made modern LLMs possible is the Transformer architecture, introduced in the 2017 paper "Attention Is All You Need". Attention itself appeared in earlier neural translation work; the 2017 paper showed that a model could be built around it without recurrence. Attention lets the model weigh how relevant every other token is to the token it is currently processing. A Transformer stacks many layers, each combining attention with a feed-forward network. This is what allows a model to connect a pronoun at the end of a paragraph with the noun it refers to near the beginning.
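The weighing step can be sketched as single-head scaled dot-product attention: score each key against the query, normalize the scores with softmax, and take the weighted average of the values. This is a bare-bones illustration in plain Python. It leaves out the learned projection matrices, multiple heads, and causal masking that a real Transformer layer uses, and the vectors are invented.

```python
import math

def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Relevance of every key to this query, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# One query that matches the first key more closely than the second.
outputs, weights = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(weights[0], outputs[0])
```

The first weight comes out larger than the second, so the output leans toward the first value vector: the query "attends" more to the token whose key it resembles.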

Context windows and their limits

The context window is the maximum number of tokens the model can attend to at once, in effect its working memory. Everything outside the window is invisible to the model. A larger context window lets the model consider more of your document or conversation, but it does not give the model permanent memory: once the conversation grows beyond the window, something has to be left out, and chat applications typically drop or summarize the earliest content.
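One simple way an application might keep a conversation inside the window is to keep only the most recent tokens. The sketch below assumes that strategy; it is not the only one, and the function name and window size are illustrative.

```python
def fit_to_window(tokens, window_size):
    """Keep only the most recent `window_size` tokens; older ones are dropped."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]

conversation = ["my", "name", "is", "Ada", ".", "what", "is", "my", "name", "?"]
visible = fit_to_window(conversation, window_size=5)
print(visible)  # ['what', 'is', 'my', 'name', '?']
```

With a window of five tokens, the answer ("Ada") has already fallen out of view, so the model has nothing to base a correct reply on.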

Why hallucinations happen

Because the model optimizes for plausible continuations rather than true ones, it can produce fluent, confident, and entirely fabricated answers — known as hallucinations. Understanding this reframes how you should use AI: as a powerful drafting and reasoning aid whose factual claims must be verified, not as an oracle.

Practical implications

If you internalize that an LLM is a pattern-completion engine, you immediately become better at using it: provide rich context (the model only knows what is in the window), ask for structured output, give examples (few-shot), and verify factual claims. These techniques are not tricks — they are direct consequences of how the architecture works.
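As a rough sketch of how these techniques combine, the helper below assembles context, a few worked examples, and the new question into a single prompt string. The `Context:` / `Input:` / `Output:` layout and the function name are assumptions made for illustration, not a required format.

```python
def build_prompt(context, examples, question):
    """Assemble context, few-shot examples, and the question into one prompt."""
    parts = ["Context:", context.strip(), ""]
    for example_input, example_output in examples:
        # Each worked example shows the model the pattern to continue.
        parts += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # End on an open "Output:" so the model completes the pattern.
    parts += [f"Input: {question}", "Output:"]
    return "\n".join(parts)

prompt = build_prompt(
    context="Refunds are processed within 5 business days.",
    examples=[
        ("Hello there", "category: greeting"),
        ("My order arrived broken", "category: complaint"),
    ],
    question="Where is my refund?",
)
print(prompt)
```

The examples also fix the output structure ("category: ..."), which makes the reply easier to parse and to check.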

Conclusion

LLMs are not thinking machines; they are the most sophisticated text-prediction systems ever built. That distinction is not a criticism — it is the key to using them effectively and safely.
