What I write about

Saturday, 19 July 2025


How ChatGPT Works – A Deep Technical Dive

🌟 INTRODUCTION: The Magic Behind the Curtain

Have you ever asked ChatGPT something — like “Summarize this news article” or “Explain AI like I’m 10” — and wondered how this is even possible? Let’s walk through how ChatGPT actually works.


🧠 PART 1: ChatGPT Is a Probability Machine

ChatGPT doesn’t understand language the way humans do. It generates text by predicting the most likely next token — one token at a time.

Example:

You type: “The Eiffel Tower is in”

The model assigns a probability to every token in its vocabulary; the top candidates might look like:

  • Paris → 85%
  • France → 10%
  • Europe → 4%
  • a movie → 1%

With greedy decoding, the highest-probability token wins, so it outputs “Paris.” (In practice, sampling with a temperature setting adds some controlled randomness.) The chosen token is appended to the text and the process repeats, one token at a time. This is called auto-regressive generation.
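The loop can be sketched in a few lines of Python. The vocabulary and probabilities below are invented purely for illustration — a real model computes a fresh distribution with a neural network at every step:

```python
# Toy sketch of greedy auto-regressive decoding. TOY_MODEL stands in for
# the neural network: it maps a context string to a made-up next-token
# probability distribution.

TOY_MODEL = {
    "The Eiffel Tower is in": {
        "Paris": 0.85, "France": 0.10, "Europe": 0.04, "a movie": 0.01,
    },
    "The Eiffel Tower is in Paris": {"<eos>": 0.9, ",": 0.1},
}

def generate(prompt: str, max_tokens: int = 10) -> str:
    text = prompt
    for _ in range(max_tokens):
        dist = TOY_MODEL.get(text)
        if dist is None:
            break
        # Greedy decoding: always pick the highest-probability token.
        token = max(dist, key=dist.get)
        if token == "<eos>":           # end-of-sequence marker
            break
        text = text + " " + token      # append and predict again
    return text

print(generate("The Eiffel Tower is in"))  # → The Eiffel Tower is in Paris
```

Swapping `max` for a weighted random choice over `dist` would give temperature-style sampling instead of greedy decoding.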


🔡 PART 2: What’s a Token?

Tokens are chunks of text — not full words or characters.

  • “ChatGPT is amazing” → ["Chat", "GPT", " is", " amazing"]

GPT processes and generates text one token at a time within a fixed context window.

  • GPT-3.5 → ~4,096 tokens
  • GPT-4 → ~8k–32k tokens
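Real GPT models tokenize with byte-pair encoding (BPE) over a learned vocabulary of roughly 100k entries. A greedy longest-match over a tiny hand-made vocabulary captures the flavor — everything below is a simplification, not the actual algorithm:

```python
# Minimal greedy longest-match tokenizer over a toy vocabulary.
# Note the leading spaces in " is" / " amazing": GPT vocabularies
# fold the space into the token that follows it.

VOCAB = {"Chat", "GPT", " is", " amazing", " am", "azing"}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest substring starting at i that is in the vocabulary.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown text: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("ChatGPT is amazing"))  # → ['Chat', 'GPT', ' is', ' amazing']
```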

🧰 PART 3: What Powers It Underneath

ChatGPT is built on the Transformer — a deep neural network architecture introduced in the 2017 paper “Attention Is All You Need.”

1. Embeddings

Tokens are converted into high-dimensional vectors that capture meaning. Similar words end up close together in vector space.
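“Close together” is usually measured with cosine similarity. The 3-D vectors below are hand-made to illustrate the idea — real embeddings have hundreds or thousands of learned dimensions:

```python
import numpy as np

# Invented 3-D "embeddings"; real models learn these during training.
emb = {
    "cat":   np.array([0.90, 0.80, 0.10]),
    "dog":   np.array([0.85, 0.75, 0.20]),
    "stock": np.array([0.10, 0.20, 0.95]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 = same direction (similar meaning), 0.0 = unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["cat"], emb["dog"]))    # high: related meanings
print(cosine(emb["cat"], emb["stock"]))  # low: unrelated meanings
```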

2. Self-Attention

Self-attention lets the model decide which previous tokens matter most for the current prediction.

“The cat that chased the mouse was fast” → attention links “was” back to “cat,” not “mouse,” despite the intervening clause.
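A single head of scaled dot-product attention — the core computation — fits in a few lines of NumPy (random weights here, so the output is meaningless; only the shapes and mechanics are the point):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) matrix of token embeddings.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # token-to-token relevance
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d = 5, 8
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one updated vector per token
```

GPT additionally applies a causal mask so each token can only attend to tokens before it, and runs many such heads in parallel.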

3. Feed-Forward Layers

These layers refine meaning after attention using non-linear transformations.
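A sketch of that transformation, applied independently at every token position (GPT uses a GELU non-linearity; ReLU is used below for simplicity):

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # Expand to a wider hidden dimension, apply a non-linearity,
    # then project back down to the model dimension.
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU
    return h @ W2 + b2

d_model, d_ff = 8, 32                  # GPT typically uses d_ff ≈ 4 × d_model
rng = np.random.default_rng(1)
x = rng.normal(size=(5, d_model))      # 5 tokens, 8-dim vectors
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
print(feed_forward(x, W1, b1, W2, b2).shape)  # (5, 8)
```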

4. Residuals + Layer Normalization

These stabilize training and allow very deep networks to work reliably.
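Both ideas can be sketched together. The residual connection (`x + ...`) gives gradients a direct path through the network, and layer normalization keeps each token’s vector at a stable scale (learned scale/shift parameters are omitted for brevity):

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Normalize each token's vector to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    # Pre-norm residual, as in GPT-style Transformers: the sublayer
    # sees a normalized input, and its output is added back onto the
    # unmodified input.
    return x + sublayer(layer_norm(x))

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 8))
out = residual_block(x, lambda h: h * 0.1)  # stand-in for attention or FFN
print(out.shape)  # (5, 8)
```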


⚙️ PART 4: How It Was Trained
  1. Pre-training — learns language by predicting the next token across massive text corpora
  2. Supervised Fine-Tuning — trained further on human-written example conversations
  3. RLHF (Reinforcement Learning from Human Feedback) — a reward model scores outputs based on human preferences, and the model is optimized against it using PPO
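The pre-training objective in step 1 is, in miniature, a cross-entropy loss on the next token: the model is penalized by how little probability it gave the token that actually came next. The probabilities below are invented for illustration:

```python
import numpy as np

def next_token_loss(predicted_probs: np.ndarray, target_index: int) -> float:
    # Cross-entropy: -log(probability assigned to the true next token).
    # Confident and right → small loss; confident and wrong → large loss.
    return float(-np.log(predicted_probs[target_index]))

# e.g. candidates ["Paris", "France", "Europe", "a movie"]
probs = np.array([0.85, 0.10, 0.04, 0.01])
print(next_token_loss(probs, 0))  # small: true token got 85%
print(next_token_loss(probs, 3))  # large: true token got only 1%
```

Pre-training minimizes the average of this loss over trillions of tokens, nudging the network’s weights by gradient descent.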

⚠️ PART 5: Where It Goes Wrong
  • Hallucinations — fluent but factually wrong output, stated with confidence
  • Stale knowledge — training data ends at a cutoff date
  • Context window limits — older parts of a long conversation fall out of memory
  • Bias inherited from training data

🎓 CONCLUSION: It’s Just Math — But Really Good Math

ChatGPT is a probability engine trained on massive data and refined by human feedback. It doesn’t think — but it predicts extremely well.
