
Saturday, 19 July 2025

How ChatGPT Works – A Deep Technical Dive

🌟 INTRODUCTION: The Magic Behind the Curtain

Have you ever asked ChatGPT something — like “Summarize this news article” or “Explain AI like I’m 10” — and wondered how this is even possible? Let’s walk through how ChatGPT actually works.


🧠 PART 1: ChatGPT Is a Probability Machine

ChatGPT doesn’t understand language the way humans do. It generates text by predicting what comes next, one token at a time.

Example:

You type: “The Eiffel Tower is in”

  • Paris → 85%
  • France → 10%
  • Europe → 4%
  • a movie → 1%

With greedy decoding, the highest-probability token wins, so the model outputs “Paris.” (In practice, ChatGPT samples from this distribution, which is why its answers vary from run to run.) Generation then continues token by token. This is called auto-regressive generation.
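
Here is a minimal sketch of that loop in Python, using the small open-source GPT-2 model as a stand-in (ChatGPT’s own weights aren’t public, but the auto-regressive mechanics are the same):

  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  # GPT-2: an open model with the same auto-regressive design as ChatGPT
  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")
  model.eval()

  input_ids = tokenizer("The Eiffel Tower is in", return_tensors="pt").input_ids

  with torch.no_grad():
      for _ in range(5):                         # generate 5 tokens, one at a time
          logits = model(input_ids).logits       # shape (1, seq_len, vocab_size)
          next_id = logits[0, -1].argmax()       # greedy: pick the top token
          input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

  print(tokenizer.decode(input_ids[0]))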


🔡 PART 2: What’s a Token?

Tokens are chunks of text — not full words or characters.

  • “ChatGPT is amazing” → ["Chat", "GPT", " is", " amazing"]
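
You can see this for yourself with tiktoken, OpenAI’s open-source tokenizer library. The exact split depends on which vocabulary a model uses, so it may differ from the example above:

  import tiktoken

  # "cl100k_base" is the encoding used by the GPT-3.5 / GPT-4 family
  enc = tiktoken.get_encoding("cl100k_base")
  ids = enc.encode("ChatGPT is amazing")
  print(len(ids), "tokens")
  print([enc.decode([i]) for i in ids])  # one string per token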

GPT processes and generates text one token at a time within a fixed context window.

  • GPT-3.5 → ~4,096 tokens
  • GPT-4 → ~8k–32k tokens

🧰 PART 3: What Powers It Underneath

ChatGPT is built on the Transformer, a deep neural network architecture introduced in the 2017 paper “Attention Is All You Need.”

1. Embeddings

Tokens are converted into high-dimensional vectors that capture meaning. Similar words end up close together in vector space.
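
A toy illustration with made-up 3-dimensional vectors (real models learn hundreds or thousands of dimensions during training):

  import numpy as np

  # vectors invented purely for illustration
  emb = {
      "cat":    np.array([0.90, 0.10, 0.00]),
      "kitten": np.array([0.85, 0.15, 0.05]),
      "car":    np.array([0.10, 0.90, 0.20]),
  }

  def cosine(a, b):
      return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

  print(cosine(emb["cat"], emb["kitten"]))  # near 1.0: similar meaning
  print(cosine(emb["cat"], emb["car"]))     # much lower: unrelated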

2. Self-Attention

Self-attention lets the model decide which previous tokens matter most for the current prediction.

In “The cat that chased the mouse was fast,” attention helps the model link “was” to its subject, “cat,” rather than the nearer noun “mouse.”
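
The core computation is scaled dot-product attention. Here is a single-head numpy sketch with random weights for illustration (the real model adds causal masking and many heads in parallel):

  import numpy as np

  def self_attention(X, Wq, Wk, Wv):
      Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens to queries/keys/values
      scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each token attends to each other
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)  # softmax: weights sum to 1 per token
      return weights @ V                         # weighted mix of value vectors

  # 4 tokens, 8-dimensional embeddings, random weights just for the demo
  rng = np.random.default_rng(0)
  X = rng.normal(size=(4, 8))
  Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
  print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8)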

3. Feed-Forward Layers

These layers refine meaning after attention using non-linear transformations.
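
A sketch of one such sub-layer. The sizes here are illustrative, though GPT-style models typically expand to roughly 4x the model width (and use GELU rather than the ReLU shown here):

  import numpy as np

  def feed_forward(x, W1, b1, W2, b2):
      h = np.maximum(0, x @ W1 + b1)   # expand and apply a non-linearity
      return h @ W2 + b2               # project back to the model width

  d_model, d_ff = 8, 32                # toy sizes, not ChatGPT's real ones
  rng = np.random.default_rng(1)
  x = rng.normal(size=(4, d_model))
  W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
  W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
  print(feed_forward(x, W1, b1, W2, b2).shape)  # (4, 8)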

4. Residuals + Layer Normalization

These stabilize training and allow very deep networks to work reliably.
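
Putting the pieces together, here is how one block wires them in the pre-norm style used by GPT-2 and its successors: normalize, apply the sub-layer, then add the input back so gradients flow through the skip connection. The lambda sub-layers below are dummies standing in for the attention and feed-forward sketches above:

  import numpy as np

  def layer_norm(x, eps=1e-5):
      mu = x.mean(axis=-1, keepdims=True)
      var = x.var(axis=-1, keepdims=True)
      return (x - mu) / np.sqrt(var + eps)

  def transformer_block(x, attention, feed_forward):
      x = x + attention(layer_norm(x))      # residual around self-attention
      x = x + feed_forward(layer_norm(x))   # residual around the MLP
      return x

  rng = np.random.default_rng(2)
  x = rng.normal(size=(4, 8))
  out = transformer_block(x, lambda h: h * 0.5, lambda h: h * 0.5)  # dummy sub-layers
  print(out.shape)  # (4, 8)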


⚙️ PART 4: How It Was Trained
  1. Pre-training — learns language by predicting the next token across vast amounts of text (a toy loss example follows this list)
  2. Supervised Fine-Tuning — trained on human-written example conversations
  3. RLHF — Reinforcement Learning from Human Feedback, optimized with the PPO algorithm
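
Pre-training’s objective is just cross-entropy on the next token. A toy version, reusing the made-up probabilities from Part 1:

  import math

  # model's predicted distribution after "The Eiffel Tower is in" (invented numbers)
  predicted = {"Paris": 0.85, "France": 0.10, "Europe": 0.04, "a movie": 0.01}
  actual_next_token = "Paris"

  loss = -math.log(predicted[actual_next_token])  # cross-entropy for this one step
  print(f"loss = {loss:.3f}")  # low, because the model was confident and correct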

⚠️ PART 5: Where It Goes Wrong
  • Hallucinations: fluent, confident statements that are simply false
  • Stale knowledge: training data ends at a cutoff date
  • Context window limits: long conversations fall out of the model’s memory
  • Bias inherited from data: the model reflects the text it was trained on

🎓 CONCLUSION: It’s Just Math — But Really Good Math

ChatGPT is a probability engine trained on massive data and refined by human feedback. It doesn’t think — but it predicts extremely well.
