How ChatGPT Works – A Deep Technical Dive
Have you ever asked ChatGPT something — like “Summarize this news article” or “Explain AI like I’m 10” — and wondered how this is even possible? Let’s walk through how ChatGPT actually works.
ChatGPT doesn’t understand language the way humans do. It generates text by predicting what comes next, one token at a time.
Example:
You type: “The Eiffel Tower is in” and the model assigns a probability to each candidate next token:
- Paris → 85%
- France → 10%
- Europe → 4%
- a movie → 1%
With greedy decoding, the highest-probability token wins, so the model outputs “Paris.” In practice ChatGPT samples from this distribution (a temperature setting controls how far it strays from the top choice), appends the chosen token to the context, and repeats. This is called auto-regressive generation.
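Here’s a minimal Python sketch of that loop. The `predict_next_token_probs` function is a toy stand-in for the real neural network, hard-coded to the Eiffel Tower example above; it shows the shape of auto-regressive generation, not how OpenAI actually implements it.

```python
import random

def predict_next_token_probs(tokens):
    # Toy stand-in for the real model: it would return a probability for
    # every token in the vocabulary given the context so far.
    if tokens[-1] == " in":
        return {" Paris": 0.85, " France": 0.10, " Europe": 0.04, " a movie": 0.01}
    return {".": 1.0}

def generate(prompt_tokens, max_new_tokens=5, greedy=True):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = predict_next_token_probs(tokens)
        if greedy:
            # Greedy decoding: always take the highest-probability token.
            next_token = max(probs, key=probs.get)
        else:
            # Sampling: pick in proportion to probability; ChatGPT does
            # something like this, modulated by the temperature setting.
            candidates, weights = zip(*probs.items())
            next_token = random.choices(candidates, weights=weights)[0]
        tokens.append(next_token)
    return tokens

print("".join(generate(["The", " Eiffel", " Tower", " is", " in"])))
# -> The Eiffel Tower is in Paris....
```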
Tokens are chunks of text — not full words or characters.
- “ChatGPT is amazing” → ["Chat", "GPT", " is", " amazing"]
GPT processes and generates text one token at a time within a fixed context window.
- GPT-3.5 → ~4,096 tokens
- GPT-4 → ~8k–32k tokens
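If you want to see tokenization in action, OpenAI publishes its tokenizer as the open-source tiktoken library. A small sketch (the exact splits and counts depend on which encoding you pick):

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the encoding used by GPT-3.5/GPT-4-era models.
enc = tiktoken.get_encoding("cl100k_base")

text = "ChatGPT is amazing"
token_ids = enc.encode(text)

# Show how the text splits into tokens (boundaries vary by tokenizer).
print([enc.decode([t]) for t in token_ids])
print(f"{len(token_ids)} tokens")
```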
ChatGPT is built on a Transformer — a deep neural network architecture introduced in the 2017 paper “Attention Is All You Need.”
1. Embeddings
Tokens are converted into high-dimensional vectors that capture meaning. Similar words end up close together in vector space.
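A toy illustration of the idea, with made-up 3-dimensional vectors (real embeddings are learned during training and have hundreds or thousands of dimensions):

```python
import numpy as np

# Made-up toy embeddings; real ones are learned and far larger.
embeddings = {
    "cat":   np.array([0.9, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.2, 0.1]),
    "paris": np.array([0.0, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point the same way; close to 0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))    # high
print(cosine_similarity(embeddings["cat"], embeddings["paris"]))  # low
```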
2. Self-Attention
Self-attention lets the model decide which previous tokens matter most for the current prediction.
In “The cat that chased the mouse was fast,” attention lets the model link “was” back to “cat” rather than “mouse,” despite the words in between.
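Here’s a bare-bones NumPy sketch of scaled dot-product attention, the core operation. Sizes and weights are toy values; real models learn the projection matrices and run many attention heads in parallel.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) token vectors; Wq/Wk/Wv: learned projections.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Each token's query is compared against every token's key...
    scores = Q @ K.T / np.sqrt(d_k)
    # GPT-style causal mask: a token may only attend to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = softmax(scores, axis=-1)        # "how much should I look at you?"
    # ...and the output is a weighted mix of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 8))                   # 7 tokens, 8-dim vectors (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (7, 8)
```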
3. Feed-Forward Layers
After attention has mixed information between tokens, these layers process each token on its own: they expand it into a wider hidden representation, apply a non-linear transformation, and project it back.
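A minimal NumPy sketch of that block, again with toy sizes and random weights standing in for learned ones:

```python
import numpy as np

def gelu(x):
    # GELU, the non-linearity used in GPT-style feed-forward layers (tanh approximation).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, W1, b1, W2, b2):
    # Expand to a wider hidden dimension, apply the non-linearity, project back.
    return gelu(x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32             # real models are far wider (e.g. 12288 -> 49152 in GPT-3)
W1, b1 = rng.normal(size=(d_model, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, d_model)), np.zeros(d_model)
x = rng.normal(size=(7, d_model))     # 7 tokens
print(feed_forward(x, W1, b1, W2, b2).shape)  # (7, 8)
```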
4. Residuals + Layer Normalization
These stabilize training and allow very deep networks to work reliably.
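Putting the pieces together: a single pre-norm Transformer block (the arrangement GPT-style models use) wires attention and the feed-forward block through residual connections and layer normalization. This sketch reuses `self_attention` and `feed_forward` from the snippets above.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's vector to zero mean and unit variance
    # (real models also learn a per-dimension scale and shift).
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def transformer_block(x, attn_weights, ffn_weights):
    # Residual connections: each sub-layer adds its output back onto its input,
    # so gradients can flow cleanly through dozens of stacked blocks.
    x = x + self_attention(layer_norm(x), *attn_weights)   # attention sub-layer
    x = x + feed_forward(layer_norm(x), *ffn_weights)      # feed-forward sub-layer
    return x
```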
ChatGPT is trained in three stages:
- Pre-training — learns language by predicting the next token across huge amounts of text (sketched below)
- Supervised Fine-Tuning — trained on human-written example conversations
- RLHF (Reinforcement Learning from Human Feedback) — optimized against human preference ratings using PPO
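The pre-training objective is simple to write down: cross-entropy on the next token, which penalizes the model in proportion to how little probability it gave the token that actually came next. A toy sketch:

```python
import numpy as np

def next_token_loss(logits, target_id):
    # logits: the model's raw scores over the vocabulary at one position.
    # Cross-entropy: -log of the probability assigned to the true next token.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.log(probs[target_id])

# Toy vocabulary of 4 tokens; suppose the true next token has id 2.
logits = np.array([1.0, 0.5, 3.0, -1.0])
print(next_token_loss(logits, target_id=2))   # small loss: the model favored token 2
```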
Like any large language model, ChatGPT has clear limitations:
- Hallucinations (confidently stated falsehoods)
- Stale knowledge (it only knows what was in its training data)
- Context window limits (long conversations eventually fall out of memory)
- Bias inherited from its training data
ChatGPT is a probability engine trained on massive data and refined by human feedback. It doesn’t think — but it predicts extremely well.