NLP & Large Language Models
Getting machines to understand and generate human language — now dominated by large language models.
NLP & Large Language Models is one of the core areas in the AI University map of AI. Explore the diagram, then dive into each topic — every subtopic grows into its own deep-dive over time.
```mermaid
flowchart LR
    T[/Text/] --> TOK[Tokenize] --> EMB[Embed] --> TR{{Transformer}} --> DEC[Decode] --> O[/Output/]
    RAG[(Your documents)] -. retrieve .-> TR
```
Key topics
- Tokenization & embeddings: Splitting text into tokens and mapping them to vectors that capture meaning.
- Language models: Models that predict text; scaling them produced the emergent capabilities of LLMs.
- Prompting: Steering an LLM with instructions, examples (few-shot), and chain-of-thought reasoning.
- Retrieval-augmented generation (RAG): Ground an LLM in your own documents by retrieving relevant context at query time.
- Fine-tuning & alignment: Adapting base models with supervised fine-tuning and RLHF/DPO to be helpful and safe.
- Context, tokens & cost: Context windows, token limits, latency, and how pricing works in practice.
- Evaluation: Benchmarks, LLM-as-judge, and measuring hallucination, factuality, and task success.
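To make the first topic concrete, here is a minimal sketch of tokenization and embedding lookup. It is a toy under stated assumptions: production LLMs use learned subword tokenizers (such as byte-pair encoding) and learned embedding tables, whereas the regex tokenizer, the 4-dimensional random vectors, and the names `tokenize`, `build_vocab`, and `embed` are hypothetical and exist only for illustration.

```python
import random
import re


def tokenize(text: str) -> list[str]:
    """Toy tokenizer: lowercase words plus single punctuation marks.
    (Assumption: real LLM tokenizers split text into learned subword units.)"""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Give each distinct token an integer id, in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def embed(token_ids: list[int], table: list[list[float]]) -> list[list[float]]:
    """Embedding is a table lookup: one vector per token id."""
    return [table[i] for i in token_ids]


tokens = tokenize("The cat sat. The cat slept!")
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]

# Hypothetical embedding table with random values; a trained model learns these.
rng = random.Random(0)
table = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in vocab]
vectors = embed(token_ids, table)

print(tokens)
print(token_ids)
```

Repeated tokens share an id and therefore the same vector; any sense of "meaning" in the vectors comes from training, which this sketch does not attempt.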
How an LLM works, end to end
A large language model does one deceptively simple thing: predict the next token. Everything else emerges from doing that extremely well at scale.
```mermaid
flowchart LR
    T[/"Your prompt"/] --> TOK[Tokenize] --> EMB[Embed to vectors]
    EMB --> TR{{Transformer layers<br/>attention + MLP}}
    TR --> LOGITS[Scores over vocabulary]
    LOGITS --> SAMPLE[Sample next token]
    SAMPLE --> APP[Append + repeat]
    APP --> TR
```
The model never "looks things up" by itself. It has compressed patterns from its training data into its weights, and it generates text one token at a time by repeatedly sampling a plausible continuation. This is why LLMs can be fluent yet confidently wrong: fluency and factuality are different things.
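The loop in the diagram (scores over the vocabulary, sample, append, repeat) can be sketched in a few lines. This is an illustration under heavy assumptions: `toy_logits` is a hypothetical stand-in for the transformer that simply favors the next word in a six-entry vocabulary, and `VOCAB`, `softmax`, and `generate` are names invented here. Only the shape of the loop reflects how an LLM decodes.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]


def toy_logits(context: list[str]) -> list[float]:
    """Hypothetical stand-in for the transformer: one score per vocab entry.
    It favors whichever token follows the last context token in VOCAB order.
    Assumes every context token is in VOCAB."""
    last = VOCAB.index(context[-1])
    favored = min(last + 1, len(VOCAB) - 1)
    return [3.0 if i == favored else 0.0 for i in range(len(VOCAB))]


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn scores into probabilities; temperature must be > 0.
    Lower temperature sharpens the distribution, higher flattens it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt: list[str], max_new_tokens: int = 10,
             temperature: float = 1.0, seed: int = 0) -> list[str]:
    """Autoregressive loop: score, sample one token, append, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens), temperature)
        next_token = rng.choices(VOCAB, weights=probs, k=1)[0]
        if next_token == "<eos>":  # end-of-sequence token stops generation
            break
        tokens.append(next_token)
    return tokens


print(generate(["the"], temperature=0.05))
print(generate(["the"], temperature=2.0))
```

At a low temperature the toy model almost always follows the favored token; at a higher one it wanders. Nothing in the loop checks whether the output is true, which is the point made above about fluency versus factuality.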
Prompting, RAG, or fine-tuning?
Three ways to make an LLM do what you want, from cheapest to most involved:
| Approach | What it changes | Best for |
|---|---|---|
| Prompting | Only the input | Fast iteration, general tasks, formatting |
| RAG | Adds retrieved context | Grounding in your documents / fresh facts |
| Fine-tuning | The model weights | Consistent style, narrow skills, lower latency/cost (shorter prompts or a smaller model) |
Default order
Start with prompting. Add RAG when the model needs facts it wasn't trained on. Reach for fine-tuning only once you have evals proving prompting and RAG aren't enough — it's the most expensive to build and maintain.
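Since prompting is the starting point, here is a minimal sketch of assembling a few-shot prompt: an instruction, a handful of worked examples, then the new input. The `Input:`/`Output:` layout and the function name `build_few_shot_prompt` are assumptions for illustration; it is one common convention, not a required format, and the resulting string would be sent to whichever model API you use.

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Assemble instruction + worked examples + the new input into one prompt.
    The model is expected to continue the text after the final 'Output:'."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    instruction="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("Great battery life and a sharp screen.", "positive"),
        ("Stopped working after two days.", "negative"),
    ],
    query="Arrived late and the box was damaged.",
)
print(prompt)
```

Changing the behavior means editing only this string, which is why prompting is the cheapest of the three approaches to iterate on.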
Retrieval-augmented generation (RAG)
RAG grounds an LLM in a knowledge source it didn't memorize, which can substantially reduce hallucination on domain questions when retrieval surfaces the right passages.
```mermaid
flowchart LR
    Q[/User question/] --> EMB[Embed]
    EMB --> SEARCH[(Vector search)]
    DOCS[(Your documents)] -. indexed .-> SEARCH
    SEARCH --> CTX[Top-k relevant chunks]
    CTX --> LLM{{LLM: answer using this context}}
    Q --> LLM
    LLM --> A[/Grounded answer + citations/]
```
The quality of a RAG system is usually decided by the boring parts — how you chunk documents and how good your retrieval is — far more than by which LLM you use.
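The chunk, retrieve, and prompt steps can be sketched with the standard library alone. This is a simplified stand-in, not a production design: word-count vectors with cosine similarity replace a learned embedding model and a vector index, and the names `chunk`, `embed`, `cosine`, `retrieve`, and `build_prompt`, along with the sample documents, are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (requires overlap < max_words)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
        start += max_words - overlap
    return chunks


def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (real systems use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Number the retrieved chunks so the answer can cite them."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context_chunks, 1))
    return ("Answer using only the context below and cite chunk numbers.\n\n"
            f"Context:\n{numbered}\n\nQuestion: {question}\nAnswer:")


docs = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
chunks = [c for d in docs for c in chunk(d)]
question = "How many days do refunds take?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

Most of the tuning effort in this sketch would go into `chunk` (window size, overlap) and `retrieve` (the similarity measure), which mirrors the point above about chunking and retrieval quality mattering more than the choice of LLM.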
Related areas
Deep Learning · Generative AI · AI Agents & Autonomy · Building with AI