NLP & Large Language Models

Getting machines to understand and generate human language — now dominated by large language models.

NLP & Large Language Models is one of the core areas in the AI University map of AI. Explore the diagram, then dive into each topic — every subtopic grows into its own deep-dive over time.

flowchart LR
  T[/Text/] --> TOK[Tokenize] --> EMB[Embed] --> TR{{Transformer}} --> DEC[Decode] --> O[/Output/]
  RAG[(Your documents)] -. retrieve .-> TR

Key topics

  • Tokenization & embeddings
    Splitting text into tokens and mapping them to vectors that capture meaning.

  • Language models
    Models that predict text; scaling them up in data, parameters, and compute is what gave LLMs their broad capabilities.

  • Prompting
    Steering an LLM with instructions, examples (few-shot), and chain-of-thought reasoning.

  • Retrieval-augmented generation (RAG)
    Grounding an LLM in your own documents by retrieving relevant context at query time.

  • Fine-tuning & alignment
    Adapting base models with supervised fine-tuning and RLHF/DPO to be helpful and safe.

  • Context, tokens & cost
    Context windows, token limits, latency, and how pricing works in practice.

  • Evaluation
    Benchmarks, LLM-as-judge, and measuring hallucination, factuality, and task success.
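
To make the "Context, tokens & cost" topic concrete, here is a minimal sketch of the arithmetic behind a single request: does prompt plus completion fit in the context window, and what is the worst-case price? The `ModelSpec` fields, the 8,000-token window, the per-1,000-token prices, and the four-characters-per-token heuristic are all assumptions chosen for illustration, not figures from any real provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical model limits and prices (placeholders, not real pricing)."""
    context_window: int        # max tokens for prompt + completion combined
    usd_per_1k_input: float    # price per 1,000 prompt tokens
    usd_per_1k_output: float   # price per 1,000 generated tokens


def rough_token_count(text: str) -> int:
    """Crude heuristic (assumption): about 4 characters per token in English."""
    return max(1, len(text) // 4)


def estimate_request(spec: ModelSpec, prompt_tokens: int, max_output_tokens: int) -> dict:
    """Check the context budget and compute the worst-case cost of one call."""
    total = prompt_tokens + max_output_tokens
    cost = (prompt_tokens / 1000) * spec.usd_per_1k_input
    cost += (max_output_tokens / 1000) * spec.usd_per_1k_output
    return {
        "total_tokens": total,
        "fits": total <= spec.context_window,
        "max_cost_usd": round(cost, 6),
    }


spec = ModelSpec(context_window=8000, usd_per_1k_input=0.001, usd_per_1k_output=0.002)
estimate = estimate_request(spec, prompt_tokens=6000, max_output_tokens=1000)
print(estimate)
```

Input and output tokens are priced separately here because many providers bill them at different rates; for real budgeting, count tokens with the tokenizer of the model you actually use.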

How an LLM works, end to end

A large language model does one deceptively simple thing: predict the next token. Everything else emerges from doing that extremely well at scale.

flowchart LR
  T[/"Your prompt"/] --> TOK[Tokenize] --> EMB[Embed to vectors]
  EMB --> TR{{Transformer layers<br/>attention + MLP}}
  TR --> LOGITS[Scores over vocabulary]
  LOGITS --> SAMPLE[Sample next token]
  SAMPLE --> APP[Append + repeat]
  APP --> TR

On its own, the model never "looks things up." It has compressed patterns from its training data into its weights, and it generates text one token at a time by repeatedly sampling a plausible continuation. This is why LLMs can be fluent yet confidently wrong: fluency and factuality are different things.
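
The loop in the diagram can be sketched in a few lines. This is a toy under stated assumptions, not how a real LLM is implemented: the seven-word vocabulary, the `NEXT` lookup table standing in for transformer layers, and the whitespace tokenizer are all invented for illustration. Only the control flow (scores over the vocabulary, sample, append, repeat) mirrors the real thing.

```python
import math
import random

VOCAB = ["a", "cat", "sat", "on", "the", "mat", "<eos>"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Toy "weights" (assumption): the most plausible successor of each token.
NEXT = {"a": "cat", "cat": "sat", "sat": "on", "on": "the", "the": "mat", "mat": "<eos>"}


def tokenize(text: str) -> list[int]:
    """Whitespace tokenizer; real systems use subword schemes such as BPE."""
    return [TOKEN_ID[word] for word in text.lower().split()]


def toy_logits(token_ids: list[int]) -> list[float]:
    """Stand-in for the transformer: one score per vocabulary entry."""
    logits = [0.0] * len(VOCAB)
    last = VOCAB[token_ids[-1]]
    logits[TOKEN_ID[NEXT.get(last, "<eos>")]] = 5.0
    return logits


def softmax(logits: list[float], temperature: float) -> list[float]:
    """Turn scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt: str, max_new_tokens: int = 10, temperature: float = 1.0, rng=None) -> str:
    rng = rng or random.Random(0)
    ids = tokenize(prompt)
    for _ in range(max_new_tokens):
        logits = toy_logits(ids)
        if temperature == 0:
            # Greedy decoding: always take the highest-scoring token.
            next_id = max(range(len(logits)), key=logits.__getitem__)
        else:
            probs = softmax(logits, temperature)
            next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        if VOCAB[next_id] == "<eos>":
            break
        ids.append(next_id)  # append + repeat
    return " ".join(VOCAB[i] for i in ids)


print(generate("a cat", temperature=0))  # a cat sat on the mat
```

With `temperature=0` the output is deterministic. At higher temperatures the sampler occasionally picks a low-probability token and the text drifts, which is a small-scale picture of how fluent output can still be wrong.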

Prompting, RAG, or fine-tuning?

Three ways to make an LLM do what you want, from cheapest to most involved:

| Approach    | What it changes       | Best for                                     |
|-------------|-----------------------|----------------------------------------------|
| Prompting   | Only the input        | Fast iteration, general tasks, formatting    |
| RAG         | Adds retrieved context | Grounding in your documents / fresh facts   |
| Fine-tuning | The model weights     | Consistent style, narrow skills, latency/cost |

Default order

Start with prompting. Add RAG when the model needs facts it wasn't trained on. Reach for fine-tuning only once you have evals proving prompting and RAG aren't enough — it's the most expensive to build and maintain.
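
One way to make "evals proving prompting and RAG aren't enough" operational is a small gate that tries approaches from cheapest to most involved and stops at the first one that meets a target score. The sketch below is illustrative only: the eval set, the 0.9 target, and the `prompt_only` and `with_rag` stubs are hypothetical stand-ins for real systems, and exact match is just one possible metric.

```python
from typing import Callable, Optional

# Hypothetical eval set: (question, expected answer) pairs written for your task.
EVAL_SET = [
    ("capital of France?", "paris"),
    ("2 + 2?", "4"),
    ("internal codename for project X?", "bluebird"),
]


def exact_match_accuracy(answer_fn: Callable[[str], str], eval_set) -> float:
    """Fraction of questions whose normalized answer equals the expected one."""
    hits = sum(answer_fn(q).strip().lower() == expected for q, expected in eval_set)
    return hits / len(eval_set)


def prompt_only(question: str) -> str:
    """Stub for a prompted LLM: knows general facts, not private ones."""
    known = {"capital of France?": "Paris", "2 + 2?": "4"}
    return known.get(question, "I don't know")


def with_rag(question: str) -> str:
    """Stub for the same LLM plus retrieval over private documents."""
    retrieved = {"internal codename for project X?": "Bluebird"}
    return retrieved.get(question, prompt_only(question))


def choose_approach(candidates, eval_set, target: float) -> Optional[str]:
    """Return the first (cheapest) approach that meets the target, else None."""
    for name, answer_fn in candidates:
        if exact_match_accuracy(answer_fn, eval_set) >= target:
            return name
    return None  # nothing passed: this is when fine-tuning becomes worth considering


CANDIDATES = [("prompting", prompt_only), ("rag", with_rag)]
print(choose_approach(CANDIDATES, EVAL_SET, target=0.9))  # rag
```

Ordering the candidates by cost means the gate never recommends a heavier approach when a lighter one already clears the bar.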

Retrieval-augmented generation (RAG)

RAG grounds an LLM in a knowledge source it didn't memorize, which can substantially reduce hallucination on domain questions as long as retrieval surfaces the right passages.

flowchart LR
  Q[/User question/] --> EMB[Embed]
  EMB --> SEARCH[(Vector search)]
  DOCS[(Your documents)] -. indexed .-> SEARCH
  SEARCH --> CTX[Top-k relevant chunks]
  CTX --> LLM{{LLM: answer using this context}}
  Q --> LLM
  LLM --> A[/Grounded answer + citations/]
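
The retrieval half of this flow can be sketched without any external services. Everything here is a simplifying assumption: the three `CHUNKS` are made-up sample text, `embed` uses bag-of-words counts where a real system would call a learned embedding model, and the linear scan stands in for a vector index. The final prompt is what would be sent to the LLM.

```python
import math
import re
from collections import Counter

# Hypothetical document chunks; in practice these come from your own corpus.
CHUNKS = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Real systems use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (linear scan)."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Number the chunks so the model can cite them as [n]."""
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1))
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}"
    )


question = "How many days do refunds take?"
top_chunks = retrieve(question, CHUNKS, k=1)
print(build_prompt(question, top_chunks))
```

Note that the shipping chunk also matches on "days" and only loses on score; word overlap alone is a weak relevance signal, which is one reason retrieval quality matters so much.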

The quality of a RAG system is usually decided by the boring parts — how you chunk documents and how good your retrieval is — far more than by which LLM you use.
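
Chunking is one of those "boring parts." A common baseline, sketched here as an assumption rather than a recommendation, is fixed-size windows of words with some overlap, so that a sentence cut at a boundary still appears whole in a neighboring chunk. The function name and default sizes are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_words("one two three four five six seven", chunk_size=4, overlap=2))
# ['one two three four', 'three four five six', 'five six seven']
```

Larger chunks carry more context per hit but dilute relevance scores; smaller chunks retrieve more precisely but can lose surrounding meaning. Splitting on headings or paragraphs instead of raw word counts is a frequent refinement.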
