Generative AI¶

Models that create new content — text, images, audio, video, and code — rather than only classifying it.

Generative AI is one of the core areas in the AI University map of AI. Explore the diagram, then dive into each topic; every subtopic will grow into its own deep dive over time.

flowchart TB
  FM([Foundation Model]) --> TXT[Text]
  FM --> IMGG[Image]
  FM --> AUD[Audio]
  FM --> VID[Video]
  FM --> CODE[Code]
  FM --> MM[Multimodal]

Key topics¶

Foundation models

Large models pre-trained on broad data and adapted to many downstream tasks.

Text generation

LLMs writing, summarizing, translating, and reasoning.

Image, audio & video generation

Diffusion and transformer models for creative and design work.

Code generation

Models that write and edit software from natural language.

Multimodal generation

Systems that mix modalities: text-to-image, image-to-text, any-to-any.

The families of generative models¶

"Generative AI" is an umbrella over several distinct model families, each with different strengths:

Family	How it generates	Shines at
Autoregressive (LLMs)	One token at a time	Text, code, anything sequential
Diffusion	Iteratively denoise random noise into a sample	Images, audio, video
GANs	Generator vs discriminator arms race	Fast, sharp image synthesis
VAEs	Encode to a latent space, decode	Smooth latent spaces, representation learning

Most modern image and video tools are diffusion-based, while chat and code tools are autoregressive transformers. Increasingly the two are combined: for example, a transformer encodes the prompt and conditions a diffusion model.
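To make "one token at a time" concrete, here is a minimal sketch of autoregressive sampling. The lookup table `NEXT_TOKEN_PROBS` and the function `generate` are hypothetical stand-ins invented for illustration: a real LLM computes next-token probabilities with a neural network conditioned on the whole preceding context, not only the last token.

```python
import random

# Hypothetical toy "model": next-token probabilities given only the previous token.
# A real LLM conditions on the entire context, not just the last token.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"<end>": 1.0},
}


def generate(rng, max_tokens=10):
    """Sample one token at a time, feeding each choice back in as context."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        choices = list(probs)
        weights = [probs[token] for token in choices]
        next_token = rng.choices(choices, weights=weights, k=1)[0]
        if next_token == "<end>":
            break  # the model decided the sequence is finished
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


sentence = generate(random.Random(0))
print(" ".join(sentence))
```

The loop is the whole idea: each sampled token becomes part of the input for the next prediction, which is why this family suits text, code, and other sequential data.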

Diffusion in plain terms¶

A diffusion model is trained by taking real images, progressively adding noise until they're pure static, and learning to reverse that process. To generate, it starts from random noise and denoises step by step into a coherent image — optionally conditioned on a text prompt.

flowchart LR
  NOISE[/Random noise/] --> D1[Denoise] --> D2[Denoise] --> D3[Denoise] --> IMG[/Image/]
  PROMPT[/"a red bicycle"/] -. guides .-> D1
  PROMPT -. guides .-> D2
  PROMPT -. guides .-> D3
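The noising half of this process can be sketched in a few lines. The example below assumes a DDPM-style blend of signal and Gaussian noise with a cosine-shaped schedule; the names `alpha_bar`, `add_noise`, and `remove_noise` are hypothetical, and the demo passes in the true noise where a trained network would supply its own estimate.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Fraction of the original signal kept at step t (assumed cosine-shaped schedule)."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def add_noise(x0, t, num_steps, rng):
    """Forward process: blend clean data with Gaussian noise; more noise as t grows."""
    keep = alpha_bar(t, num_steps)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(keep) * x + math.sqrt(1.0 - keep) * n for x, n in zip(x0, noise)]
    return xt, noise


def remove_noise(xt, predicted_noise, t, num_steps):
    """Invert the blend given a noise estimate (a trained network would provide it)."""
    keep = alpha_bar(t, num_steps)
    return [
        (x - math.sqrt(1.0 - keep) * n) / math.sqrt(keep)
        for x, n in zip(xt, predicted_noise)
    ]


rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # stand-in for image pixel values
xt, noise = add_noise(x0, t=50, num_steps=100, rng=rng)

# With a perfect noise estimate the clean data is recovered; training aims to
# make a network's prediction close enough to the true noise for this to work.
recovered = remove_noise(xt, noise, t=50, num_steps=100)
print(recovered)
```

Real samplers do not jump back in a single step like this. They apply many small denoising updates, as in the diagram above, because the network's noise estimate is only approximate at high noise levels.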

"Guidance" is the knob that controls how strictly the output follows the prompt versus staying natural — a small idea that does a lot of the heavy lifting in text-to-image tools.

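The text above does not say which guidance scheme it has in mind. One widely used form is classifier-free guidance, sketched below under that assumption: the model predicts noise twice, once with the prompt and once without, and the guidance scale decides how far to push toward the prompted prediction. The function `guided_noise` and the numbers are hypothetical.

```python
def guided_noise(uncond, cond, guidance_scale):
    """Move from the unconditional prediction toward (and past) the prompted one."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.2, -0.1, 0.4]  # hypothetical noise prediction with no prompt
cond = [0.5, 0.1, 0.0]     # hypothetical noise prediction given the prompt

no_guidance = guided_noise(uncond, cond, 0.0)        # ignores the prompt
plain_conditional = guided_noise(uncond, cond, 1.0)  # the prompted prediction itself
strong = guided_noise(uncond, cond, 7.5)             # extrapolates past it

print(no_guidance, plain_conditional, strong)
```

A scale of 0 ignores the prompt, 1 uses the prompted prediction unchanged, and larger values exaggerate the difference between the two, which tends to follow the prompt more closely at some cost to variety and naturalness.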
Related areas¶

NLP & Large Language Models · Computer Vision · Speech & Audio AI · Building with AI

Learn this properly

Want hands-on training in generative AI? Explore AI University courses and AI School camps for kids.
