Generative AI
Models that create new content (text, images, audio, video, and code) rather than only classifying existing data.
Generative AI is one of the core areas in the AI University map of AI. Explore the diagram, then dive into each topic — every subtopic grows into its own deep-dive over time.
flowchart TB
FM([Foundation Model]) --> TXT[Text]
FM --> IMGG[Image]
FM --> AUD[Audio]
FM --> VID[Video]
FM --> CODE[Code]
FM --> MM[Multimodal]
Key topics
- Foundation models: large models pre-trained on broad data and adapted to many downstream tasks.
- Text generation: LLMs writing, summarizing, translating, and reasoning.
- Image, audio & video generation: diffusion and transformer models for creative and design work.
- Code generation: models that write and edit software from natural language.
- Multimodal generation: systems that mix modalities, such as text-to-image, image-to-text, and any-to-any.
The families of generative models
"Generative AI" is an umbrella over several distinct model families, each with different strengths:
| Family | How it generates | Shines at |
|---|---|---|
| Autoregressive (LLMs) | One token at a time | Text, code, anything sequential |
| Diffusion | Iteratively denoise random noise into a sample | Images, audio, video |
| GANs | Generator vs discriminator arms race | Fast, sharp image synthesis |
| VAEs | Sample a point in a learned latent space, then decode it | Smooth latent spaces, representation learning |
Modern image and video tools are mostly diffusion-based, while chat and code tools are autoregressive transformers. Increasingly the two are combined, for example a transformer that encodes the prompt and conditions a diffusion model.
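To make "one token at a time" concrete, here is a minimal sketch of autoregressive sampling. It assumes a hypothetical hand-written bigram table (`BIGRAMS`) in place of a real LLM, which would instead compute next-token probabilities from the whole preceding context with a neural network. The loop structure is the part that carries over: pick the next token, append it, repeat.

```python
import random

# Hypothetical toy "model": next-token probabilities given only the current token.
# A real LLM conditions on the full context, not just the last token.
BIGRAMS = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"</s>": 1.0},
}


def generate(max_tokens=10, seed=0):
    """Sample tokens one at a time until the end marker or the length limit."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = BIGRAMS[tokens[-1]]  # distribution over the next token
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "</s>":
            break
        tokens.append(next_token)  # the new token becomes context for the next step
    return tokens[1:]  # drop the start marker


sentence = generate()
print(" ".join(sentence))
```

Each run produces a three-word sentence starting with "the" and ending with "sat"; only the middle word depends on the random draw.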
Diffusion in plain terms
A diffusion model is trained by taking real images, progressively adding noise until they are pure static, and learning to reverse that process one small step at a time, typically by predicting the noise that was added. To generate, it starts from random noise and denoises step by step into a coherent image, optionally conditioned on a text prompt.
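The noising half of that process can be sketched in a few lines. This is an illustration under assumptions the text does not state: a DDPM-style linear noise schedule (`beta_start`, `beta_end`, and `num_steps` are illustrative values), a four-number list standing in for image pixels, and the true noise standing in for what a trained network would predict.

```python
import math
import random


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of signal variance remaining after each step (linear beta schedule)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward process: blend the clean sample with Gaussian noise in one jump."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    xt = [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]
    return xt, noise


def recover_x0(xt, predicted_noise, alpha_bar):
    """Undo the blend given a noise estimate (a trained network would supply it)."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * n) / signal_scale for x, n in zip(xt, predicted_noise)]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule()
x0 = [0.5, -0.25, 1.0, 0.0]  # stand-in for image pixels
xt, noise = add_noise(x0, alpha_bars[500], rng)
x0_hat = recover_x0(xt, noise, alpha_bars[500])  # true noise used as a perfect "prediction"
```

Because the true noise is used here, `x0_hat` matches `x0` up to floating-point error. A real model only approximates the noise, which is one reason sampling proceeds in many small denoising steps rather than a single jump.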
flowchart LR
NOISE[/Random noise/] --> D1[Denoise] --> D2[Denoise] --> D3[Denoise] --> IMG[/Image/]
PROMPT[/"a red bicycle"/] -. guides .-> D1
PROMPT -. guides .-> D2
PROMPT -. guides .-> D3
"Guidance" is the knob that controls how strictly the output follows the prompt versus staying natural — a small idea that does a lot of the heavy lifting in text-to-image tools.
Related areas
NLP & Large Language Models · Computer Vision · Speech & Audio AI · Building with AI