
Generative AI

Models that create new content — text, images, audio, video, and code — rather than only classifying it.

Generative AI is one of the core areas in the AI University map of AI. Explore the diagram, then dive into each topic — every subtopic grows into its own deep-dive over time.

flowchart TB
  FM([Foundation Model]) --> TXT[Text]
  FM --> IMGG[Image]
  FM --> AUD[Audio]
  FM --> VID[Video]
  FM --> CODE[Code]
  FM --> MM[Multimodal]

Key topics

  • Foundation models
    Large models pre-trained on broad data and adapted to many downstream tasks.

  • Text generation
    LLMs writing, summarizing, translating, and reasoning.

  • Image, audio & video generation
    Diffusion and transformer models for creative and design work.

  • Code generation
    Models that write and edit software from natural language.

  • Multimodal generation
    Systems that mix modalities: text-to-image, image-to-text, any-to-any.

The families of generative models

"Generative AI" is an umbrella over several distinct model families, each with different strengths:

| Family | How it generates | Shines at |
| --- | --- | --- |
| Autoregressive (LLMs) | One token at a time | Text, code, anything sequential |
| Diffusion | Denoise pure noise into a sample | Images, audio, video |
| GANs | Generator vs discriminator arms race | Fast, sharp image synthesis |
| VAEs | Encode to a latent space, decode | Smooth latent spaces, representation learning |

Modern image and video tools are mostly diffusion-based; chat and code tools are autoregressive transformers. Increasingly the two are combined, e.g. a transformer that reads the prompt and conditions a diffusion model.
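
To make "one token at a time" concrete, here is a minimal sketch of autoregressive sampling. The `BIGRAMS` table is a made-up stand-in for a trained model (a real LLM conditions on the whole preceding sequence, not just the last token), and the `<s>` / `</s>` markers are illustrative assumptions.

```python
import random

# Hypothetical toy "model": probability of the next token given the previous one.
# A real LLM would compute this distribution from the entire context.
BIGRAMS = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"</s>": 1.0},
}


def generate(rng, max_tokens=10):
    """Sample tokens one at a time until the end marker or the length limit."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = BIGRAMS[tokens[-1]]            # distribution over the next token
        choices, weights = zip(*dist.items())
        nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == "</s>":                     # end-of-sequence marker: stop
            break
        tokens.append(nxt)                    # the new token becomes context
    return tokens[1:]                         # drop the start marker


print(" ".join(generate(random.Random(0))))
```

Each sampled token is appended to the context before the next one is drawn, which is the loop that chat and code tools run at much larger scale.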

Diffusion in plain terms

A diffusion model is trained by taking real images, progressively adding noise until they're pure static, and learning to reverse that process. To generate, it starts from random noise and denoises step by step into a coherent image — optionally conditioned on a text prompt.
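
The "progressively adding noise" half can be sketched in a few lines. This assumes a DDPM-style linear noise schedule and uses a short list of numbers as a stand-in for pixel values; the function names and schedule constants are illustrative choices, not something this page specifies.

```python
import math
import random


def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each step (assumed linear schedule)."""
    bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars


def add_noise(x0, alpha_bar, rng):
    """Jump straight to a noise level: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]          # stand-in for the pixels of a real image
bars = alpha_bars(1000)
for t in (0, 499, 999):             # early, middle, and final noise levels
    xt, _ = add_noise(x0, bars[t], rng)
    print(t, round(bars[t], 5), [round(v, 2) for v in xt])
```

In this setup a network would be trained to predict `noise` from `xt` and the step index; by the last step almost none of the original signal remains, which is the "pure static" end of the process.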

flowchart LR
  NOISE[/Random noise/] --> D1[Denoise] --> D2[Denoise] --> D3[Denoise] --> IMG[/Image/]
  PROMPT[/"a red bicycle"/] -. guides .-> D1
  PROMPT -. guides .-> D2
  PROMPT -. guides .-> D3
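
The denoise chain in the diagram can be sketched as a loop. Everything here is a toy under loud assumptions: `predict_noise` is an oracle that stands in for a trained network (it is exact only because the pretend dataset contains the single sample `TARGET`), the update is a deterministic DDIM-style step, and prompt conditioning is left out.

```python
import math
import random

TARGET = [1.0, -1.0, 0.5, 0.0]  # hypothetical "dataset": one sample only


def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of signal left after each noising step (assumed linear schedule)."""
    bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars


def predict_noise(xt, alpha_bar):
    """Oracle stand-in for a trained denoiser; exact only for this one-sample dataset."""
    return [(x - math.sqrt(alpha_bar) * m) / math.sqrt(1.0 - alpha_bar)
            for x, m in zip(xt, TARGET)]


def sample(rng, num_steps=1000):
    bars = alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in TARGET]       # start from random noise
    for t in reversed(range(num_steps)):
        ab_t = bars[t]
        ab_prev = bars[t - 1] if t > 0 else 1.0     # 1.0 means "no noise left"
        eps = predict_noise(x, ab_t)
        # Estimate the clean sample, then re-noise it to the next lower noise level.
        x0_hat = [(xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
                  for xi, e in zip(x, eps)]
        x = [math.sqrt(ab_prev) * c + math.sqrt(1.0 - ab_prev) * e
             for c, e in zip(x0_hat, eps)]
    return x


print([round(v, 3) for v in sample(random.Random(0))])
```

Because the oracle is perfect, the loop lands on `TARGET` up to rounding regardless of the starting noise. A learned denoiser trained on many images would instead land on a new sample that resembles its training data.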

"Guidance" is the knob that controls how strictly the output follows the prompt versus staying natural — a small idea that does a lot of the heavy lifting in text-to-image tools.

NLP & Large Language Models · Computer Vision · Speech & Audio AI · Building with AI


Learn this properly

Want hands-on training in generative AI? Explore AI University courses and AI School camps for kids.