Graduate Course

Generative AI and Sampling with Markov Processes

A mathematically grounded introduction to modern generative AI and sampling through Markov chains.

Instructor: Omar Chehab
Level: Master/PhD
Focus: Theory + Algorithms

Course Description

Generative AI and Sampling are two fields that have recently seen a surge of interest, as they power many modern AI applications—from video generation to mathematical reasoning.

Despite their similar goals, these fields are typically associated with different academic communities. Generative AI is mostly developed within the mainstream machine learning community, while Sampling is more commonly studied in statistics, Monte Carlo methods, Bayesian inference, chemistry, and functional analysis.

This course aims to provide a unified treatment of these two perspectives.

At a high level, both fields seek to generate samples from a target data distribution, but they assume different forms of access to that distribution:

  • Generative AI: we are given a finite dataset sampled from the distribution.
  • Sampling: we are given an unnormalized density function.
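The two forms of access above can be made concrete in a few lines of code. This is an illustrative sketch, not course material: the 1-D Gaussian target and the function names are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generative AI access: a finite dataset of samples drawn from the target
# distribution (here, an illustrative 1-D standard Gaussian).
dataset = rng.standard_normal(1000)

# Sampling access: an unnormalized density. We can evaluate a function
# proportional to the target density pointwise, but the normalizing
# constant is unknown.
def unnormalized_density(x):
    return np.exp(-0.5 * x**2)  # proportional to N(0, 1)

print(len(dataset), unnormalized_density(0.0))
```

In the first setting the density is never evaluated, only samples are observed; in the second, the density can be queried anywhere but no samples are given.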

Over time, a significant amount of jargon has emerged in both areas.

  • Generative AI: diffusion models, flow models, score-based models, discrete diffusion models, edit flows.
  • Sampling: annealed MCMC, sequential Monte Carlo (SMC), parallel tempering (replica exchange), simulated tempering, simulated annealing, Wasserstein and Fisher-Rao Gradient Flows.

This course will cut through this terminology and present these methods within a simple unifying framework:

  1. Define a Markov chain that converges to the target data distribution.
  2. Learn its generator when necessary.

Throughout the course we will focus on principled questions related to the computational and sample efficiency of these methods, and highlight several open problems in the field.

For practitioners, the goal is to provide a clear conceptual framework that summarizes recent advances. For theorists, the aim is to enable meaningful comparisons between these methods and help identify promising directions for future research.

Syllabus

Week  Topic
1     Reviewing different types of access to the data distribution
2     Langevin dynamics and its time-reversal
3     Monte Carlo estimation and MCMC fundamentals
8     Diagnostics, mixing, and sample quality evaluation
9     Computational and statistical tradeoffs in modern samplers

Material