Graduate Course
A mathematically grounded introduction to modern generative AI and sampling through Markov chains.
Generative AI and Sampling are two fields that have recently seen a surge of interest, as they power many modern AI applications—from video generation to mathematical reasoning.
Despite their similar goals, these fields are typically associated with different academic communities. Generative AI is mostly developed within the mainstream machine learning community, while Sampling is more commonly studied in statistics, Monte Carlo methods, Bayesian inference, chemistry, and functional analysis.
This course aims to provide a unified treatment of these two perspectives.
At a high level, both fields seek to generate samples from a target data distribution, but they assume different forms of access to that distribution: Generative AI typically assumes access to a dataset of i.i.d. samples from the target, whereas Sampling typically assumes the ability to evaluate an unnormalized (log-)density, often together with its gradient.
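To make the two access modes concrete, here is a minimal sketch; the standard Gaussian target is an illustrative assumption, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Access mode 1 (Generative AI): a finite set of i.i.d. samples
# from the target distribution -- here a standard Gaussian.
data = rng.standard_normal(10_000)

# Access mode 2 (Sampling): a pointwise-evaluable unnormalized
# log-density; the normalizing constant is unknown in general.
def log_density_unnormalized(x):
    return -0.5 * x**2  # log of exp(-x^2 / 2), up to an additive constant

# A generative model is trained on `data`; a sampler instead queries
# `log_density_unnormalized` (and often its gradient, the score).
```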
Over time, a significant amount of jargon has emerged in both areas.
This course will cut through this terminology and present these methods within a simple unifying framework.
Throughout the course we will focus on principled questions related to the computational and sample efficiency of these methods, and highlight several open problems in the field.
For practitioners, the goal is to provide a clear conceptual framework that summarizes recent advances. For theorists, the aim is to enable meaningful comparisons between these methods and help identify promising directions for future research.
| Week | Topic |
|---|---|
| 1 | Reviewing different types of access to the data distribution |
| 2 | Langevin Dynamics and its time-reversal |
| 3 | Monte Carlo estimation and Markov chain Monte Carlo (MCMC) fundamentals |
| 8 | Diagnostics, mixing, and sample quality evaluation |
| 9 | Computational and statistical tradeoffs in modern samplers |
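The Langevin dynamics of week 2 can be sketched with the unadjusted Langevin algorithm (ULA), which only needs the score (gradient of the log-density) from access mode 2. The standard Gaussian target, step size, and chain length below are illustrative assumptions, not choices made by the course.

```python
import numpy as np

def grad_log_density(x):
    # Score of a standard Gaussian target (illustrative choice):
    # d/dx log p(x) = -x.
    return -x

def langevin_chain(n_steps=5_000, step=0.01, seed=0):
    """Unadjusted Langevin algorithm:
    x_{k+1} = x_k + step * score(x_k) + sqrt(2 * step) * N(0, 1).
    """
    rng = np.random.default_rng(seed)
    x = 0.0
    xs = np.empty(n_steps)
    for k in range(n_steps):
        x = x + step * grad_log_density(x) + np.sqrt(2 * step) * rng.standard_normal()
        xs[k] = x
    return xs

samples = langevin_chain()
```

For a small step size the chain's stationary distribution is close to the target (ULA has an O(step) bias); adding a Metropolis accept/reject step (MALA) removes that bias, a tradeoff revisited in week 9.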