My research is in machine learning, with the following goals:

  • learning densities from data
  • generating samples from an unnormalized distribution and/or data
  • learning representations and causal structure from data

I am particularly interested in the computational and sample efficiency of algorithms that achieve these goals.

The data I use comes in different modalities, such as synthetic data, image data and molecular data. In particular, non-invasive brain recordings (M/EEG) were central to my PhD work and remain a data modality I work on. Here are keywords that best describe my research areas:

  • Energy-Based Models
  • Diffusion and Flow Models
  • Sampling Algorithms
  • Score Matching
  • Density Ratio Estimation
  • Causal Discovery
  • Representation Learning
  • Brain Imaging
  • Multi-View Independent Component Analysis

Service. I review submissions to the following machine learning conferences: NeurIPS, ICML, ICLR and AISTATS. I also occasionally review submissions to these statistics and machine learning journals: JMLR, TMLR, AISM, JUQ, and Mathematical Methods of Statistics. I am grateful to have been recognized as a "top reviewer" (AISTATS 2022; NeurIPS 2022, 2023 and 2024).

Generating Samples

Few-Step Boltzmann Generators via Scalable Likelihood Flow Maps

Sampling from multi-modal distributions with polynomial query complexity in fixed dimension via reverse diffusion

Time-reversed diffusions are state-of-the-art for sampling multi-modal distributions, but they rely on score estimates. We analyze how estimation errors affect the final samples.

Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics

Annealed MCMC tries to approximate a prescribed path of distributions. We show that the popular geometric mean path with a Gaussian has unfavorable geometry. Presented at the Yale workshop on sampling.

A Practical Diffusion Path for Sampling

Time-reversed diffusions are state-of-the-art in sampling but rely on score estimates. We aim to reduce their variance.

Learning Densities

Learning Energy-Based Models from Stochastic Interpolants using Spatiotemporal Differences

Conditional Noise-Contrastive Estimation of Energy-Based Models by Jumping Between Modes

We explore the design choices of a method called CNCE for learning energy-based models.

Density Ratio Estimation with Conditional Probability Paths

A density ratio can be obtained by integrating the time score of a probability path. We present an efficient way to estimate the time score.
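
As a brief illustration of the identity behind this statement (the notation is an assumption made here: p_t denotes a probability path indexed by t in [0, 1] that connects p_0 to p_1, and this is not the estimator from the paper), the log density ratio is the integral of the time score along the path:

```latex
\log \frac{p_1(x)}{p_0(x)}
  = \int_0^1 \partial_t \log p_t(x) \, \mathrm{d}t
```

An estimate of the time score \partial_t \log p_t(x) can therefore be turned into an estimate of the ratio by one-dimensional numerical integration over t.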

Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation

Provable benefits of annealing for estimating normalizing constants: Importance Sampling, Noise-Contrastive Estimation, and beyond

Annealed Importance Sampling uses a prescribed path of distributions to compute an estimate of a normalizing constant. We quantify how the choice of path impacts the estimation error.
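
For readers unfamiliar with Annealed Importance Sampling, here is a minimal sketch on a one-dimensional toy problem. Everything in it is an illustrative assumption rather than the setup studied in the paper: the standard Gaussian starting distribution, the unnormalized Gaussian target, the geometric path with a linear schedule, the random-walk Metropolis transitions, and the function names.

```python
import numpy as np

def log_f0(x):
    # Normalized standard Gaussian: the starting distribution (assumed).
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_f1(x, mu=2.0, s=0.5):
    # Unnormalized Gaussian target (assumed); its true normalizer is s * sqrt(2 * pi).
    return -0.5 * ((x - mu) / s) ** 2

def ais_log_z(n_chains=2000, n_temps=100, n_mcmc=2, step=0.5, seed=0):
    """Estimate log Z of the target along the geometric path f0^(1-b) * f1^b."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = rng.standard_normal(n_chains)   # exact samples from the starting distribution
    log_w = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Importance weight update between consecutive distributions on the path.
        log_w += (b - b_prev) * (log_f1(x) - log_f0(x))
        # Random-walk Metropolis steps that leave the current distribution invariant.
        log_p = lambda y: (1.0 - b) * log_f0(y) + b * log_f1(y)
        for _ in range(n_mcmc):
            prop = x + step * rng.standard_normal(n_chains)
            accept = np.log(rng.uniform(size=n_chains)) < log_p(prop) - log_p(x)
            x = np.where(accept, prop, x)
    # Log of the average weight, computed stably.
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))
```

In this toy case the estimate can be compared with the known value log(0.5 * sqrt(2 * pi)). Swapping the geometric path or the linear schedule for another choice only changes the `betas` and `log_p` lines.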

The Optimal Noise in Noise-Contrastive Learning Is Not What You Think

NCE estimates the data density by minimizing a binary classification loss between data and noise samples. We find the optimal noise distribution that minimizes the estimation error.
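
To make the classification view concrete, here is a minimal sketch of NCE on a one-dimensional toy problem. All specifics are illustrative assumptions, not the setting of the paper: the data are drawn from N(1, 1), the noise is a fixed N(0, 2^2) (not an optimized noise distribution), the model is a unit-variance Gaussian with a learnable mean `mu` and a learnable log-normalizer `c`, and the loss is minimized by plain gradient descent.

```python
import numpy as np

def log_noise(x, scale=2.0):
    # Log-density of the noise distribution N(0, scale^2) (assumed choice).
    return -0.5 * (x / scale) ** 2 - np.log(scale) - 0.5 * np.log(2 * np.pi)

def fit_nce(n=5000, lr=0.2, n_steps=3000, seed=0):
    """Fit log p(x) = -0.5 * (x - mu)^2 + c by logistic regression of data vs. noise."""
    rng = np.random.default_rng(seed)
    x_data = rng.normal(1.0, 1.0, size=n)    # toy data (assumed)
    x_noise = rng.normal(0.0, 2.0, size=n)   # noise samples
    mu, c = 0.0, 0.0
    sigmoid = lambda g: 1.0 / (1.0 + np.exp(-g))
    for _ in range(n_steps):
        # Classifier logit: log model density minus log noise density.
        logit = lambda x: -0.5 * (x - mu) ** 2 + c - log_noise(x)
        r_data = 1.0 - sigmoid(logit(x_data))   # residual on data (label 1)
        r_noise = sigmoid(logit(x_noise))       # residual on noise (label 0)
        # Gradients of the binary cross-entropy with respect to mu and c.
        grad_mu = -np.mean(r_data * (x_data - mu)) + np.mean(r_noise * (x_noise - mu))
        grad_c = -np.mean(r_data) + np.mean(r_noise)
        mu -= lr * grad_mu
        c -= lr * grad_c
    return mu, c
```

Under these assumptions, `fit_nce()` should return a mean near 1 and a log-normalizer near -0.5 * log(2 * pi), so the normalizing constant is learned as an ordinary parameter. How accurate these estimates are depends on the noise distribution passed to the classifier, which is the design choice the paper studies.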

Learning Representations and Causal Structure

Multi-View Causal Discovery without Non-Gaussianity: Identifiability and Algorithms

We propose three algorithms with theoretical guarantees for learning the causal relationships between random variables, represented as a directed acyclic graph, from correlated measurements. We apply our algorithms to brain recordings (MEG and fMRI).

MVICAD2: Multi-View Independent Component Analysis with Delays and Dilations

Independent Component Analysis (ICA) is a popular algorithm for learning a representation of data. We propose a version that handles data collected from different contexts, and whose representations differ only by temporal delays or dilations.

Deep Recurrent Encoder: an end-to-end network to model magnetoencephalography at scale

We compare different models for predicting the brain’s response to external stimuli. Our model, based on a deep neural network, is more accurate and interpretable.

Learning with self-supervision on EEG data

We learn rich representations of EEG brain activity using a self-supervised loss.

Uncovering the structure of clinical EEG signals with self-supervised learning

We learn rich representations of EEG brain activity using a self-supervised loss.

A mean-field approach to the dynamics of networks of complex neurons, from nonlinear Integrate-and-Fire to Hodgkin–Huxley models

Our theory predicts the average behavior of neuronal populations that fire asynchronously.

Miscellaneous

Nano World Models: A Minimalist Implementation of Future Video Prediction

A minimalist codebase for future video prediction centered around diffusion forcing.