Ben Letham
Central Applied Science, Meta.


I am in the Central Applied Science group at Meta (formerly known as Facebook). I build AI and optimization tools, develop new ways of analyzing data, and run large online experiments. Before that, I completed my PhD in operations research at MIT. My Google Scholar profile is here. I can be contacted at bletham@meta.com.
News from the last year or so:

November 2025: I reviewed for ICLR 2026, which means my de-blinded reviews were leaked as part of the OpenReview breach. How fun! If you are dissatisfied with any of my past reviews, please let me know.

October 2025: We have a new paper on arXiv in which LLMs interact with humans to figure out how to solve poorly specified multi-objective optimization problems (arxiv).

August 2025: We have a paper at KDD on optimizing for long-term A/B test outcomes (pdf).

March 2025: Our ongoing efforts in psychophysics led to this paper on jointly modeling different modalities of data, such as Likert-scale ratings and paired preferences (arxiv).

December 2024: I am a co-author of two papers at NeurIPS (active learning for sensitivity analysis and robust GPs). First time to Vancouver!
Papers are listed in reverse chronological order, by category, with a brief description of each.

Bayesian optimization, active learning, and experimentation


One of the most challenging parts of solving real optimization tasks is translating what the human actually wants into a suitable objective function. We show that LLMs can solve this problem by getting feedback from the human on current results, and adjusting the objective accordingly.

Katarzyna Kobalczyk, Zhiyuan Jerry Lin, Benjamin Letham, Zhuokai Zhao, Maximilian Balandat, and Eytan Bakshy (2025) LILO: Bayesian optimization with interactive natural language feedback. Preprint. (arxiv)

We show how Gaussian process models can jointly fit data from different modalities, such as Likert-scale ratings and paired preferences, by mixing likelihoods. Specific applications include understanding haptic perception and preferences.

Kaiwen Wu, Craig Sanders, Benjamin Letham, and Phillip Guan (2025) Mixed likelihood variational Gaussian processes. Preprint. (arxiv)

We combine short-term and long-term A/B tests to run fast optimizations of long-term outcomes, which we use for improving the efficiency of Meta ranking systems.

Qing Feng, Samuel Daulton, Benjamin Letham, Maximilian Balandat, and Eytan Bakshy (2025) Experimenting, fast and slow: Bayesian optimization of long-term outcomes with online experiments. In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, pp. 2235-2246. (pdf)

We develop active learning methods that target functions of derivatives of Gaussian processes, and use them for global sensitivity analysis and understanding parameter importance in tuning problems.

Syrine Belakaria, Benjamin Letham, Jana Doppa, Barbara E. Engelhardt, Stefano Ermon, and Eytan Bakshy (2024) Active learning for derivative-based global sensitivity analysis with Gaussian processes. In: Advances in Neural Information Processing Systems 37, NeurIPS. (pdf)

Standard Gaussian process regression models are ruined by outliers, which are common in hyperparameter optimization tasks. We developed a robust model that identifies and downweights outliers.

Sebastian Ament, Elizabeth Santorella, David Eriksson, Benjamin Letham, Maximilian Balandat, Eytan Bakshy (2024) Robust Gaussian processes via relevance pursuit. In: Advances in Neural Information Processing Systems 37, NeurIPS. (pdf)

We develop models for preference learning over complex output spaces like images.

Qing Feng, Zhiyuan Jerry Lin, Yujia Zhang, Benjamin Letham, Jelena Markovic-Voronov, Ryan-Rhys Griffiths, Peter I. Frazier, and Eytan Bakshy (2024) Bayesian optimization of high-dimensional outputs with human feedback. In: NeurIPS Workshop on Bayesian Decision-making and Uncertainty. (pdf)

When asking humans their preferences or perception, we can measure reaction times alongside their response, which provides implicit information about how hard the decision is. We develop models that combine reaction times with preferences, and use them for psychophysics and preference optimization.

Michael Shvartsman, Benjamin Letham, Stephen Keeley (2024) Response time improves choice prediction and function estimation for Gaussian process models of perception and preferences. In: Proceedings of the 40th Conference on Uncertainty in Artificial Intelligence, UAI, pp. 3211-3226. (pdf)

Interpretability is of vital importance in real systems. We developed methods for optimizing for sparse solutions and understanding the quality/interpretability trade-off, which we use to simplify the sourcing component of a ranking system.

Sulin Liu, Qing Feng, David Eriksson, Benjamin Letham, Eytan Bakshy (2023) Sparse Bayesian optimization. In: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 3754-3774. (pdf)

There is a large and old literature in psychology on measuring perceptual thresholds. We can combine parametric models from psychology with nonparametric Gaussian process models to get the best of both worlds and better understand visual perception.

Stephen Keeley, Benjamin Letham, Craig Sanders, Chase Tymms, Michael Shvartsman (2023) A semi-parametric model for decision making in high-dimensional sensory discrimination tasks. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI, pp. 40-47. (pdf)

Active learning from preference and perception data is made challenging by the Bernoulli responses. We derive new posterior update formulae for the level-set estimation problem that enable a better understanding of human visual perception.

Benjamin Letham, Phillip Guan, Chase Tymms, Eytan Bakshy, Michael Shvartsman (2022) Look-ahead acquisition functions for Bernoulli level set estimation. In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 8493-8513. (pdf) (video)

A thorough study of meta-learning and transfer learning for hyperparameter optimization.

Matthias Feurer, Benjamin Letham, Frank Hutter, Eytan Bakshy (2022) Practical transfer learning for Bayesian optimization. Preprint. (arxiv)

There are subtle reasons why linear embeddings often fail for high-dimensional optimization, which we explore and correct for the purpose of learning robot locomotion policies.

Benjamin Letham, Roberto Calandra, Akshara Rai, Eytan Bakshy (2020) Re-examining linear embeddings for high-dimensional Bayesian optimization. In: Advances in Neural Information Processing Systems 33, NeurIPS. (pdf) (video)

BoTorch, our software and framework for using differentiable programming for Bayesian optimization.

Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, Eytan Bakshy (2020) BoTorch: a framework for efficient Monte-Carlo Bayesian optimization. In: Advances in Neural Information Processing Systems 33, NeurIPS. (pdf)

Tuning contextual policies with A/B tests produces a challenging high-dimensional optimization problem; however, the problem has structure that we can take advantage of to successfully tune video adaptive bitrate policies.

Qing Feng, Benjamin Letham, Hongzi Mao, Eytan Bakshy (2020) High-dimensional contextual policy search with unknown context rewards using Bayesian optimization. In: Advances in Neural Information Processing Systems 33, NeurIPS. (pdf)

Ranking systems can be simulated with offline replay, but generally there will be bias. We show that these biased offline estimates can be combined with online A/B tests for multi-task Bayesian optimization. We used this approach to successfully improve the News Feed ranking model.

Benjamin Letham, Eytan Bakshy (2019) Bayesian optimization for policy search via online-offline experimentation. Journal of Machine Learning Research 20(145): 1-30. (pdf) (supplement) (blog post)

Methodological advances for using Bayesian optimization with noisy responses, particularly A/B tests. The paper also describes two systems optimizations that we've done with this approach: tuning a production ranking system, and optimizing web server compiler flags.

Benjamin Letham, Brian Karrer, Guilherme Ottoni, Eytan Bakshy (2019) Constrained Bayesian optimization with noisy experiments. Bayesian Analysis 14(2): 495-519. (pdf) (supplement) (erratum) (blog post)

We use meta-learning to accelerate hyperparameter optimization from similar past runs, and apply it in Facebook's computer vision platform.

Matthias Feurer, Benjamin Letham, Eytan Bakshy (2018) Scalable meta-learning for Bayesian optimization. In: ICML AutoML Workshop. (arxiv)

In biology we run wet-lab experiments to identify parameters of dynamical systems. How do we decide if we have run enough experiments, and if not, which additional experiments to run? We use minimax optimization to answer these questions, and use it to understand HIV dynamics.

Benjamin Letham, Portia A. Letham, Cynthia Rudin, Edward P. Browne (2016) Prediction uncertainty and optimal experimental design for learning dynamical systems. Chaos 26: 063110. (pdf) (erratum)

Forecasting


Forecasting is a common data science task, yet also a specialized skill outside the expertise of many data scientists. The Prophet forecasting package is designed to be flexible enough to handle a range of business time series, while still being configurable by non-experts. We developed it for a collection of important forecasting tasks at Facebook, and have since open sourced it. It is built on Stan and has R and Python versions. It has received a tremendous response, filling a real need for forecasting these types of time series.

Sean J. Taylor and Benjamin Letham (2018) Forecasting at scale. The American Statistician 72(1): 37-45. (pdf) (software) (site)
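Prophet is built around a decomposable regression view of a time series (trend plus seasonal components). As a toy illustration of that general idea, here is a minimal NumPy sketch that fits a linear trend plus Fourier seasonality by least squares; this is not Prophet's implementation, and the data and function names are made up for the example:

```python
# Toy sketch of a decomposable trend-plus-seasonality regression,
# in the spirit of curve-fitting forecasters like Prophet.
# NOT Prophet's actual implementation; purely illustrative.
import numpy as np

def fit_trend_seasonality(t, y, period=365.25, n_harmonics=3):
    """Least-squares fit of y(t) ~ linear trend + Fourier seasonality."""
    cols = [np.ones_like(t), t]  # intercept and linear trend
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X, beta

# Synthetic daily series: upward trend plus yearly seasonality plus noise.
rng = np.random.default_rng(0)
t = np.arange(3 * 365, dtype=float)
y = 0.01 * t + 2.0 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 0.1, t.size)

X, beta = fit_trend_seasonality(t, y)
residual = y - X @ beta
print(residual.std())  # small: the decomposition explains the series
```

The real package adds, among other things, changepoints in the trend, holiday effects, and uncertainty intervals, with the model fit in Stan.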

Modeling with sales transaction data


We developed a Bayesian hierarchical model for demand modeling in the presence of stockouts, which we use to infer cookie demand at a local bakery.

Benjamin Letham, Lydia M. Letham, and Cynthia Rudin (2016) Bayesian inference of arrival rate and substitution behavior from sales transaction data with stockouts. In: Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD. (pdf) (video)
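The core inference difficulty is that a sellout day censors demand: you observe sales, not demand. A toy sketch of censoring-aware inference for a single item, using a grid posterior over a Poisson demand rate (the paper's actual model is a much richer Bayesian hierarchical model with substitution; the data here are invented):

```python
# Toy illustration of inferring demand when sales are censored by stockouts.
# A sellout day only tells us demand >= stock, so naively averaging sales
# underestimates the demand rate. Illustrative only, not the paper's model.
import math

def pois_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def posterior_mean_rate(sales, stock, grid):
    """Grid posterior over a Poisson demand rate with a uniform prior."""
    post = []
    for lam in grid:
        ll = 1.0
        for s, c in zip(sales, stock):
            if s < c:  # demand fully observed
                ll *= pois_pmf(s, lam)
            else:      # censored: demand was at least the stock level
                ll *= 1.0 - sum(pois_pmf(k, lam) for k in range(c))
        post.append(ll)
    z = sum(post)
    return sum(lam * p / z for lam, p in zip(grid, post))

sales = [3, 5, 5, 2, 5, 4]      # units sold each day
stock = [10, 5, 5, 10, 5, 10]   # units stocked; days 2, 3, and 5 sold out
grid = [0.5 * i for i in range(1, 41)]  # candidate rates 0.5 .. 20.0
print(posterior_mean_rate(sales, stock, grid))
```

The censoring-aware posterior mean comes out well above the naive average of observed sales, which is exactly the correction a stockout-aware model provides.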

We use a copula model to model correlations in demand between various items, and optimally price bundle offers. This work was done during my summer internship at IBM Research.

Benjamin Letham, Wei Sun, and Anshul Sheopuri (2014) Latent variable copula inference for bundle pricing from retail transaction data. In: Proceedings of the 31st International Conference on Machine Learning, ICML, pp. 217-225. (pdf) (errata)

Predictions from rules and information retrieval


We developed a Bayesian approach for fitting interpretable decision list models and used it to create a simple system for predicting stroke risk in atrial fibrillation patients.

Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and David Madigan (2015) Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. Annals of Applied Statistics 9(3): 1350-1371. (pdf)

We use techniques from supervised ranking to learn rule-based models that make predictions on data that are sequentially revealed.

Benjamin Letham, Cynthia Rudin, and David Madigan (2013) Sequential event prediction. Machine Learning 93: 357-380. (pdf)

The ECML PKDD Discovery Challenge 2013 was a competition to develop a recommender system for baby names. I developed a recommender system based on association rules and collaborative filtering. Despite its simplicity and lack of feature engineering, I won 3rd place in the offline challenge. A major advantage of my approach was that it was fast enough to make recommendations entirely online, and I won 2nd place in the online challenge, where my recommender system was actually deployed on the baby naming website.

Benjamin Letham (2013) Similarity-weighted association rules for a name recommender system. In: Proceedings of the ECML PKDD 2013 Discovery Challenge Workshop. (pdf)
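To give the flavor of similarity weighting, here is a toy sketch: score candidate names by co-occurrence with the query user's names, weighting each other user's evidence by Jaccard profile similarity. This is a hypothetical simplification for illustration, not the competition system:

```python
# Toy sketch of similarity-weighted recommendation: each user's liked
# names contribute evidence weighted by their similarity to the query
# user. Hypothetical simplification, not the competition system.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(query, histories, top_k=2):
    scores = {}
    for hist in histories:
        w = jaccard(query, hist)
        for name in hist - query:
            scores[name] = scores.get(name, 0.0) + w
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

histories = [
    {"anna", "emma", "clara"},
    {"anna", "emma", "lena"},
    {"bob", "otto"},
]
print(recommend({"anna", "emma"}, histories))  # clara and lena score highest
```

Because scoring is a single weighted pass over user histories, recommendations like this can be computed fast enough to serve online.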

An information retrieval algorithm for finding and ranking a list of similar items given a small seed.

Benjamin Letham, Cynthia Rudin, and Katherine Heller (2013) Growing a list. Data Mining and Knowledge Discovery 27: 372-395. (pdf) (supplement) (radio segment)

We developed a learning theory framework for using association rules for classification and sequential event prediction, with both VC bounds and results from algorithmic stability.

Cynthia Rudin, Benjamin Letham, David Madigan (2013) Learning theory analysis for association rules and sequential event prediction. Journal of Machine Learning Research 14: 3385-3436. (pdf)

An earlier conference version of the paper above on theoretical bounds for association rule models in sequential prediction problems.

Cynthia Rudin, Benjamin Letham, Ansaf Salleb-Aouissi, Eugene Kogan and David Madigan (2011) Sequential event prediction with association rules. In: Proceedings of the 24th Annual Conference on Learning Theory, COLT, pp. 615-634. (pdf)

Psychophysics, biology and neuroscience


Understanding how long it takes for visual and auditory stimuli to produce responses in sensory cortices helps to elucidate neural processing pathways. This is follow-up work to the 2010 study below.

Tommi Raij, Fa-Hsuan Lin, Benjamin Letham, Kaisu Lankinen, Tapsya Nayak, Thomas Witzel, Matti Hamalainen, and Jyrki Ahveninen (2024) Onset timing of letter processing in auditory and visual sensory cortices. Frontiers in Integrative Neuroscience 18:1427149. (pdf)

At Meta, psychophysics models are used to set hardware requirements for AR and VR. Nonparametric modeling and active learning are part of the framework for determining these requirements.

Phillip Guan, Eric Penner, Joel Hegland, Benjamin Letham, Douglas Lanman (2023) Perceptual requirements for world-locked rendering in AR and VR. In: SIGGRAPH Asia 2023 Conference Papers, SA. (pdf)

Our open-source toolbox AEPsych enables surrogate model-based active learning for threshold identification and other common psychophysics experimentation tasks.

Lucy Owen, Jonathan Browder, Benjamin Letham, Gideon Stocek, Chase Tymms, Michael Shvartsman (2021) Adaptive nonparametric psychophysics. Preprint. (github, arxiv)

I provided computational and statistical expertise for this study of the dynamics of HIV inhibition by interferon.

Edward P. Browne, Benjamin Letham, Cynthia Rudin (2016) A computational model of inhibition of HIV-1 by interferon-alpha. PLoS ONE 11(3): e0152316. (pdf) (PLOS)

Measuring the timing of brain activations is made challenging by noise. We use robust statistics to better map cross-sensory interactions from MEG data.

Benjamin Letham and Tommi Raij (2011) Statistically robust measurement of evoked response onset latencies. Journal of Neuroscience Methods 194(2): 374-379. (pdf)

While working at The Martinos Center for Biomedical Imaging at Mass. General Hospital I was involved in a study of cross-sensory interactions (auditory stimuli activate visual cortex and vice versa) using MEG and fMRI.

Tommi Raij, Jyrki Ahveninen, Fa-Hsuan Lin, Thomas Witzel, Iiro P. Jaaskelainen, Benjamin Letham, Emily Israeli, Cherif Sahyoun, Christos Vasios, Steven Stufflebeam, Matti Hamalainen (2010) Onset timing of cross-sensory activations and multisensory interactions in auditory and visual sensory cortices. European Journal of Neuroscience 31(10): 1772-1782. (pdf)

Machine learning-themed videos

AISTATS paper reaction | Mega Compilation!!

Paper reaction videos are the latest trend sweeping AI and ML. Efforts are underway to entirely replace the peer review process with the number of likes and subscribes on reaction videos. This video is a compilation of just a few of the reaction videos for a recent AISTATS paper. A lot of excitement! It's gonna be great!

If AI papers were pharmaceuticals (a parody commercial)

Do you suffer from high-dimensional Bayesian optimization? Alebo can help! Watch this commercial to understand how Alebo can make your life happier, your food tastier, and your friends funnier.

Stockouts and substitutions at Lando's Bakery

When KDD required me to make a video describing our paper on stockouts and substitutions at a bakery, I decided it was the perfect opportunity to try making a stop motion video, something I had always wanted to do. It was my first but not my last, and it was every bit as fun as I imagined it would be. Perhaps not quite what the conference organizers had in mind, but I think it got the ideas across pretty well!


Data science posts

Debate bingo: 2020 presidential edition

The second edition of Debate Bingo! I once again extracted commonly used phrases from each candidate to create a bingo card generator, for use during the debates.

In Defense of Fahrenheit

On a scale of 0 to 100, how warm is it? I combine Census data with daily temperatures from nearly 4000 weather stations to estimate the distribution of temperatures experienced by people in the U.S. The oft-maligned Fahrenheit scale nearly perfectly captures this distribution and is a great temperature scale for U.S. weather.

Debate bingo: 2016 presidential edition

Presidential candidates tend to repeat themselves, and our two for 2016 are no exception. I extracted some of their most commonly used phrases and used them to create a bingo card generator, for use during the debates.

The best part of a lightsaber duel is the talking

Why are the original Star Wars lightsaber duels so much better than those from the Prequels? It's not the simple fight choreography, it's the talking. In this post I quantify the balance between talking and fighting in Star Wars lightsaber duels and show that there is a clear difference between the Original Trilogy and the Prequels. Unfortunately, The Force Awakens is a typical Prequels duel.

Was 2015 Boston's worst winter yet?

When I finally had some time after spending the entire month of February shoveling snow, I did an analysis of snowfall data to determine exactly how unusual a winter it was. Although 2015 beat the all-time total snowfall record by only a couple of inches, a deeper look at the data revealed that this was by far the worst winter in recorded history.


Technical expositions

On NP-completeness

The theory of NP-completeness has its roots in a foundational result by Cook, who showed that Boolean satisfiability (SAT) is NP-complete and thus unlikely to admit an efficient solution. In this short paper I prove an analogous result using binary integer programming in place of SAT. The proof is notationally cleaner and more straightforward for those for whom the language of integer programming is more natural than that of SAT.

Benjamin Letham (2011) An integer programming proof of Cook's theorem of NP-completeness. (pdf)
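The key device in such a reduction is standard: each CNF clause translates directly into a linear inequality over 0-1 variables, with a literal $x_i$ mapping to $x_i$ and $\neg x_i$ mapping to $(1 - x_i)$. For example (this encoding is standard; whether it matches the paper's exact notation is an assumption):

```latex
% A clause is satisfied iff at least one of its literals is true,
% i.e. iff the corresponding sum of binary terms is at least 1:
(x_1 \vee \neg x_2 \vee x_3)
\quad\Longleftrightarrow\quad
x_1 + (1 - x_2) + x_3 \ge 1,
\qquad x_1, x_2, x_3 \in \{0, 1\}.
```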

On Bayesian Analysis

In this work I provide a somewhat rigorous yet simultaneously informal introduction to Bayesian analysis. It covers everything from the theory of posterior asymptotics to practical considerations of MCMC sampling. It assumes the reader is comfortable with basic probability and some mathematical rigor.

Benjamin Letham (2012) An overview of Bayesian analysis. (pdf)

About me

When I'm not working on research, I enjoy sailing, traveling, playing the violin, and reading history.

My Erdős number is 4.