Alex Ratner

Snorkel / UW / Stanford

Github | Google | Twitter
Email: ajratner at

Latest News

One of the key bottlenecks in building machine learning systems today is creating and managing training datasets. Instead of labeling data by hand, I work on enabling users to interact with the modern ML stack by programmatically building and managing training datasets. These weak supervision approaches can lead to applications built in days or weeks, rather than months or years. I’m very fortunate to work with the Snorkel team and members of the Hazy, Info, StatsML, DAWN, and QIAI labs.

Research Projects

Data Programming + Snorkel

Snorkel enables users to quickly and easily label, augment, and structure training datasets by writing programmatic operators rather than labeling and managing data by hand. For more on Snorkel, check out the project site and our release notes on the new version!


Research Highlights | All Publications

Programmatic Labeling as Weak Supervision

Labeling training data is one of the biggest bottlenecks in machine learning today. My work investigates whether users can train models without any hand-labeled training data by instead writing labeling functions, which programmatically label data using weak supervision strategies such as heuristics, knowledge bases, or other models. These labeling functions can have arbitrary accuracies and correlations, leading to new systems, algorithmic, and theoretical challenges. For more, check out Snorkel.
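As a toy illustration of the idea above, here is a minimal sketch of labeling functions for a hypothetical spam-detection task, with the noisy votes combined by simple majority. All names, heuristics, and the label scheme are illustrative and are not Snorkel's actual API; in practice, Snorkel learns the accuracies and correlations of the labeling functions with a label model rather than taking a raw vote.

```python
# Illustrative label values; -1 means a labeling function abstains.
SPAM, NOT_SPAM, ABSTAIN = 1, 0, -1

def lf_contains_link(text):
    # Heuristic: messages containing a URL are often spam.
    return SPAM if "http" in text else ABSTAIN

def lf_short_message(text):
    # Heuristic: very short messages are usually not spam.
    return NOT_SPAM if len(text.split()) < 4 else ABSTAIN

def majority_vote(text, lfs):
    # Combine noisy labeling-function outputs by simple majority vote.
    # (Snorkel instead models LF accuracies and correlations.)
    votes = [lf(text) for lf in lfs if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

label = majority_vote(
    "Click here now to claim your prize: http://win.example",
    [lf_contains_link, lf_short_message],
)
```

The key point is that each labeling function is cheap to write, may be wrong on many examples, and may conflict with the others; the system's job is to denoise and combine them into probabilistic training labels.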

Multi-Task Weak Supervision

Multi-task learning is an increasingly popular approach for jointly modeling several related tasks. However, multi-task learning models require multiple large, hand-labeled training sets. My work here focuses on using weak supervision instead. We see this enabling a new paradigm where users rapidly label tens to hundreds of tasks in dynamic, noisy ways, and are investigating systems and approaches for supporting this massively multi-task regime. For initial steps, check out Snorkel MeTaL.

Data Augmentation as Weak Supervision

Data augmentation is the increasingly critical practice of expanding small labeled training sets by creating transformed copies of data points in ways that preserve their class labels. Effectively, this is a simple, model-agnostic way for users to inject their knowledge of domain- and task-specific invariances, and my work here investigates how we can support and accelerate this powerful form of weak supervision.
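The transformation functions described above can be sketched in a few lines. This is a hypothetical example, assuming that synonym substitution preserves the class label of a text example; the synonym table and function names are illustrative, not part of any particular library.

```python
import random

# Illustrative synonym table; substituting these words is assumed
# to preserve the example's class label.
SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "joyful"]}

def augment(sentence, rng):
    # Create a transformed copy by swapping known words for a random
    # synonym; the label attached to the example is left unchanged.
    words = sentence.split()
    out = [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words]
    return " ".join(out)

rng = random.Random(0)
augmented = augment("the quick fox was happy", rng)
```

Each such transformation encodes a domain-specific invariance (here, rough synonymy); composing many of them yields a large pool of plausible transformed copies from a small labeled set.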


  • Accelerating Machine Learning with Training Data Management. Alex Ratner. Stanford PhD Thesis 2019.

  • MLSys: The New Frontier of Machine Learning Systems. Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Christopher Ré, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar. 2019.

  • See more papers

Blog Posts

Some high-level thoughts and tutorials; for paper-specific blog posts, see the entries above.

Older News