Alex Ratner

Snorkel / UW / Stanford

Github | Google | Twitter
Email: ajratner at

Latest News

One of the key bottlenecks in building machine learning systems today is creating and managing training datasets. Instead of labeling data by hand, I work on enabling users to interact with the modern ML stack by programmatically building and managing training datasets. These weak supervision approaches can lead to applications built in days or weeks, rather than months or years. I’m very fortunate to work with the Snorkel team and members of the Hazy, Info, StatsML, DAWN, and QIAI labs.

Research Projects

Data Programming + Snorkel

Snorkel enables users to quickly and easily label, augment, and structure training datasets by writing programmatic operators rather than labeling and managing data by hand. For more on Snorkel, check out, and our release notes on the new version!


Research Highlights | All Publications

Programmatic Labeling as Weak Supervision

Labeling training data is one of the biggest bottlenecks in machine learning today. My work investigates whether users can train models without any hand-labeled training data, instead writing labeling functions, which programmatically label data using weak supervision strategies like heuristics, knowledge bases, or other models. These labeling functions can have arbitrary accuracies and correlations, leading to new systems, algorithmic, and theoretical challenges. For more here, check out Snorkel.

Multi-Task Weak Supervision

Multi-task learning is an increasingly popular approach for jointly modeling several related tasks. However, multi-task learning models require multiple large, hand-labeled training sets. My work here focuses on using weak supervision instead. We see this enabling a new paradigm where users rapidly label tens to hundreds of tasks in dynamic, noisy ways, and are investigating systems and approaches for supporting this massively multi-task regime. For initial steps, check out Snorkel MeTaL.

Data Augmentation as Weak Supervision

Data augmentation is the increasingly critical practice of expanding small labeled training sets by creating transformed copies of data points in ways that preserve their class labels. Effectively, this is a simple, model-agnostic way for users to inject their knowledge of domain- and task-specific invariances, and my work here investigates how we can support and accelerate this powerful form of weak supervision.


Blog Posts

Some high level thoughts and tutorials; for more blog posts, see paper-specific ones above, and check out

Older News