(Figure: A taxonomy of experiment runs. See Section “Runs, Sweeps, Groups, Tags & Types”.)

Many ML research projects follow a structure of:

Create dataset → Finetune model → Evaluate on tasks → Make changes → Repeat.
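
As a rough illustration, here’s a minimal Python sketch of that loop. Every function in it is a hypothetical placeholder standing in for a project’s real data, training, and evaluation code; none of it comes from the demo repo.

```python
# A minimal sketch of the dataset -> finetune -> evaluate loop.
# All functions are hypothetical placeholders for your project's
# actual data, training, and evaluation code.

def build_dataset(spec: str) -> list[str]:
    """Placeholder: load or generate training examples."""
    return [f"{spec}-example-{i}" for i in range(3)]

def finetune(base_model: str, dataset: list[str]) -> str:
    """Placeholder: return an identifier for the finetuned checkpoint."""
    return f"{base_model}-ft-{len(dataset)}ex"

def evaluate(checkpoint: str, tasks: list[str]) -> dict[str, float]:
    """Placeholder: score the checkpoint on each eval task."""
    return {task: 0.0 for task in tasks}

# One pass through the loop; in practice you inspect the results,
# make changes (to the data, model, or hyperparameters), and repeat.
dataset = build_dataset("my-dataset-v1")
checkpoint = finetune("base-model", dataset)
results = evaluate(checkpoint, ["task-a", "task-b"])
print(results)
```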

This isn’t every project, but it’s common enough that it seems worthwhile to consolidate best practices for this sort of empirical research. I’m thinking of questions like: How should I structure my project files? How should I organize and keep track of experiment runs? How do I make my results easy to reproduce?

Best practices for these questions often evolve independently and spread via word of mouth or collaboration. Existing tools and guides address each question individually, but rarely offer a complete view.

I’d love to see an overview of a baseline workflow, the stumbling blocks to be wary of, and the useful tricks otherwise discovered only after lots of fumbling around. Here, I’ll start by sharing what I’ve found works for me, though it is by no means a perfect workflow (see the “Pain point” boxes). By sharing it, I’m hoping we can trade notes and discover solutions that make everybody’s lives easier.

Quick note: My workflow is designed for the scale of a typical academic research paper, with small teams (<5 people) and several months of work. For a weekend project you may prefer less overhead; bigger projects may benefit from more heavy-duty tooling.


Demo

To help connect the dots, I’ve prepared a demo of some of the core ideas in this post. I’ll point to relevant parts as we go along.

https://github.com/JunShern/ml_workflow

File structure