Experiment tracker (lite)
When you tune a model you run it many times with different settings. An experiment tracker records a receipt for each run — the config you used and the result you got — so you can compare them honestly instead of relying on memory ("I think the lr=0.01 run was better?").
A receipt is just a record:
{"config": {"lr": 0.01, "epochs": 5}, "accuracy": 0.91}
With three runs logged, picking the best is a one-liner (take the max by
accuracy). The choice is also reproducible, because the winning config
is written down, not remembered.
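As a minimal sketch of that one-liner, assume the three runs sit in a plain Python list called runs (a hypothetical name). Only the first receipt comes from the text above; the learning rates of the other two runs are invented for illustration.

```python
# Three receipts. The lr values for the second and third runs are
# illustrative assumptions, not results from a real experiment.
runs = [
    {"config": {"lr": 0.01, "epochs": 5}, "accuracy": 0.91},
    {"config": {"lr": 0.1, "epochs": 5}, "accuracy": 0.88},
    {"config": {"lr": 0.001, "epochs": 5}, "accuracy": 0.82},
]

# The one-liner: the receipt with the highest accuracy wins.
best = max(runs, key=lambda run: run["accuracy"])

# The winning config is recorded in the receipt, so the run can be repeated.
print(best["config"], best["accuracy"])
```

If two runs tie on accuracy, max returns the first one it encounters, so the order in which runs were logged acts as the tie-breaker.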
What a good receipt captures
- Config — the knobs: learning rate, epochs, which features, the data snapshot. (This ties back to reproducibility: a receipt without the config can't be repeated.)
- Metric(s) — accuracy, precision/recall, loss — whatever you're optimizing.
- Result/artifact — where the trained model or its id lives.
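The three parts above could be bundled roughly like this. This is a sketch, not a fixed schema: make_receipt, the key names, the feature names, the snapshot label, and the artifact path are all hypothetical.

```python
import json


def make_receipt(config, metrics, artifact):
    """Bundle one run's config, metrics, and artifact location (hypothetical helper)."""
    return {
        "config": dict(config),    # the knobs, copied so later edits don't alter the receipt
        "metrics": dict(metrics),  # whatever you're optimizing
        "artifact": artifact,      # where the trained model (or its id) lives
    }


receipt = make_receipt(
    config={
        "lr": 0.01,
        "epochs": 5,
        "features": ["age", "income"],    # illustrative feature names
        "data_snapshot": "snapshot-001",  # illustrative snapshot label
    },
    metrics={"accuracy": 0.91},
    artifact="models/run_001.pkl",        # illustrative path
)

# A receipt is plain data, so it can be serialized to one line of JSON.
line = json.dumps(receipt, sort_keys=True)
print(line)
```

Keeping the receipt as plain, JSON-serializable data means it can be appended to a log file and read back later without any special tooling.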
Real teams use tools such as MLflow or Weights & Biases for this.
Conceptually the records have the same shape: a list of {config, metric}
receipts you sort and compare. Here we use a plain list of dicts.
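A rough sketch of the "sort and compare" step with a plain list of dicts follows. The log_run helper and the specific lr values are assumptions made for illustration, and nothing here mirrors the MLflow or Weights & Biases APIs.

```python
runs = []  # the whole "tracker": a plain list of dicts


def log_run(config, accuracy):
    """Append one receipt to the list (hypothetical helper)."""
    runs.append({"config": config, "accuracy": accuracy})


# Illustrative runs, logged in the order they were tried.
log_run({"lr": 0.1, "epochs": 5}, 0.88)
log_run({"lr": 0.01, "epochs": 5}, 0.91)
log_run({"lr": 0.001, "epochs": 5}, 0.82)

# Sort best-first so the runs can be compared side by side.
ranked = sorted(runs, key=lambda run: run["accuracy"], reverse=True)

for rank, run in enumerate(ranked, start=1):
    print(rank, run["config"], run["accuracy"])
```

sorted returns a new list and leaves runs in logging order, so the original history of what was tried, and when, is preserved.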
Why a builder cares
Without receipts, model tuning is folklore: nobody can say why the shipped model was chosen, and nobody can reproduce it. With receipts, "we picked lr=0.01 because it scored 0.91 versus 0.88 and 0.82" is a fact you can defend and rerun. Next, you'll log receipts and pick the best one.