Guide · The AGI Scientist · June 12, 2026 · 9 min read

How to run a reproducible experiment

A practical checklist for experiments others can actually re-run — the difference between a result and a rumor.

A result nobody can reproduce is a rumor with a chart. This guide is the checklist we hold our own work to before we publish.

Pin everything

Environment. Lock exact dependency versions in a lockfile and record the hardware, OS, and driver versions. "Latest" is not a version.
Data. Snapshot the exact dataset and its preprocessing. Reference it by a content hash, not a filename.
Seeds. Set and log every random seed. Report variance across seeds, not a single lucky run.
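A minimal sketch of the three pins in Python, using only the standard library. The names `content_hash`, `environment_snapshot`, `set_all_seeds`, and `write_manifest` are hypothetical illustrations, not part of any particular tool. Two assumptions to flag: GPU model and driver versions are not visible to the standard library and would have to be recorded separately, and frameworks such as NumPy or PyTorch keep their own random generators, so they need their own seeding calls.

```python
import hashlib
import importlib.metadata
import json
import platform
import random
import sys
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file's bytes: renaming the file does not change its identity."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def environment_snapshot(packages: list) -> dict:
    """Record interpreter, OS, machine, and the exact installed version of each package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None  # not installed: record the absence explicitly
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "packages": versions,
    }


def set_all_seeds(seed: int) -> None:
    """Seed Python's global RNG only (assumption: add NumPy/PyTorch seeding for your stack)."""
    random.seed(seed)


def write_manifest(data_path: Path, seeds: list, packages: list, out: Path) -> dict:
    """Write data hash, seeds, and environment to one JSON file that ships with the results."""
    manifest = {
        "data": {"path": str(data_path), "sha256": content_hash(data_path)},
        "seeds": seeds,
        "environment": environment_snapshot(packages),
    }
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

Keeping all three pins in one manifest means a reader can check a single file to see exactly which data, seeds, and package versions produced a number.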

Make it one command

If reproducing your work takes more than a single command, most people won't. Ship a script that pulls the pinned data, runs the experiment, and emits the same numbers you're claiming.
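One way to structure that single command, sketched in Python under stated assumptions: the pinned data has already been downloaded to a local path (the fetch step is omitted because its location is project-specific), `run_experiment` is a toy stand-in for a real training or evaluation run, and the claimed number is passed in on the command line. The function names and flags (`--data`, `--sha256`, `--seeds`, `--claimed-mean`, `--tolerance`) are hypothetical.

```python
import argparse
import hashlib
import json
import random
import statistics
import sys
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash of the dataset file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_experiment(data: bytes, seed: int) -> float:
    """Toy stand-in for a real experiment: a seeded statistic over the data (needs non-empty data)."""
    rng = random.Random(seed)  # local RNG, so no hidden global state leaks between runs
    sample = [data[rng.randrange(len(data))] for _ in range(100)]
    return sum(sample) / len(sample)


def main(argv=None) -> int:
    """Verify the pinned data, run every seed, print the numbers, and compare to the claim."""
    parser = argparse.ArgumentParser(description="Reproduce the reported numbers.")
    parser.add_argument("--data", type=Path, required=True)
    parser.add_argument("--sha256", required=True, help="pinned content hash of the dataset")
    parser.add_argument("--seeds", type=int, nargs="+", default=[0, 1, 2])
    parser.add_argument("--claimed-mean", type=float, default=None)
    parser.add_argument("--tolerance", type=float, default=1e-9)
    args = parser.parse_args(argv)

    actual = sha256_of(args.data)
    if actual != args.sha256:
        print(f"data hash mismatch: expected {args.sha256}, got {actual}", file=sys.stderr)
        return 2  # refuse to run on data that is not the pinned snapshot

    data = args.data.read_bytes()
    scores = [run_experiment(data, s) for s in args.seeds]
    mean = statistics.fmean(scores)
    print(json.dumps({"seeds": args.seeds, "scores": scores, "mean": mean}))

    if args.claimed_mean is not None and abs(mean - args.claimed_mean) > args.tolerance:
        print(f"mean {mean} differs from claimed {args.claimed_mean}", file=sys.stderr)
        return 1
    return 0
```

Saved as a script and ended with `if __name__ == "__main__": sys.exit(main())`, the whole reproduction becomes `python reproduce.py --data <file> --sha256 <pinned hash>`. Distinct nonzero exit codes for a data mismatch and a result mismatch make either failure impossible to miss in CI.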

Report honestly

State what didn't work, which failure modes you hit, and how much compute you spent. A reproducible negative result is worth more than an unreproducible triumph.
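Failures and cost are easier to report honestly when they live in the same record as the headline number. The sketch below is one possible schema, not a standard: the field names (`scores_by_seed`, `failed_seeds`, `gpu_hours`, `notes`) are hypothetical. It reports the sample standard deviation and range across seeds next to the mean, which also covers the seed variance asked for above.

```python
import json
import statistics
from dataclasses import asdict, dataclass, field


@dataclass
class RunReport:
    """Hypothetical report schema: results plus what went wrong and what it cost."""
    scores_by_seed: dict  # seed -> metric; needs at least one successful run
    failed_seeds: dict = field(default_factory=dict)  # seed -> failure reason
    gpu_hours: float = 0.0
    hardware: str = "unspecified"
    notes: list = field(default_factory=list)  # approaches that didn't work

    def summary(self) -> dict:
        scores = list(self.scores_by_seed.values())
        return {
            "n_successful": len(scores),
            "n_failed": len(self.failed_seeds),
            "mean": statistics.fmean(scores),
            # Sample standard deviation is undefined for a single run.
            "stdev": statistics.stdev(scores) if len(scores) > 1 else None,
            "min": min(scores),
            "max": max(scores),
            "gpu_hours": self.gpu_hours,
            "hardware": self.hardware,
        }

    def to_json(self) -> str:
        """Summary plus every raw field, so failed runs are published with the successes."""
        return json.dumps({"summary": self.summary(), "details": asdict(self)}, indent=2)
```

Counting failed seeds in the summary, rather than silently dropping them, keeps a lucky subset from posing as the whole experiment.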

Then publish it openly

Put the code, the config, and the artifacts where the community can reach them — and hand the next researcher a higher starting point.