
Why Open Source Matters for Reproducible Science

I develop almost everything in open source. People sometimes ask why I spend so much time on documentation, examples, and polish for free software.

The answer is simple: science should be reproducible, and code is increasingly central to scientific claims.

The Reproducibility Crisis

Academia has a problem. Published papers often cannot be reproduced. The reasons are mundane:

  • Methods described too vaguely
  • Data not available
  • Code never released
  • Dependencies undocumented
  • Computational environment not preserved

This is not just inefficient. It undermines the scientific method. A result you cannot reproduce is not a result. It is an anecdote.
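
The last two items on that list are among the cheapest to fix. As a minimal sketch, assuming a Python project, the snippet below records the interpreter version, the platform, and every installed package so the record can be stored next to a result. The names `snapshot_environment` and `write_snapshot` and the file name `environment.json` are illustrative assumptions, not taken from any particular library.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Return a JSON-serializable record of the interpreter, OS, and packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def write_snapshot(path="environment.json"):
    """Write the snapshot to `path` (hypothetical file name) and return it."""
    snapshot = snapshot_environment()
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot, handle, indent=2)
    return snapshot
```

A file like this does not freeze the environment the way a lock file or container image would, but committing it alongside the results tells a later reader what the code actually ran against.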

Code as Scientific Artifact

When your research involves computation (and whose doesn’t these days?), your code is part of your methodology. Hiding it is like a biologist refusing to describe their experimental protocol.

Open source is not charity. It is scientific rigor.

Why I Document Obsessively

Every library I publish includes:

  • Clear installation instructions
  • Reproducible examples
  • API documentation
  • Tests that demonstrate usage
  • Version-controlled history showing how the code evolved

This takes time. But it means someone in 2028 can:

  • Understand what I did
  • Reproduce my results
  • Build on my work
  • Find my errors

That last point matters. I want people to find my errors. That is how science works.
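
Errors are easier to find when a result can be regenerated exactly. As a hedged sketch of a test that doubles as a reproducible example, the snippet below runs a made-up analysis (a bootstrap estimate of a mean) with an explicit seed and checks that two runs agree. The function `bootstrap_mean`, its parameters, and the sample data are hypothetical, chosen only for illustration.

```python
import random
import statistics


def bootstrap_mean(data, n_resamples=1000, seed=0):
    """Estimate the mean and its standard error by bootstrap resampling."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(data, k=len(data))  # resample with replacement
        means.append(statistics.fmean(sample))
    return statistics.fmean(means), statistics.stdev(means)


def test_bootstrap_mean_is_reproducible():
    data = [2.1, 2.5, 1.9, 2.4, 2.2]  # illustrative values, not real measurements
    first = bootstrap_mean(data, seed=42)
    second = bootstrap_mean(data, seed=42)
    assert first == second  # same seed and inputs give the same estimate
    assert min(data) <= first[0] <= max(data)  # a mean stays within the data range
```

Passing the seed as an argument rather than seeding a global generator keeps the randomness visible in the function signature, so anyone rerunning the analysis can see exactly which inputs produced a given number.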

The Broader Point

Open source accelerates science by enabling replication, facilitating collaboration, preventing redundant work, and building cumulative knowledge. None of this works if the code stays on your laptop.

I will keep publishing everything. Not for recognition, but because science is a collective enterprise that only works if we show our work.
