My git-iversary

20 August, 2017

This week marks one year since I started using git in my research projects. It has been a major shift, but I’ve come to love the workflow, and see practical benefits for pretty much every kind of writing.

Version history done right

People must exist who never keep track of the history of their work. They start a file and just keep editing it, and the current version is always the best one. They never have false starts, or hit dead ends, or try it two different ways to see which one they like best, or bring back something they took out (or never took anything out to begin with).

For everyone else, these problems require some kind of history. We do this when we add a date or number into the filename. Software development requires a more rigorous way to keep track of different versions of plaintext computer code, which led to the development of version control programs like git. But git can be used on any text, not just code, and a clear history is just as relevant for writing anything really - books, articles, poetry, music playlists, trashy fanfics, and even data itself.

Back to plaintext

When I started using a computer in the late 1980s, early word processors seemed really amazing if they could add fancy stuff to your writing. Bold, italics, colors, underlining, images, and fancy fonts all made it seem as if the work was ‘better’ than before (and added more bloat). But really, this hid a basic truth - our work is the text itself, not the presentation of that text.¹ Git encourages us to remember this, since it works best with plaintext files like txt, tex, and markdown, rather than over-formatted binary documents like docx and xlsx.

Thinking on a tree

Keeping a version history is easy with many programs nowadays. Emailing files to myself was the simplest way to guarantee that I didn’t lose them. We have “Track Changes” in Word. OSX has Time Machine. Dropbox keeps backup copies of each file, at least for a little while, and I’ve used this feature many times to rescue a file I had damaged or deleted accidentally. But these see the history as a single, linear path, and the older versions are eventually meant to be thrown away.

Git is different, because the sequence of changes is an integral unit (the “repository”), and in normal use nothing is ever taken out of it. There’s a certain peace of mind when writing, especially when I remove files or sections of text, because I know I can always go back to an earlier commit and recover what I removed. Like a tree, the repository also allows multiple branches, descended from a common ancestor. Branching makes sense in software, and in collaborations, but it is also a general idea even for single authors of prose.

Imagine you are writing a novel, and have to decide whether to keep a character in the story or take them out. It is not an easy change, and you must go slowly through the entire book to make all the edits to remove them. But you also can’t decide if this is a good move, and you want to keep working on the version of the story with the character in it. This is a branch situation - the author can use git to create two branches, like parallel universes, and continue writing both to see how each feels, without them getting mixed up with each other. A linear history can’t do this.
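As a rough sketch of the two-universe idea, the short Python script below drives git through a throwaway repository. It assumes git is installed and on the PATH. The file name chapter1.txt, the branch name without-boris, the characters Anna and Boris, and the placeholder author identity are all invented for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return its trimmed output."""
    cmd = [
        "git",
        "-c", "user.name=Example Author",       # placeholder identity (assumption)
        "-c", "user.email=author@example.com",  # placeholder identity (assumption)
        "-c", "commit.gpgsign=false",
        *args,
    ]
    result = subprocess.run(cmd, cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def write_two_versions(repo):
    """Keep one draft on the starting branch and an edited draft on a new branch."""
    chapter = Path(repo) / "chapter1.txt"  # hypothetical manuscript file
    git(repo, "init")

    # Universe 1: the character stays in the story.
    chapter.write_text("Anna met Boris at the station.\n")
    git(repo, "add", "chapter1.txt")
    git(repo, "commit", "-m", "Draft chapter 1 with Boris")
    original = git(repo, "rev-parse", "--abbrev-ref", "HEAD")  # e.g. master or main

    # Universe 2: a new branch where the character is removed.
    git(repo, "checkout", "-b", "without-boris")
    chapter.write_text("Anna waited alone at the station.\n")
    git(repo, "commit", "-am", "Remove Boris from chapter 1")
    without_character = chapter.read_text()

    # Switching back restores the first draft; the other one stays on its branch.
    git(repo, "checkout", original)
    with_character = chapter.read_text()
    return with_character, without_character


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git is not installed; skipping the demo")
    else:
        with tempfile.TemporaryDirectory() as repo:
            kept, removed = write_two_versions(repo)
            print("starting branch:", kept.strip())
            print("without-boris branch:", removed.strip())
```

Each checkout swaps the text on disk for that branch's version, so both drafts can keep growing without getting mixed up. If one of them wins, the other can be left unmerged and will still sit in the history.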

Keeping databases under git version control

Many behavioral projects work with “small” data, e.g. a few hundred or thousand observations in a study. The file sizes here are not that big if they are stored in plaintext formats like CSV. In this case, it makes sense to version the data as well. This doesn’t work nearly as well if the data are in a proprietary, “binary” format like Microsoft Excel or Access, because git can store such files but cannot show a readable, line-by-line difference between two versions of them.
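To see why plaintext matters here, this small Python sketch compares two snapshots of a CSV file line by line. It uses the standard library's difflib as a stand-in for git's own diff, which is implemented differently. The column names and values are made up for the example and are not from any real dataset.

```python
import difflib


def csv_changes(old_text, new_text):
    """Return only the removed (-) and added (+) rows between two CSV snapshots."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="before.csv",
        tofile="after.csv",
        lineterm="",
        n=0,  # no context lines, only the rows that changed
    )
    body = list(diff)[2:]  # drop the two file-name header lines
    return [line for line in body if line.startswith(("+", "-"))]


if __name__ == "__main__":
    # Hypothetical data: one height measurement is corrected in row 2.
    before = "id,age,height_cm\n1,34,162\n2,41,158\n3,29,171\n"
    after = "id,age,height_cm\n1,34,162\n2,41,159\n3,29,171\n"
    for line in csv_changes(before, after):
        print(line)
    # prints:
    # -2,41,158
    # +2,41,159
```

Because each observation sits on its own line, a one-cell correction shows up as a single changed row. The same correction inside a binary spreadsheet would only register as "the file changed".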

I’ve recently moved the entire database for the Tsimane Health and Life History Project into a plaintext CSV format, and put it under version control on GitHub. So far it is going well, and our data team can keep track of multiple, simultaneous changes to the data without much confusion.

Keeping yourself committed

Beyond the above, there’s something special about the add-commit-push cycle. A commit is forever, an indelible step forwards. And, you get to pick which changes go into a commit, and narrate those changes in the commit message. This is great for giving credit - but also blame - for particular ideas in a project. If scientists begin using such kinds of version control, it might change the way we think about the cumulative cultural evolution of ideas, for the better.

Anyway, those are my musings for tonight.

¹ My thinking on this has been heavily influenced by Kieran Healy’s Plain Person’s Guide to Plain Text Social Science, which should be read by everyone.