A Codeville user speaks

Fraser Speirs has an informative post about Git which concludes with:

I don’t hear anything about arch, monotone, BitKeeper, codeville, SVK or darcs from anywhere except the nerdiest of SCM nerds.

Git has also been getting attention from Michael Tsai, John Gruber, and Digital Web.

I’m not an SCM nerd. I’ve never used Git, mostly because I’ve never had the time to check it out. But I am a Codeville user, and here are my thoughts about it.

My experience with Codeville began when I started to work with Ross Cohen, who implemented most of Codeville (Bram came up with the merge algorithm). In mid-2004 we switched to Codeville for the research project that eventually became Mosuki, because we desperately needed a VCS that made branching and merging easy. Codeville also was (and may still be) the version control system used for BitTorrent, starting in 2005.

In 2004, having the main Codeville developer on hand at all times mitigated the otherwise insane decision to switch to what was then an alpha version control system. We encountered our fair share of bugs, usually caused by a non-SCM nerd (often me) doing something silly or weird that the SCM nerds hadn’t considered. Codeville has matured a lot since then. Since mid-2006, it has basically been smooth sailing. (Ross finally bumped the version number up to 0.8.)

Basic concepts

Speirs starts out with Git’s basic architecture:

From an architectural perspective, Git is gloriously simple. There are four essential objects: blobs, trees, commits and tags:

Codeville has three basic concepts that the user must understand: a repository or branch, a change-set, and a file or directory. A repository is simply a set of change-sets (actually, it is a DAG of change-sets, but you don’t need to understand that to use Codeville). A change-set is basically a diff between two states of the repository, plus some extra information like which files were added, deleted, or renamed. And, if you’ve read this far, you already understand what files and directories are.
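To make the "DAG of change-sets" idea concrete, here is a minimal sketch in Python. It is not Codeville's actual data model; the ChangeSet class, its field names, and the history function are all hypothetical, invented only to illustrate the description above: a change-set carries a diff plus lists of added, deleted, and renamed files, and a repository is the set of change-sets reachable by following parent links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    """Hypothetical change-set: a diff plus bookkeeping about files."""
    name: str
    parents: tuple = ()   # names of the change-sets this one builds on
    diff: str = ""        # textual diff between the two repository states
    added: tuple = ()     # paths added by this change-set
    deleted: tuple = ()   # paths deleted by this change-set
    renamed: tuple = ()   # (old_path, new_path) pairs


def history(repo, head):
    """Return the names of every change-set reachable from `head`, including `head`."""
    seen = set()
    stack = [head]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # a merge can reach the same ancestor by two routes
        seen.add(name)
        stack.extend(repo[name].parents)
    return seen


# Two branches fork from "init" and are later merged, which is what
# makes the history a DAG rather than a simple chain.
repo = {
    cs.name: cs
    for cs in [
        ChangeSet("init", added=("README",)),
        ChangeSet("feature", parents=("init",), added=("feature.py",)),
        ChangeSet("bugfix", parents=("init",)),
        ChangeSet("merge", parents=("feature", "bugfix")),
    ]
}

print(sorted(history(repo, "merge")))
```

In this toy model, a change-set with two parents is a merge, and a branch is just the set of change-sets reachable from its newest one.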

If I understand Speirs’ explanation of Git correctly, Git’s “blob” corresponds to a file, and Codeville’s “change-set” encapsulates both “commit” and “tag.” I’m not sure how Git’s “tree” object is used. However, Codeville’s basic concepts are as simple as Git’s, if not simpler.

Handling unfinished changes

Speirs then points out what he sees as another benefit of Git: the ability to modify a change after it has been made:

Once nice feature of Git is that it allows you to undo or change a commit after it has been made. Here’s one example of where it’s super useful: I work between a desktop and a laptop machine. Using subversion, when I have to move machines, I commit my work in progress and then update the machine I’m moving onto. This is generally fine, but it means there are a lot of commits in the repository that represent points at which I wouldn’t normally commit code – where things are broken, incomplete or don’t compile. With Git and some care, you can commit your work in progress, pull the changes to another machine and then undo the last commit.

Migrating changes between machines is certainly common; I do it all the time. I don’t see why a version control system needs to handle this case, however. I handle it in Codeville the same way I did when using CVS or Subversion. If I’m just moving unfinished changes from repository A on machine 1 to repository A on machine 2, and there are no uncommitted changes on machine 2, a one-line rsync or scp to synchronize the directories beats anything a version control system could provide, no matter how easy the tool makes it.

And if I’m moving a subset of changes, if the target repository is different, or if machine 2 has other uncommitted changes, I just export the changes into unified diff format with cdv diff > file.diff, copy file.diff to the target machine, and apply the changes with patch -p0 < file.diff.
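For readers who have not looked inside a unified diff, here is a small Python sketch that produces one with the standard library's difflib module. It only illustrates the file format that a patch tool consumes; it is not what cdv diff does internally, and the file name greet.py and the make_unified_diff helper are made up for the example.

```python
import difflib

# Two versions of a hypothetical file, as lists of lines.
old = ["def greet():\n", "    print('hello')\n"]
new = ["def greet():\n", "    print('hello, world')\n"]


def make_unified_diff(old_lines, new_lines, path):
    """Return a unified diff between two versions of the file at `path`."""
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=path, tofile=path)
    )


patch_text = make_unified_diff(old, new, "greet.py")
print(patch_text)
```

The paths in the "---" and "+++" header lines are what the -p option of patch operates on: -p0 tells patch to use those paths exactly as written, without stripping any leading directory components.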

I don’t see how handling this synchronization inside the version control tool is any better, especially since the changes you are transporting are, by definition, unfinished. You wouldn’t want to distract your work-mates with commit messages about unfinished changes, and you certainly wouldn’t want them pulling down your unfinished changes to screw up their branch. Why go through the version control system at all in this case?

Postponing commits

Speirs also notes another important feature in Git, the ability to commit only some files and leave other changes for later:

how often have you done some work on a feature and cleaned up some headers as you went by?

Codeville also allows this. When you’re ready to commit, you’re presented with a list of changed files; remove any files you do not want to commit from this list and they will be skipped. I don’t know how easily other tools handle this case; I would not be able to tolerate any version control tool that didn’t allow this.

Branching and merging

Branching and merging in Codeville is stupid easy. We merge branches to get bug fixes daily, and the merge algorithm is good enough that conflicts are vanishingly rare, and false conflicts are practically nonexistent. And in three years, we’ve never seen a bad merge.

In fact, branching and merging is so easy and reliable that it presented a work-flow problem. When we were new to Codeville at Mosuki, we went a little branch-happy, and each programmer ended up with their own personal development branch. Each branch had a different new feature, and each had bug fixes and improvements that the other two branches needed. Of course, even this was easy to deal with; we just merged our personal branches into the main development branch and worked from there.

OSX bundles

Both Git and Codeville avoid CVS and Subversion’s mistake of requiring a CVS or .svn directory inside every directory. Codeville requires just a single .cdv directory at the root of each repository, which makes life simple for people who are working with OSX bundles. Both Speirs and Tsai point out this shortcoming of CVS and Subversion, a design decision that was shortsighted in CVS and just plain unacceptable in Subversion.

One place where Git fails for Speirs is in the merging of NIB files. I’m not sure why this is; NIB files (I believe) are actually OSX bundles, and they (at least the ones I looked at) contain an Objective-C file, an XML file, and a binary file. I don’t know why Git, or Subversion, would have problems with that; the Codeville repository for BitTorrent contained a number of NIB bundles for the OSX version, and we never had a problem with them.

Miscellaneous

Tsai notes that the Git download is compact (1.1 MB). Codeville requires Python to be installed, but other than that, it works out of the box on OSX, Windows, and *nix, and the download comes in at a flimsy 91 kB.

Conclusion

As far as I’m concerned, there’s nothing missing in Codeville. The developers will tell you the biggest missing feature is explicit binary support; but this just means that merging two different versions of binary files will use Codeville’s text-oriented merge and (probably) mangle the file. But you know something? In three years of working with Codeville on several different projects with many branches, nobody has ever changed the same binary file in two different branches and then tried to merge them. It’s a missing feature, but by no means a showstopper.

So, there is something about Codeville from a non-SCM nerd. I doubt there’s any non-SCM nerd out there with enough experience with many different version control and source-code management tools to give a really objective comparative analysis of all the options. It’s clear to me that both Codeville and Git are improvements on Subversion, as Subversion was clearly an improvement on CVS.