In this new column, we identify several computing industry trends and then examine the implications of each for the CM industry. In future columns, we'll look at disk performance, networking, CPU speeds and multi-core systems, and more. Today, we start off with disk space trends and implications.
We all know that disks are growing in capacity and dropping in cost. Unless you've been off-planet, you've seen the terabyte drives for $100 advertised at the local store, or perhaps, like me, even purchased a few yourself. But what does such capacity mean to version control? Does it change the validity of long-held historical assumptions?
From the early days of SCCS and RCS, disk space has been a valuable and scarce resource. While the concept of version control seemed nice, the extra space it used could be problematic. Hence, a key feature of version control for over 30 years has been delta compression - storing only the differences between revisions to save space. CPU time and significant complexity were routinely traded off to save space, for both text files (as in SCCS and RCS) and binaries (RCS, xdelta and others). Claims that 200 versions of a typical source code file could be stored in double the space of a single version using delta compression were common. When access performance was important, a file revision caching layer was added so that performance could be recovered without giving up the disk space savings, further adding to the solution's complexity. In short, disk space savings were valued more highly than CPU time or product complexity.
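To get a feel for the trade-off, here is a rough back-of-the-envelope sketch in Python. It is purely illustrative - the file contents, edit pattern, and use of unified diffs as the delta format are all assumptions, not how SCCS, RCS, or SnapshotCM actually store revisions - but it shows why storing a base revision plus diffs uses far less space than storing every revision whole:

```python
import difflib

# Hypothetical example: simulate 200 small edits to a 100-line source
# file, then compare full-copy storage against base-plus-deltas storage.
base = ["line %d: some source code\n" % i for i in range(100)]

revisions = [base]
for n in range(1, 201):
    rev = list(revisions[-1])
    rev[n % 100] = "line %d: edited in revision %d\n" % (n % 100, n)
    revisions.append(rev)

# Storage cost if every revision is kept in full.
full_bytes = sum(len("".join(rev)) for rev in revisions)

# Storage cost if only the first revision is kept whole and each later
# revision is stored as a unified diff against its predecessor.
delta_bytes = len("".join(revisions[0]))
for prev, cur in zip(revisions, revisions[1:]):
    delta_bytes += len("".join(difflib.unified_diff(prev, cur)))

print("full copies:   %d bytes" % full_bytes)
print("base + deltas: %d bytes" % delta_bytes)
```

With small, localized edits like these, the delta store comes out an order of magnitude smaller than the full copies - the kind of savings behind the "200 versions in double the space" claims.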
But does this make sense any more? Is disk space really more valuable or scarce than CPU time? Are complex solutions justified, or might there be a simpler or better solution?
Delta compression takes significant CPU time, more so at check in and less so at check out. In our case, both of these occur on the SnapshotCM server. Additionally, check out performance is asymmetrical - some versions check out faster than others - because of the way deltas are stored. The effect of complexity on performance became apparent while performance testing the SnapshotCM proxy server a few years ago. Testing showed that check outs from the proxy server were significantly faster than from the repository server. Initially, we were surprised. However, investigation revealed that the revision cache of the proxy is much simpler than that in the repository server, and perhaps most importantly, does not contain any delta compression, only whole revisions. This simpler design and its lower overhead doubled the number of revisions that could be checked out per second.
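The checkout asymmetry is easy to see in a toy model. The sketch below assumes a forward-delta chain (the base revision stored whole, each later revision as a patch against its predecessor) - an assumption for illustration only, not SnapshotCM's actual storage format. Checking out revision N means replaying N patches, so the work grows linearly with how far the requested revision sits from the stored base:

```python
# Illustrative forward-delta chain (not SnapshotCM's real format).
def apply_patch(lines, patch):
    """Apply a trivial (index, new_line) patch to a revision."""
    lines = list(lines)
    for index, new_line in patch:
        lines[index] = new_line
    return lines

# Base revision stored whole; each later revision stored as one patch.
base = ["line %d\n" % i for i in range(100)]
patches = [[(n % 100, "line %d (rev %d)\n" % (n % 100, n))]
           for n in range(1, 201)]

def check_out(revision):
    """Rebuild a revision by replaying patches from the base.
    Returns the content and the number of patches applied - the
    work done grows linearly with the revision number."""
    lines, applied = base, 0
    for patch in patches[:revision]:
        lines = apply_patch(lines, patch)
        applied += 1
    return lines, applied

_, work_early = check_out(1)
_, work_late = check_out(200)
print(work_early, work_late)  # 1 vs 200 patch applications
```

A reverse-delta scheme flips the asymmetry (the tip is cheap, old revisions are slow), and a whole-revision cache like the proxy's sidesteps it entirely - every cached revision costs the same to serve.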
Does this mean SnapshotCM should stop doing delta compression? What are the alternatives, and what are their strengths and weaknesses? And what impact would such a change have on other parts of SnapshotCM? We'll continue exploring these questions next month. Stay tuned!