New SnapshotCM Release is Dramatically Faster
The newest release of SnapshotCM is dramatically faster and
more scalable than earlier releases. In short, check out is 10x faster and
check in is 3x faster, while each check out uses just one eighth of the server
CPU. Read on for details, and be sure to read the summary, which contains
recommendations to consider as you plan your server upgrade.
The performance improvements come from two areas: changing
the way file revisions are stored, and changing a network connection setting.
As discussed in a newsletter article two years ago, the tradeoff between disk space usage and
access performance is changing. When disk space was relatively expensive and in
short supply, it made sense to trade CPU time for space. But disk space has
become incredibly cheap and this tradeoff no longer makes sense. This is
especially true for non-text files which are not efficiently stored by the
typical delta versioning schemes of the past.
Because of delta compression issues with large files, the
SnapshotCM repository server has long stored large files in a gzip'd file per
revision format, and this is the only format the proxy server has ever used.
Compressing individual revisions results in significant space savings compared
to storing files whole. In addition, file expansion is offloaded to the
client, reducing both the server and network load on every check out.
And since decompression requires little CPU or memory, this offloading has no
negative client-side effects.
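As a rough illustration of the gzip'd file per revision idea, the following Python sketch stores each revision as its own compressed file, hands the compressed bytes to the client untouched, and expands them on the client. It is not SnapshotCM's implementation: the function names and the `name.rN.gz` file naming are assumptions made up for this example.

```python
import gzip
from pathlib import Path


def store_revision(repo_dir: Path, name: str, revision: int, content: bytes) -> Path:
    """Server side: write one revision as its own gzip'd file."""
    # Hypothetical naming scheme, chosen only for this illustration.
    path = repo_dir / f"{name}.r{revision}.gz"
    path.write_bytes(gzip.compress(content))
    return path


def read_revision_compressed(path: Path) -> bytes:
    """Server side: return the stored bytes as-is, with no decompression."""
    return path.read_bytes()


def expand_on_client(compressed: bytes) -> bytes:
    """Client side: expand the revision after it arrives over the network."""
    return gzip.decompress(compressed)
```

The point of the design is that the server only reads a file and writes it to the network; the one step that costs CPU, expansion, happens on the client.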
When Faster is Slower
Based on the above analysis and previous testing, we decided
to transition to exclusive use of the gzip'd file per revision format. After
testing the change on our own data, we were shocked to discover that a check
out of 3400 files had actually slowed, going from 270 to 420 seconds! Totally
unexpected, this result caused lots of confusion until we understood another
problem we had been experiencing for some time.
The key was noticing a 200 ms delay in the check out
protocol. We'd assumed this was server processing (we have a 13-year-old
server), but server measurements showed that the data was written without
delay. It simply wasn't arriving at the client until 200 ms later. We quickly
confirmed this was occurring with both repository and proxy servers and on
every OS where we tested it, including customer systems.
Eventually we discovered the Nagle algorithm and TCP's
delayed ACK and learned how they can interact with certain patterns of network
writes to introduce a 200 ms delay in network communication. This also gave us an
explanation for why check outs got slower with the "faster" storage method, and
more importantly, what to do about it.
In short, network writes of file data smaller than an Ethernet packet
(about 1500 bytes) were being delayed 200 ms by TCP/IP in order to aggregate
the write with later writes or data ACKs, if possible. Since the gzip'd files
were typically smaller than the actual files, more revisions than before fell
into the small category and incurred the 200 ms delay on check out.
Once this became clear, we backed up to the previous
release, disabled the Nagle algorithm and redid our performance testing. Check
out times halved from 270s to 135s. Not bad for a relatively isolated change,
but we wanted more. So we repeated the testing with gzip'd revisions and a
disabled Nagle algorithm, and check out time fell by more than a further
factor of five. Further investigation showed that the server CPU load is about
8x higher when processing RCS files during check out than when simply reading
the gzip'd revision files. Overall, check out time dropped from 270 to just
24 seconds, an improvement of over 10x.
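On most platforms the Nagle algorithm is disabled per connection with the standard `TCP_NODELAY` socket option. The Python sketch below shows the general technique; it is not taken from the SnapshotCM server, and the helper names are ours.

```python
import socket


def disable_nagle(conn: socket.socket) -> None:
    """Ask TCP to send small writes immediately instead of holding them back."""
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(conn: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

A server would typically call `disable_nagle` on each accepted connection before writing to it. With Nagle off, every small write can go out as its own packet, so it remains worthwhile for the application to batch related data into a single write where it can.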
Storage Space Implications
Because of the dramatic performance improvements, this was
clearly a change we wanted to make. However, we were concerned about increased
disk space usage. Certainly, converting a 200-revision delta-compressed text
file into 200 separate gzip'd revision files would increase the space used.
But we discovered that many stored files have just one revision, and for
every one of them the gzip'd format was smaller than the single-revision
delta file. It turns out that for the majority of delta files with just
two or three revisions, the resulting gzip'd revisions were also smaller in
aggregate than the delta file they replaced.
We also didn't want to spend effort compressing rarely accessed files,
as that would not improve the performance users typically experience, so we decided
to automatically convert revisions to the gzip'd revision format on check out.
The first check out pays this conversion cost, while all successive check outs
reap the benefit.
What We Implemented
In the final product, we made two key changes:
- We disabled the Nagle algorithm on server writes to the
clients. This eliminated the 200 ms delay for small file check outs.
- We changed the storage system to store all revisions in
gzip'd revision format since it improves both check in and check out
throughput, while reducing server CPU usage. In detail:
  - All new revisions are stored in gzip'd revision format.
  - All delta compressed revisions are automatically
    converted to gzip'd revision format on check out in order to improve
    performance for later check outs of actively used revisions.
  - All existing 1, 2 and 3 revision delta files are
    auto-converted to the gzip'd revision format. This both saves space and
    eliminates the first-access file conversion delay. This automatic conversion
    takes place in a background server thread, and may take several days to
    complete, depending on the number of files to convert.
  - For now, we are keeping delta files with more than 3
    revisions in delta format.
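The convert-on-check-out behavior described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's code: `check_out` and the `extract_from_delta` callback are hypothetical names, and the callback stands in for whatever reconstructs a revision from the older delta file.

```python
import gzip
from pathlib import Path
from typing import Callable


def check_out(gz_path: Path, extract_from_delta: Callable[[], bytes]) -> bytes:
    """Return the gzip'd bytes for a revision, converting it on first access."""
    if gz_path.exists():
        # Fast path: the revision was already converted, so just read it.
        return gz_path.read_bytes()
    # Slow path: rebuild the revision from the delta file (hypothetical
    # callback), then save the gzip'd copy for later check outs.
    content = extract_from_delta()
    compressed = gzip.compress(content)
    gz_path.write_bytes(compressed)
    return compressed
```

Only the first check out of a revision runs the slow path; later check outs find the gzip'd file and skip the delta processing entirely.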
Summary
In light of these changes, we expect check out throughput to
increase dramatically, server load to decrease, and overall disk space to stay
about the same. However, disk space usage is data dependent, so we
recommend that you make sure you have some free space on your revision
storage disk (perhaps 20% free) before installing this release. We also
recommend that you monitor the free space especially closely for the first
few weeks after upgrading your repository server to make sure you don't run
out of space.
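One simple way to keep an eye on this is a small script run on the server. The Python sketch below uses the standard `shutil.disk_usage` call; the 20% threshold mirrors the suggestion above, and the path in the usage note is a placeholder, not a real SnapshotCM location.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction of the disk holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_headroom(path: str, minimum: float = 0.20) -> bool:
    """True if at least `minimum` (20% by default) of the disk is free."""
    return free_fraction(path) >= minimum
```

For example, `has_headroom("/path/to/revision/storage")` could be run before the upgrade and then periodically during the first few weeks afterwards.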
For a complete list of user-visible changes, see the
change list, or contact us.