Acceleration pt 2 – Minimizing Data Sets

posted in: FileCatalyst

Accelerating File Transfer: Quicklinks

Overview | Part 1: Compression | Part 2: Minimizing Data Sets | Part 3: Multiple Streams | Part 4: Alternatives (like FileCatalyst!) to FTP/TCP

In Part 1, we began discussing the various ways that acceleration solutions can shorten transfer time. The first of these was the use of compression.

The main point of the article was to show how smaller data sets will (naturally!) mean less transfer time. Compression is a fairly basic way to make smaller data sets, and it is not always 100% effective, particularly when the data being sent is not compressible. But there are other ways a transfer solution can cut down on the amount of data being sent.

It boils down to: “Don’t send redundant data”:

  • Don’t send redundant files: Seems obvious enough, and plenty of solutions out there (especially in the backup solution sphere) have this in place. Picture a typical copy+paste operation in your Operating System. If a duplicate file exists, you are asked, “A file with this name exists. Overwrite?” And then you can peek at the time and date, plus file size, allowing you to decide. By automating this process, the file transfer solution will just skip past the files that it doesn’t need to resend.
  • Don’t send redundant bytes: This one’s a little trickier! Imagine you have a multi-GB file of some sort (say, a monolithic database or an ISO). You’ve made some small changes to it and it’s time to send it off. You sigh in despair, knowing that even though only 4MB has changed, you’ll have to transfer that behemoth all over again. Wouldn’t it be nice if only those changes could be sent? FileCatalyst offers a file Deltas function, which does just that. As the transfer initializes, the source and destination files are compared… if there’s a difference, FileCatalyst will identify the blocks that have changed and build a “patch” (the delta file). This patch gets applied to the destination file, making it identical to the source file.
    The benefit is obvious: a 4MB delta (for example) takes a lot less time to transfer than the whole 5GB file!

As with most features, set realistic expectations before deciding that Deltas will enhance the efficiency of any given task. For example, adding too many source locations going to the same destination creates the potential for conflict. One source to many destinations works brilliantly, however! Also, if a task transfers primarily small files, you’re often better off just sending the files. And, like compression, be aware that building deltas will use a portion of your machine’s CPU. If possible, compare the performance of a given task with and without deltas… it’ll soon be clear which one to choose for a particular job.

Deltas and compression are not mutually exclusive! Assuming a source file that’s compressible to begin with, the two will work quite well together, allowing you to send the smallest possible amount of data.
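To make the delta idea concrete, here is a toy sketch of a block-level delta: hash fixed-size blocks on both sides, ship only the blocks whose hashes differ, and patch the destination copy. This is an illustration of the principle, not FileCatalyst's actual algorithm (production tools use cleverer schemes, e.g. rolling checksums, so that an insertion near the start of a file doesn't invalidate every block after it). The block size and structures below are arbitrary choices for the example.

```python
import hashlib

BLOCK = 4096  # fixed block size for the comparison (arbitrary for this sketch)

def _blocks(data):
    """Split a byte string into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def build_delta(old, new):
    """Return the new length plus (index, block) pairs for changed blocks."""
    old_hashes = [hashlib.sha256(b).digest() for b in _blocks(old)]
    changes = []
    for i, b in enumerate(_blocks(new)):
        if i >= len(old_hashes) or hashlib.sha256(b).digest() != old_hashes[i]:
            changes.append((i, b))  # block is new or differs: include it
    return {"length": len(new), "changes": changes}

def apply_delta(old, delta):
    """Rebuild the new file from the old copy plus the delta."""
    old_blocks = _blocks(old)
    changed = dict(delta["changes"])
    n = -(-delta["length"] // BLOCK)  # ceiling division: new block count
    parts = [changed.get(i, old_blocks[i] if i < len(old_blocks) else b"")
             for i in range(n)]
    return b"".join(parts)[:delta["length"]]
```

Note that the `changes` payload is ordinary data like any other, so it can itself be compressed before transfer; that is exactly the "deltas plus compression" combination described above.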

Accelerating transfers can broadly be broken down into two categories: minimizing data, and optimizing the link. We’ve now covered the first category by discussing compression and Deltas. Our next articles will address optimizing the link itself, via Multiple Streams, as well as ditching TCP (FTP) in favour of something better suited to acceleration.