What to Know Before Moving Your Data to the Cloud

“Big data” and “cloud storage” are some of 2013’s biggest tech buzzwords, and with good reason. Data is growing exponentially, and we need a place to store it. Cloud-based infrastructure-as-a-service (IaaS) providers like Amazon Web Services and Microsoft Azure provide scalable, cost-effective ways to store and access your data. However, you must first get your data into the cloud before you can leverage these benefits. This introduces a new problem: how can you efficiently transfer huge volumes of data into the cloud?

Amazon’s solution to the problem is for you to ship your storage device to them, and they will copy the data for you. However, for a variety of reasons you may wish to move your data yourself, especially if you have the available bandwidth.

Ideally, you could move your data to the cloud using a software solution over your existing internet connection, leveraging the bandwidth you are already paying for. If you have attempted this and found your tools transferring the data slowly, you need to determine where the problem lies.
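One rough way to do that is to time a test upload and compare the effective rate against the bandwidth you are paying for. The sketch below assumes the boto3 SDK, AWS credentials already configured, and a hypothetical test bucket and 100 MB test file; a large gap between the measured rate and your link speed suggests the limitation is latency and protocol behaviour rather than raw capacity.

```python
import time
import boto3

# Hypothetical test parameters: a bucket you control and a local
# test file of known size (e.g. 100 MB).
BUCKET = "my-test-bucket"
TEST_FILE = "test-100mb.bin"
SIZE_MB = 100

s3 = boto3.client("s3")

start = time.time()
s3.upload_file(TEST_FILE, BUCKET, "probe/test-100mb.bin")
elapsed = time.time() - start

# Compare this figure against the bandwidth you are paying for.
print(f"Uploaded {SIZE_MB} MB in {elapsed:.1f}s "
      f"({SIZE_MB * 8 / elapsed:.1f} Mbit/s effective)")
```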

There are a couple of bottlenecks to overcome here: the network latency between your location and the cloud provider’s physical location, and the bandwidth between your cloud-hosted VM and your cloud storage. If you transfer files directly to Amazon S3, for example, you are limited by the multipart upload facility that Amazon provides as part of its API; this applies equally to third-party tools built on that API. The problem with this approach is that it is still hindered by network conditions, since the transfer runs over TCP.
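To make the mechanics concrete, here is a minimal sketch of that multipart upload facility using the boto3 SDK. The bucket name, object key, and part size are placeholder assumptions; the point is that every part still travels as an ordinary HTTP request over TCP, so latency and packet loss on the path to S3 affect each one.

```python
import boto3

BUCKET = "my-example-bucket"    # hypothetical bucket name
KEY = "backups/archive.tar"     # hypothetical object key
PART_SIZE = 64 * 1024 * 1024    # 64 MB parts

s3 = boto3.client("s3")

# Start the multipart upload and remember the upload ID.
mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
upload_id = mpu["UploadId"]

parts = []
with open("archive.tar", "rb") as f:
    part_number = 1
    while True:
        chunk = f.read(PART_SIZE)
        if not chunk:
            break
        # Each part is a separate request over TCP, so throughput is
        # still bound by the latency and loss of the underlying link.
        resp = s3.upload_part(
            Bucket=BUCKET, Key=KEY, UploadId=upload_id,
            PartNumber=part_number, Body=chunk,
        )
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

# Tell S3 to assemble the uploaded parts into the final object.
s3.complete_multipart_upload(
    Bucket=BUCKET, Key=KEY, UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)
```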

The bottleneck of getting files directly to and from cloud storage like S3 cannot be overcome until cloud vendors provide direct integration with acceleration software like FileCatalyst. Let’s focus on the other bottleneck instead: getting files to a location with far less latency (EC2, for example), and then copying them to cloud storage (such as S3). In this scenario, the TCP-based mechanisms for moving data to S3 are nowhere near as inefficient as they are when transferring over the internet, because the VM and the storage sit close together on the provider’s network.

Using EC2 as an example, you could install FileCatalyst Direct server on your Windows or Linux VM. You would then create user accounts that map directly to your S3 storage (this also works with Amazon Glacier), giving you an accelerated means of moving files into the Amazon cloud. However, to do this you need some additional tools to mount your storage.

In order to mount your S3 storage (or Glacier), there are various options, with tools available for both Windows and Linux.

This site also has a lot of great insight into mounting S3 on Linux: http://www.turnkeylinux.org/blog/exploring-s3-based-filesystems-s3fs-and-s3backer
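As an illustrative sketch only, the mount step on Linux might look something like the following. It assumes s3fs-fuse is already installed on the EC2 instance, and the bucket name, mount point, and credentials-file path are hypothetical; the credentials file holds a single "ACCESS_KEY_ID:SECRET_ACCESS_KEY" line with restrictive permissions.

```python
import subprocess
from pathlib import Path

# Hypothetical values: bucket name, mount point, and s3fs password file.
BUCKET = "my-example-bucket"
MOUNT_POINT = Path("/mnt/s3")
PASSWD_FILE = Path.home() / ".passwd-s3fs"

# Create the mount point if it does not already exist.
MOUNT_POINT.mkdir(parents=True, exist_ok=True)

# Invoke s3fs-fuse to expose the bucket as a local directory.
subprocess.run(
    ["s3fs", BUCKET, str(MOUNT_POINT), "-o", f"passwd_file={PASSWD_FILE}"],
    check=True,
)

print(f"{BUCKET} mounted at {MOUNT_POINT}")
```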

With the S3 storage mounted directly from your VM running in the same cloud, you have minimized the network latency between them. This means the tools mentioned above should provide sufficient performance to keep up with high-speed file transfers. Combined with an accelerated file transfer solution like FileCatalyst, this setup provides the best way to optimize your bandwidth and efficiently move your data to the cloud.
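For instance, once the accelerated transfer has landed files on the EC2 instance, the final hop into S3 can be as simple as moving them onto the mounted path. The landing directory and mount path below are hypothetical examples, not fixed locations:

```python
import shutil
from pathlib import Path

# Hypothetical paths: where the accelerated transfer drops files on the
# EC2 instance, and the s3fs mount point from the previous step.
LANDING_DIR = Path("/data/incoming")
S3_MOUNT = Path("/mnt/s3/incoming")

S3_MOUNT.mkdir(parents=True, exist_ok=True)

# Move each landed file onto the mounted bucket; because the VM and S3
# are in the same cloud, this hop sees minimal network latency.
for item in LANDING_DIR.iterdir():
    if item.is_file():
        shutil.move(str(item), str(S3_MOUNT / item.name))
        print(f"moved {item.name} -> {S3_MOUNT}")
```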