How to Achieve Fast Uploads to Cloud Storage


We see a lot of confusion among IT professionals about the pros and cons of cloud-based object storage like Amazon S3 and Microsoft Azure Blob. There also tends to be a general lack of awareness of the smaller players, such as Backblaze B2 and Wasabi, and their benefits. The greatest confusion, however, revolves around migrating data to these various cloud storage solutions, and how it can be done as fast as possible.

After reading this blog, you should get a good sense of whether cloud storage is right for you, and have a good idea of how you can move data into object storage. Additionally, you’ll see why FileCatalyst is one of the best options available for accelerating the upload/migration of data into cloud storage, regardless of the cloud storage vendor.

What this blog covers:

  1. A brief overview of the advantages of public cloud-based object storage
  2. The costs associated with cloud storage and transaction/egress fees
  3. How you can migrate data to cloud storage
  4. How FileCatalyst can accelerate cloud data migrations

Let’s get started!

 

The Problems with On-premise Storage

First, without going into a ton of details, here are some of the main disadvantages of keeping your data in-house/on-premise:

  • High capital cost – you need to buy the storage hardware, maintain it, update it
  • Physical space – you need racks and a place to put them
  • Power (and cooling) costs – this is not cheap when you start to scale
  • IT team to support it – if you have an IT team great, if not…
  • Obsolete quickly (due to data growth) – you can exhaust capacity quickly
  • Data loss (backup solutions cost even more) – what’s your backup strategy?

 

Advantages of Cloud Storage

  • Ease of access – convenient web interface, many tools to move data in/out
  • Lower cost of entry – no upfront hardware purchase
  • Scales automatically – capacity grows with your data, so you never need to add storage yourself
  • Operational expense – you pay only for what you use, which makes budgeting easier
  • Low risk of data loss – data is stored redundantly, so you don't need to worry about drive failures

Now that we have that out of the way, let’s see how much it actually costs to use public cloud storage.

 

How Much Does it Cost to Store Your Data in Cloud Storage?

Usually, there are three types of fees for cloud-based object storage:  

Storage fees – This is how much you pay to store your data per GB per month. Think of it as a physical storage locker you rent at a local storage facility; you pay monthly for the size of the locker you rent. In the case of cloud storage, you pay for the volume of data you store in gigabytes (GB). However, unlike a traditional physical storage locker, you only pay for what you use. And also unlike a traditional storage locker, there is no fixed capacity limit. You can add as much data as you need, and the storage scales automatically to accommodate it.

Egress fees – Imagine if a physical storage company charged you to take your own stuff out of your storage locker. That is what most cloud storage providers do, at least the larger ones. You store your data there, you pay every month to keep it there, and then you pay again to move it outside of their network. They don’t charge you to move it within their network, e.g., to access it with SaaS applications running in the same cloud. Moving data into cloud storage is generally free. Why? Because they want your data: you pay monthly to keep it there. They also want you to keep it there, so they usually charge you per GB to get it out.

Transaction fees – Most people tend to think in terms of the storage cost itself, and maybe the egress fees. But another hidden cost that many don’t consider is transaction fees. These fees apply to operations like PUT, GET, COPY, POST, or LIST requests. Some of these operations are made behind the scenes by the client software you use to access your data and present it in a nice user interface, and they can really add up.
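
To see how the three fee types combine, here is a minimal sketch of a monthly bill calculation. The `Pricing` class, the function name, and all of the prices and volumes are illustrative assumptions, not any vendor's actual rate card; real providers also apply pricing tiers and different rates per request type.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-unit prices in USD; substitute your provider's rates."""
    storage_per_gb_month: float
    egress_per_gb: float
    per_1000_requests: float


def monthly_cost(p: Pricing, stored_gb: float, egress_gb: float, requests: int) -> dict:
    """Break one month's bill into storage, egress, and transaction fees."""
    storage = stored_gb * p.storage_per_gb_month
    egress = egress_gb * p.egress_per_gb
    transactions = requests / 1000 * p.per_1000_requests
    return {
        "storage": storage,
        "egress": egress,
        "transactions": transactions,
        "total": storage + egress + transactions,
    }


# Assumed workload: 10 TB stored, 1 TB downloaded, 2 million API requests.
example = Pricing(storage_per_gb_month=0.023, egress_per_gb=0.09, per_1000_requests=0.005)
bill = monthly_cost(example, stored_gb=10_000, egress_gb=1_000, requests=2_000_000)
for name, amount in bill.items():
    print(f"{name:>12}: ${amount:,.2f}")
```

Changing `egress_gb` or `requests` shows quickly how a download-heavy or request-heavy workload shifts the bill away from the headline storage price.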

 

Summary of Costs

Below is a table outlining the high-level costs. I included the main players like S3, Azure Blob and Google Storage, as well as some of the smaller less expensive options like Wasabi and Backblaze.

 

Summary of costs for various cloud storage providers.

Provider | Egress (per GB) | General Storage (per GB per month) | Archive Storage (per GB per month) | API/Transaction fees | Full pricing details
Amazon S3 | $0.09 – $0.05 | Starts at $0.023 | $0.004 | Yes | https://aws.amazon.com/s3/pricing/
Azure Blob | $0.087 – $0.05 | Starts at $0.0184 | $0.002 | Yes | https://azure.microsoft.com/en-us/pricing/details/storage/blobs/ and https://azure.microsoft.com/en-us/pricing/details/bandwidth/
Google Cloud Storage | $0.12 – $0.08 | $0.026 | $0.007 | Yes | https://cloud.google.com/storage/pricing
Alibaba Object Storage Service | $0.076 – $0.06 | $0.0185 | $0.0036 | Yes | https://www.alibabacloud.com/product/oss#pricing
Backblaze B2 | $0.01 | $0.005 | Not supported | Yes | https://www.backblaze.com/b2/cloud-storage-pricing.html
Wasabi | $0.00 | $0.0049 | Not supported | No | https://wasabi.com/pricing/

* Current as of May 2018

As you can see, there is quite a range in price, and making the right choice really comes down to your specific use case. Do you need specific cloud or SaaS-based apps to interact directly with your data? Are you just looking for a place to archive your heaps of data? Are you looking for long-term cold storage? Do you need to move data out of your storage on a regular basis?

The three largest providers, particularly AWS, can take advantage of a phenomenon known as Data Gravity to justify higher fees. This phenomenon allows them to charge more for the privilege of having your data close to their compute platform and a vast marketplace ecosystem with extensive feature sets. Not surprisingly, Backblaze and Wasabi have the lowest costs, and it’s because they are “no-frills” storage providers. They have no marketplace, no compute platform and no cold storage. But what they do, they do very well.

Egress fees, in turn, allow the cloud provider to make their storage even more sticky. If your applications and data all live in the same place, and it costs a small fortune to get your data out, you are more likely to stay. The substantial cost of moving the data is meant to cancel out any business case for switching to another provider.

Amazon and Microsoft, being #1 and #2 in market share, understandably have higher egress fees than the storage-only providers, while the smaller providers charge either very little (Backblaze, at $0.01 per GB) or nothing (Wasabi) for egress.

So why would you pick the big guys instead of the less expensive smaller guys? Well, it really depends on your use case. As of 2018, AWS is the leader in cloud computing by a large margin. Most software vendors, if they don’t already, will have a SaaS solution hosted in AWS. It makes sense to put your data where it can be accessed very efficiently by your AWS-based SaaS applications, i.e., in Amazon S3. You also get the peace of mind that many companies want from dealing with the industry leader: you have no fear that they will disappear.

On the other hand, if your sole motivation is backup, and you have a tight budget (and who doesn’t these days), why not use a vendor where you don’t pay as much to store or retrieve your data? If you calculate the cost for the volume of data you need to store, you will see massive savings with Wasabi or Backblaze B2 compared with Amazon S3 or Azure Blob.
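
As a rough worked example of that calculation, the sketch below multiplies the general storage prices from the table above (current as of May 2018) by an assumed backup size of 100 TB. It covers storage fees only, ignores egress, transaction fees and pricing tiers, and the function name is just for illustration.

```python
# General storage prices in USD per GB per month, copied from the table above (May 2018).
STORAGE_PRICE = {
    "Amazon S3": 0.023,
    "Azure Blob": 0.0184,
    "Google Cloud Storage": 0.026,
    "Backblaze B2": 0.005,
    "Wasabi": 0.0049,
}


def yearly_storage_cost(provider: str, stored_gb: float) -> float:
    """Storage fees only; egress and transaction fees are ignored."""
    return STORAGE_PRICE[provider] * stored_gb * 12


stored_gb = 100_000  # assumed backup size: 100 TB
costs = {name: yearly_storage_cost(name, stored_gb) for name in STORAGE_PRICE}

# Print the providers from cheapest to most expensive.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name:<22} ${cost:>10,.2f} per year")
```

Under these assumptions, Amazon S3 comes to roughly $27,600 per year and Wasabi to roughly $5,880, before any egress or request charges.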

 

Moving Data into Cloud Storage

Amazon S3, Azure Blob, and Google Cloud Storage each have a proprietary REST API that can be leveraged by 3rd parties to build file transfer tools. They also have command line tools that work very well. In addition, many 3rd-party tools have implemented these REST APIs to make the data migration experience as comfortable as possible. Many of the other storage providers offer an S3-compatible interface, so applications that work with Amazon S3 will generally work with their storage as well.
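
Here is a minimal sketch of what "S3 compatible" means in practice, using the boto3 library (the AWS SDK for Python). The same client code is pointed at a different provider simply by changing the endpoint URL. The Wasabi endpoint shown and the `client_kwargs` and `upload` helper names are assumptions for illustration; check your provider's documentation for the correct endpoint and region.

```python
# Endpoint URLs are assumptions for illustration; confirm them with your provider.
S3_COMPATIBLE_ENDPOINTS = {
    "aws": None,  # None lets boto3 use the default Amazon S3 endpoint
    "wasabi": "https://s3.wasabisys.com",
}


def client_kwargs(provider: str) -> dict:
    """Build the keyword arguments for boto3.client() for the given provider."""
    kwargs = {"service_name": "s3"}
    endpoint = S3_COMPATIBLE_ENDPOINTS[provider]
    if endpoint is not None:
        kwargs["endpoint_url"] = endpoint
    return kwargs


def upload(provider: str, path: str, bucket: str, key: str) -> None:
    """Upload one local file; credentials come from the usual boto3 configuration."""
    import boto3  # third-party dependency: pip install boto3

    s3 = boto3.client(**client_kwargs(provider))
    s3.upload_file(path, bucket, key)
```

With credentials configured, a call such as `upload("wasabi", "backup.tar", "my-bucket", "backups/backup.tar")` (hypothetical file and bucket names) would send the file to the S3-compatible provider instead of Amazon S3.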

Examples of the types of available tools include web interfaces, command line tools, mapped drives, traditional 2-pane file transfer applications, and backup/sync solutions.  

Below is a list of some popular tools that can be used to connect to your cloud storage. I’m not going to evaluate each one (you are on a FileCatalyst site after all), but each has its own merits that suit different use cases.

 

Tools for transferring data to various cloud storage providers.

Provider | Tools
Amazon S3 | aws-cli, s3cmd, Cloudberry, FileCatalyst, TNT Drive, Cyberduck
Azure Blob | AzCopy, Cloudberry, FileCatalyst, Cyberduck
Google Cloud Storage | Cloudberry, Cyberduck, FileCatalyst
Alibaba Object Storage Service | CLI
Backblaze B2 | CLI, Cyberduck, FileCatalyst
Wasabi | Basically any Amazon S3 compatible transfer tool (see above) and of course FileCatalyst

 

Accelerating Transfers Into Object Storage

There are many tools out there that let you easily migrate data to cloud storage, or perform backup/restore/sync tasks. When you need to migrate a massive amount of data as fast as possible, however, you should consider trying FileCatalyst.

Although FileCatalyst costs more than these other applications, you may not want to spend days, weeks or even months moving data to your cloud storage when it could be done in a much shorter time, assuming you have high enough bandwidth.  
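
To put "days, weeks or even months" into numbers, here is a quick back-of-the-envelope calculation. The data size, link speed, and utilization figures are assumptions chosen for illustration, not measurements of any particular tool.

```python
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of the link the transfer actually uses.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


for efficiency in (1.0, 0.5, 0.1):
    days = transfer_days(size_tb=100, link_gbps=1, efficiency=efficiency)
    print(f"100 TB over 1 Gbps at {efficiency:.0%} utilization: {days:.1f} days")
```

Even at full line rate, 100 TB over a 1 Gbps link takes a little over nine days; a tool that only manages to fill a tenth of the link stretches that to about three months.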

So, what does FileCatalyst do differently from the other applications? The other applications talk directly to S3 using HTTP, which runs over TCP and can slow down considerably when the network has high latency. Because of this, the further you are from the geographic location where the object storage physically resides, the more your speeds may drop.
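
The effect of latency can be approximated with simple arithmetic. HTTP runs over TCP, and a single TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below assumes a 64 KiB window purely for illustration; modern TCP stacks use window scaling and larger buffers, and packet loss reduces throughput further, so treat this as a simplified model rather than a prediction.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP connection: one window of data per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


window = 64 * 1024  # assumed 64 KiB window
for rtt_ms in (1, 20, 100, 250):
    bound = max_tcp_throughput_mbps(window, rtt_ms)
    print(f"RTT {rtt_ms:>3} ms -> at most {bound:8.1f} Mbps")
```

With this window, a 100 ms round trip caps a single connection at roughly 5 Mbps no matter how fast the underlying link is.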

To an extent, you can overcome this because many of the tools mentioned above use parallel uploads and multipart uploads to send a lot of data concurrently. Even with this, however, if you are dealing with higher-speed networks (1 Gbps and above) or distant geographic locations, you still do not get optimal performance. In a comparison we ran against AzCopy (Microsoft’s Azure Blob upload tool), FileCatalyst was more efficient for larger files at higher bandwidth, while AzCopy was just as efficient at lower speeds, or when the transfer involved many smaller files.
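
To illustrate the multipart idea in general terms, here is a sketch that splits a file into numbered parts and hands them to a pool of worker threads. The part-size and part-count limits are the ones Amazon S3 documents for multipart uploads; `send_part` is a hypothetical stand-in that only reports what it would send, and none of this is taken from any specific tool's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_PART = 5 * 1024 * 1024  # S3 minimum part size (applies to all parts except the last)
MAX_PARTS = 10_000          # S3 maximum number of parts per multipart upload


def plan_parts(file_size: int, part_size: int = 8 * 1024 * 1024):
    """Return (part_number, offset, length) tuples covering the whole file."""
    part_size = max(part_size, MIN_PART)
    # Grow the part size if the file would otherwise need too many parts.
    while (file_size + part_size - 1) // part_size > MAX_PARTS:
        part_size *= 2
    return [
        (index + 1, offset, min(part_size, file_size - offset))
        for index, offset in enumerate(range(0, file_size, part_size))
    ]


def send_part(part):
    """Hypothetical stand-in for the real per-part upload request."""
    number, offset, length = part
    return number, length


def upload_in_parallel(file_size: int, workers: int = 8) -> int:
    """Send all parts through a pool of worker threads; return the bytes sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(length for _, length in pool.map(send_part, plan_parts(file_size)))
```

Each worker's request is still subject to the round-trip delay, but running several at once keeps more data in flight, which is why parallelism helps up to a point.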

FileCatalyst works by using a proprietary UDP-based protocol to maximize speeds over the most distant, highest-latency part of the network (between the source of the data and the target cloud storage region). A FileCatalyst Server instance runs in a compute platform close to the object storage. The following diagram shows how it is done with AWS/EC2/S3.

The FileCatalyst Server, in this case, is hosted inside EC2 in the same region as the S3 storage. FileCatalyst Server still uses the same AWS SDK that other S3 tools use, but the HTTP communication (the part that slows down over long distances) now takes place over a link with very low network latency. A FileCatalyst client is then used at the remote location, and the transfers over the higher-latency network are performed using UDP. Files move from the remote location to the FileCatalyst Server in EC2 and are streamed by that server directly to S3, never landing on storage in EC2. Nothing fancy is done to the files, and they can be accessed through all standard tools, including the S3 web interface.

 

Summing it all Up

Obviously, a lot of information on cloud storage was covered here, but some key takeaways are:

  1. You have lots of options for cloud storage, all with pros and cons. Which one you go with really depends on your budget and use case.
  2. There are lots of ways to move data into whatever cloud storage you select; again, it depends on budget and use case.
  3. If you need to move bulk data into object storage at high speed, there is no faster solution than FileCatalyst Direct.

You can try FileCatalyst for free to see how it performs, and if you choose to move forward, we license it in several ways. Visit our Trial page for more information.