I’ve been looking into projects that generate large amounts of data, or big data in other words, and one that really piqued my interest was the Square Kilometre Array (SKA) Project. This long-term project (currently in phase 5 of 7) is working to answer some of science’s hardest, and mankind’s oldest, questions: to better understand the origins of the universe and celestial systems, and to identify potential intelligent life in the universe. The project is building the world’s largest radio telescope, which, when completed, will be made up of over half a million antennas with a combined collecting area of roughly one square kilometre.
This is a very interesting project in its own right, but what really caught my attention is the unfathomable amount of data the project will generate when complete. Once the telescope is operational, it will be collecting data in the ballpark of 11 exabytes per day! That figure really knocked my socks off, to the point that I had to look up just what an exabyte is. Allow me to share that with you.
An exabyte is to a petabyte what a terabyte is to a gigabyte. So an exabyte is one thousand petabytes, or one million terabytes, or one billion gigabytes, or one quintillion bytes. To picture just how much data that is, 5 exabytes is often cited as the equivalent of all words ever spoken by human beings! Professor Peter Quinn of the International Centre for Radio Astronomy Research (ICRAR) stated that, "This telescope will generate the same amount of data in a day as the entire planet does in a year. We estimate that there will be more data flowing inside the telescope network than the entire internet in 2020." To think that a telescope will be collecting that much data per day is daunting, and I’m sure that the IT team has lost sleep thinking of how this amount of big data will be moved and stored.
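To put the 11 exabytes per day in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes decimal (SI) units, which matches the "one thousand petabytes" definition above, and it assumes the data arrives evenly over a 24-hour day; both are my simplifications, not figures from the SKA project.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18

SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_bytes_per_second(bytes_per_day: int) -> float:
    """Average rate needed to move one day's data within that same day."""
    return bytes_per_day / SECONDS_PER_DAY


# The ballpark daily volume quoted in the post.
daily_volume = 11 * EXABYTE
rate = sustained_rate_bytes_per_second(daily_volume)

print(
    f"1 EB = {EXABYTE // PETABYTE:,} PB"
    f" = {EXABYTE // TERABYTE:,} TB"
    f" = {EXABYTE // GIGABYTE:,} GB"
)
print(f"11 EB/day is about {rate / TERABYTE:.0f} TB every second")
```

Under those assumptions, keeping up with the telescope means sustaining roughly 127 terabytes every second, around the clock.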
Empathizing with the IT team, I started thinking about the potential challenges surrounding the movement of data, which are many. First and foremost, file transfer acceleration will be a major asset to this project, since the sheer volume and velocity of the data are so massive. If exabytes have to move via FTP, it may in fact “break the internet.”
There are a number of open-source, consumer and enterprise-level file transfer acceleration solutions out there, and they all have different features. The features I see as paramount to a project of this scope are compression, incremental transfers, automatic resume and automation.
With compression, an algorithm is applied to the data that stores repetitive bits of information as a “shorthand.” The receiving end can decode the “shorthand” and restore the data to its original state. This can help make these humongous transfers shorter, and when moving exabytes, any bit helps.
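Here is a minimal sketch of that round trip using Python's standard `zlib` module. The sample payload is made up and deliberately repetitive, so it shrinks dramatically; real telescope data is likely to be far noisier and compress much less, and this is not a claim about how any particular transfer product compresses data.

```python
import zlib

# Hypothetical sample: the same short record repeated many times,
# which is the best case for a "shorthand" style of encoding.
original = b"antenna-0001,signal=0.000;" * 10_000

# Sender side: compress before putting the data on the wire.
compressed = zlib.compress(original, 6)

# Receiver side: decode the shorthand back into the original bytes.
restored = zlib.decompress(compressed)

# Lossless compression must give back exactly what went in.
assert restored == original

ratio = len(original) / len(compressed)
print(f"original:   {len(original):,} bytes")
print(f"compressed: {len(compressed):,} bytes")
print(f"ratio:      {ratio:.1f}x smaller")
```

The trade-off is CPU time on both ends, so compression pays off most when the network, not the processor, is the bottleneck.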
In some scenarios, similar files may exist on both sides of the transfer, but changes may have been made at the source location. Incremental transfer features allow you to send only the changes, and sending a 4 MB delta instead of a multi-exabyte dataset makes an incredible difference in transfer time.
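One simple way to picture an incremental transfer is to split the file into fixed-size blocks, compare a hash of each block against what the other side already holds, and send only the blocks that differ. The sketch below does exactly that (Python 3.9+). The function names and the tiny block size are hypothetical, chosen for illustration; production tools typically use more sophisticated schemes (rolling checksums, for example) that also cope with data being inserted mid-file, which this fixed-block version does not.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose; real tools use far larger blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash every fixed-size block so two sides can compare cheaply."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def compute_delta(source: bytes, remote_hashes: list[str],
                  block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks the remote lacks."""
    delta = {}
    for index, start in enumerate(range(0, len(source), block_size)):
        block = source[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(remote_hashes) or remote_hashes[index] != digest:
            delta[index] = block
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new file from the old copy plus the changed blocks."""
    blocks = [old[i:i + block_size] for i in range(0, len(old), block_size)]
    for index, block in delta.items():
        if index < len(blocks):
            blocks[index] = block   # overwrite a changed block
        else:
            blocks.append(block)    # new data past the old end
    return b"".join(blocks)[:new_length]


# Destination already holds `old`; the source has since changed to `new`.
old = b"AAAABBBBCCCCDDDD"
new = b"AAAABxBBCCCCDDDDEE"

delta = compute_delta(new, block_hashes(old))
rebuilt = apply_delta(old, delta, len(new))

sent = sum(len(block) for block in delta.values())
print(f"sent {sent} of {len(new)} bytes; rebuilt correctly: {rebuilt == new}")
```

In this toy case only two of the five blocks cross the wire. The same idea is what turns a multi-exabyte resend into a delta of a few megabytes.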
When transferring big data across the globe, connections are unfortunately bound to drop. When moving exabytes, restarting a failed transfer is quite the inconvenience, to say the least. Some solutions, including FileCatalyst, can automatically perform an MD5 checksum and resume the transfer without any data loss. Clearly, this isn’t a feature you want to use often, but when it comes to the rescue you will be grateful for it.
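To make the idea concrete, here is a minimal sketch of resume-plus-verify using local files and Python's standard library. It is my own illustration, not FileCatalyst's implementation: the file names, chunk size and `resume_copy` helper are all hypothetical, and a local file copy stands in for the network. The copy picks up at the byte offset where the partial file ends, then compares MD5 checksums of both files to confirm nothing was lost or corrupted along the way.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1024  # hypothetical chunk size for reads and writes


def md5_of_file(path: str) -> str:
    """Compute a file's MD5 checksum without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resume_copy(source_path: str, dest_path: str) -> bool:
    """Continue a partial copy from where it stopped, then verify it."""
    already_sent = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    with open(source_path, "rb") as source, open(dest_path, "ab") as dest:
        source.seek(already_sent)  # skip what the destination already has
        for chunk in iter(lambda: source.read(CHUNK_SIZE), b""):
            dest.write(chunk)
    # Matching checksums mean the destination is a faithful copy.
    return md5_of_file(source_path) == md5_of_file(dest_path)


with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "observation.dat")
    dest_path = os.path.join(workdir, "observation.partial")

    payload = os.urandom(10_000)
    with open(source_path, "wb") as handle:
        handle.write(payload)

    # Simulate a dropped connection: only the first 3,500 bytes arrived.
    with open(dest_path, "wb") as handle:
        handle.write(payload[:3_500])

    verified = resume_copy(source_path, dest_path)
    print("transfer resumed and verified:", verified)
```

MD5 is used here because it is the checksum mentioned above; it is fine for catching accidental corruption, though it is not considered safe against deliberate tampering.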
Automation makes any task easier, and the same can be said for this project. The ability to schedule transfers that synchronize data across endpoints at set intervals can be a major time saver. Imagine you’re a scientist, and you forgot to start your little 11-exabyte transfer of collected data before arriving at the office. Talk about “hurry up and wait.”
The SKA project is certainly reaching for the stars, and it will truly change the face of science by helping answer some of mankind’s hardest questions. But to make this incredible amount of data mobile, accessible, and more valuable, file transfer acceleration will need to be at the core.
You may not be moving multiple exabytes or discovering the origins of the universe, but do you have a project that requires you to move large amounts of data? If so, take a look at the FileCatalyst Solution Suite and see how we can help your project reach the stars.