Original article was published by Qblocks on Artificial Intelligence on Medium
Fugaku demonstrated more than 2.8 times the performance of the previous list leader, Summit (ORNL), which was benchmarked at 148.6 petaflops and now sits in second place.
Jack Dongarra, one of the pioneers of the supercomputing domain, explained it further:
But what about the cost of building such a fast supercomputer?
The cost to build Fugaku was about $1 billion (source), on par with what is projected for the U.S. exascale machines.
Yes, you read that correctly: it took a billion dollars to build the fastest supercomputer in the world.
There was significant R&D and data center upgrade work involved in building this supercomputer. Had they used off-the-shelf CPUs, it would have cost roughly three times more.
To recover the cost of building such a supercomputer, users are charged exorbitant amounts for access.
This has been the trend since the beginning of the computing era: the fastest computers have always cost a fortune.
Thus, they have always been limited to the select group of researchers, companies, and universities that can spend this kind of money on computing.
Because of this, a lot of research stalls at the individual and small-team level. These researchers are restricted to limited computing infrastructure, which slows innovation at every level.
There needs to be a change in this domain to support research at this scale too.
It is not just supercomputing: even renting a powerful GPU instance in the cloud for your next machine learning or deep learning project takes a backseat when a huge bill arrives in your inbox and puts a dent in your wallet.
For example, a Tesla V100 GPU instance on AWS costs $3.06/hr, so training your NLP model continuously for two weeks would cost you over $1,000. And it doesn’t stop there: you have to retrain models many times to perform hyperparameter optimization, improve accuracy, or simply do transfer learning to adapt the model to an entirely new dataset.
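The arithmetic behind that two-week figure can be sketched in a few lines. This is a minimal cost estimate, assuming the $3.06/hr on-demand rate quoted above stays constant for the whole run; real cloud pricing varies by region and instance type.

```python
# Hypothetical cost estimate for a continuous cloud GPU training run.
# The hourly rate below is the on-demand V100 figure cited in the article,
# used purely for illustration.
HOURLY_RATE_USD = 3.06
HOURS_PER_DAY = 24

def training_cost(days: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Return the total on-demand cost of running one GPU instance
    continuously for the given number of days."""
    return days * HOURS_PER_DAY * hourly_rate

# Two weeks of continuous training on a single V100:
print(f"${training_cost(14):,.2f}")  # → $1,028.16
```

Every retraining pass for hyperparameter search or transfer learning repeats that bill, which is why the costs compound so quickly.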
Running a GPU instance adds up costs incrementally, and soon enough it becomes quite expensive.
At Q Blocks, we realized that there is a better way: a method used by some of the biggest scientific organizations in the world, such as CERN.
This is called distributed computing. In simple terms, it means workloads are distributed and computed across multiple computers all over the world.
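The core idea can be illustrated locally. The sketch below, assuming nothing beyond the Python standard library, splits a workload into chunks that separate worker processes compute in parallel; in a real distributed system those workers would be machines spread across the world rather than processes on one box.

```python
# Minimal local illustration of the distributed-computing idea:
# split one big job into chunks, compute them in parallel, combine results.
from multiprocessing import Pool

def heavy_task(n: int) -> int:
    # Stand-in for an expensive computation (e.g., one simulation step).
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    chunks = [100_000, 200_000, 300_000, 400_000]
    with Pool(processes=4) as pool:
        # Each chunk is handed to a separate worker process.
        results = pool.map(heavy_task, chunks)
    print(sum(results))  # combined result of all workers
```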
Projects like SETI@home, LHC@home, and Folding@home have leveraged this technology to power trillions upon trillions of calculations.
Doing this at scale has its own challenges, so we started with the simplest form: a peer-to-peer computing platform where you can rent a powerful machine from a remote compute provider instead of using a traditional cloud like AWS.
This leads to up to 10x cost savings. Our platform is live now with a powerful set of GPUs for AI workloads such as deep learning model training in the computer vision and natural language processing space.
If you work in AI research, you understand the pain of computing costs when training larger models effectively.
So, check out the Q Blocks peer-to-peer computing platform to get multi-GPU instances at up to 1/10th the cost for your AI workloads.
We hope this article gave you a good sense of how supercomputing works, and how supercomputing, when democratized, can benefit the entire industry and the human race.
Gaurav Vij (Co-founder & CTO, Q Blocks)