Introducing the Cerebras CS-1, the Industry’s Fastest Artificial Intelligence Computer

Source: Deep Learning on Medium


Back in August, Cerebras took the industry by storm by announcing the Wafer-Scale Engine (WSE). The WSE is the largest commercial chip ever manufactured and the industry’s first wafer-scale processor, built from the ground up for deep learning compute. It packs 1.2 trillion transistors and 400,000 AI-optimized cores onto a single chip, connected by a 100 Pbit/s interconnect. The cores are fed by 18 GB of super-fast on-chip memory with an unprecedented 9 PB/s of memory bandwidth.
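To put those headline numbers in per-core terms, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units and that memory and bandwidth are divided evenly across all 400,000 cores. The actual on-chip distribution isn't described here, so treat the results as averages only.

```python
# Back-of-the-envelope averages from the WSE figures quoted above.
# Assumptions: decimal units (1 GB = 1e9 bytes, 1 PB = 1e15 bytes) and an
# even split of memory and bandwidth across all cores. This is a
# simplification, not a description of the actual on-chip layout.

CORES = 400_000
ON_CHIP_MEMORY_BYTES = 18e9        # 18 GB of on-chip memory
MEMORY_BANDWIDTH_BPS = 9e15        # 9 PB/s, in bytes per second


def per_core(total: float, cores: int = CORES) -> float:
    """Return the even per-core share of a chip-wide total."""
    return total / cores


memory_per_core_kb = per_core(ON_CHIP_MEMORY_BYTES) / 1e3
bandwidth_per_core_gbs = per_core(MEMORY_BANDWIDTH_BPS) / 1e9

print(f"memory per core:    {memory_per_core_kb:.0f} KB")
print(f"bandwidth per core: {bandwidth_per_core_gbs:.1f} GB/s")
```

Under those assumptions, each core averages about 45 KB of local memory and 22.5 GB/s of memory bandwidth.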

The WSE is but one part of our AI compute solution. Today, I am proud to unveil the Cerebras CS-1, the world’s fastest deep learning computer system. Together with the WSE and the Cerebras software platform, the CS-1 forms a comprehensive, high-performance AI compute solution that is one of a kind in the industry.

When we conceived the Wafer-Scale Engine, we knew that a once-in-a-lifetime chip like the WSE deserved an equally compelling system and software alongside it, so that we could deliver breakthrough performance to our customers in an accessible, flexible way. We at Cerebras are systems thinkers; it’s why we call ourselves a systems company. This thinking pervades our ethos and shows in our designs. The CS-1 achieves best-in-industry performance through ingenuity and careful technical tradeoffs across software, chip and system hardware. Every part of the solution works in concert to deliver unprecedented AI performance and ease of use.

The CS-1 is an engineering marvel: it houses the WSE, the world’s only trillion-transistor processor. Powering, cooling and delivering data to the world’s largest and fastest processor chip is an exceptionally challenging undertaking. Not only does the CS-1 overcome these challenges, it does so while fitting within standard datacenter infrastructure and using industry-standard communication protocols. It is 26 inches (15 rack units) tall and fits in one-third of a standard datacenter rack. It can ingest 1.2 terabits per second of data across twelve 100 Gigabit Ethernet links, draws power through standard IEC C20 16A inlets, and is cooled with ambient air.
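The physical and I/O figures are easy to cross-check with a few lines of Python. This sketch assumes a 1.75-inch rack unit (the EIA-310 convention) and takes a 42U rack as the "standard" size. The announcement specifies neither, so the rack share is approximate.

```python
# Sanity-check the CS-1 physical and I/O figures quoted above.
# Assumptions: 1 rack unit (U) = 1.75 inches (EIA-310), and a
# "standard datacenter rack" is taken to be 42U. Neither value is
# stated in the original announcement.

RACK_UNIT_INCHES = 1.75
SYSTEM_HEIGHT_U = 15
RACK_HEIGHT_U = 42          # assumed full-height rack

ETHERNET_LINKS = 12
LINK_GBPS = 100             # 100 Gigabit Ethernet per link

height_inches = SYSTEM_HEIGHT_U * RACK_UNIT_INCHES
rack_fraction = SYSTEM_HEIGHT_U / RACK_HEIGHT_U
aggregate_tbps = ETHERNET_LINKS * LINK_GBPS / 1000

print(f"height: {height_inches} in")          # height: 26.25 in
print(f"rack share: {rack_fraction:.2f}")     # rack share: 0.36 (about a third)
print(f"ingest: {aggregate_tbps} Tb/s")       # ingest: 1.2 Tb/s
```

The 26.25-inch result matches the quoted "26 inches", and twelve 100 Gb/s links add up to exactly the 1.2 Tb/s ingest figure.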

To unleash the WSE’s performance for users, a powerful, flexible software platform and familiar user workflows are critical. The Cerebras software platform has been tightly co-designed with the WSE to take full advantage of its computational resources, while still allowing researchers to program in industry-standard machine learning frameworks like TensorFlow and PyTorch without modification. It also provides a rich toolset for introspection and debugging, along with lower-level kernel APIs for extending the platform.

All of this means exceptional deep learning performance, delivered in a truly plug-and-play configuration. Clusters of graphics processing units can take weeks or months to set up, require extensive modifications to existing models, occupy dozens of datacenter racks and depend on complicated, proprietary InfiniBand networking to cluster. The CS-1, by contrast, takes minutes to set up. Simply plug its 100 Gigabit Ethernet links into a switch and you are ready to start training models at wafer-scale speed. By overcoming these technical hurdles with innovations in both software and hardware, Cerebras has solved the 70-year-old problem of wafer-scale compute, a first in the history of chip design.

AI is the next big breakthrough for humanity. It has massive potential in healthcare, autonomous vehicles, commerce and climate science, and it’s going to transform the way we live and work. Although researchers continue to see gains from deeper models and larger datasets, they are limited by today’s graphics-based processing solutions, which are fundamentally designed for other work. Training commonly takes days, weeks, even months, constraining research and development. We need wall-clock training times measured in minutes to hours rather than days to weeks, even for the largest models researchers can imagine. That requires a 100–1000x increase in compute capability, not an incremental 1.5–2x. And we need this performance in an accessible, easy-to-program package. The Cerebras CS-1 is that solution.
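To make the 100–1000x target concrete, this small Python sketch scales a hypothetical week-long training run by different speedups. It assumes wall-clock time shrinks in direct proportion to compute, which real training jobs only approximate because of I/O, communication and other overheads. The one-week baseline and the `scaled_runtime` helper are illustrative, not figures or tools from Cerebras.

```python
# Illustrate what a 100-1000x compute increase would mean for wall-clock
# training time. The one-week baseline is hypothetical, and the model
# assumes run time scales inversely with compute (an idealization).
from datetime import timedelta


def scaled_runtime(baseline: timedelta, speedup: int) -> timedelta:
    """Wall-clock time if the same job ran `speedup` times faster."""
    return baseline / speedup


baseline = timedelta(weeks=1)   # hypothetical week-long training run
for speedup in (2, 100, 1000):
    print(f"{speedup:>5}x -> {scaled_runtime(baseline, speedup)}")
```

At 2x the week-long job still takes three and a half days. At 100x it finishes in about 1 hour 41 minutes, and at 1000x in roughly 10 minutes, which is the minutes-to-hours range the argument calls for.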

For more details, read the product overview. If you’re at Supercomputing this week, stop by the Cerebras booth (#689) and check out our demo. On Tuesday, November 19, hear our co-founder and Chief Systems Architect, Jean-Philippe Fricker, give a talk on the lessons we learned along the way. We’ll also be participating in several Birds of a Feather sessions.

A call to action for the industry: bring us your hardest deep learning problems and your system and deployment questions, and we’ll solve them together with the industry’s fastest AI compute solution. And if you want to join us on this journey to change compute forever, check out our current job openings.