GPU Inference on the Rise

Source: Deep Learning on Medium

NVIDIA AI

AI algorithms are drawing insights today from huge swaths of data across a wide variety of use cases, ranging from image search to real-time voice-driven services, sentiment analysis and recommendation engines. They have enabled researchers and companies to gain deeper insights in less time. This evolution has cut training times from days to minutes, and researchers continue to refine sophisticated techniques that solve knotty problems using multiple networks in combination.

Instant Object Identification with GPU Inferencing

As researchers continue to push the boundaries of what’s possible, networks and datasets continue to grow rapidly. At the same time, many new services have real-time requirements, needing answers delivered within a few milliseconds. These two trend lines are driving the demand for accelerated inference. And while training tends to focus on throughput and time to converge to a specified accuracy, inference (also called prediction or scoring) is where AI really goes to work, and it brings considerations beyond throughput: latency, efficiency, programmability and accuracy.
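The tension between throughput and latency can be made concrete with a toy benchmark. The sketch below uses a single matrix multiply as a stand-in for a trained network (an assumption for illustration only, not a real inference server) and shows how larger batches typically raise throughput while increasing the time any one request waits for its batch to complete:

```python
import time
import numpy as np

# Toy stand-in for a trained network: one dense layer.
# A real deployment would run a full model on an accelerator.
rng = np.random.default_rng(0)
weights = rng.standard_normal((1024, 1024)).astype(np.float32)

def infer(batch):
    # "Inference" here is a single matrix multiply over the batch.
    return batch @ weights

for batch_size in (1, 8, 64):
    batch = rng.standard_normal((batch_size, 1024)).astype(np.float32)
    infer(batch)  # warm-up run

    runs = 50
    start = time.perf_counter()
    for _ in range(runs):
        infer(batch)
    elapsed = time.perf_counter() - start

    latency_ms = elapsed / runs * 1000          # time per batch
    throughput = batch_size * runs / elapsed    # samples per second
    print(f"batch={batch_size:3d}  "
          f"latency={latency_ms:7.3f} ms/batch  "
          f"throughput={throughput:10.0f} samples/s")
```

Exact numbers depend entirely on hardware, but the shape of the trade-off is the point: a service with a strict millisecond budget may have to serve small batches, sacrificing some throughput, which is precisely where dedicated inference acceleration pays off.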

But how do organizations get started with deep learning? What does the right solution look like? Can IT managers easily deploy data center accelerators into their existing infrastructure? Organizations looking to spin up AI acceleration capabilities face a lot of questions. And quite often the answer is: “it depends.”

Work Backwards

Getting to the bigger answer is like eating a 5-foot sub sandwich. How do you eat something that big? One bite at a time. Start with a real business problem your organization is grappling with, then work backwards from what an effective, even breakthrough, solution looks like. These problems often overwhelm current compute capabilities, which can’t provide anywhere near the needed horsepower. Also, think about the full scope of the task, start to finish. For example, deep learning has two primary steps: training and inference. You need to account for both, since you can’t do inference without a trained network, and a trained network isn’t worth much if you can’t run inference on new data to gain insights.

Many data centers standardize on one or just a few server configs for ease of maintenance and day-to-day management of their server fleets. Accelerator-powered servers represent a new configuration to be integrated into the fleet, while also bringing additional power requirements. Mapping back to the total solution, your team needs to be clear on the likely mix of training and inference workloads, and plan accordingly. Teams who want to retrain networks often — some companies already retrain their networks multiple times per day — will need high-performance servers to quickly get that work done. However, if the majority of the AI work being done is inference with only occasional retrains, more mainstream solutions can be considered.

The above analysis assumes all the servers are on-premise, which for many businesses is a necessity due to sensitive data policies, regulatory concerns or security considerations. However, cloud computing is a very viable option to quickly scale compute capabilities and get teams working on proof-of-concept designs at very reasonable cost, with no additional infrastructure spend. For some organizations, an all-cloud deployment can be the right solution, sometimes as a bridging strategy while on-premise compute capabilities are upgraded over time. Additionally, a hybrid approach, with some acceleration on-premise followed by a “burst” to cloud compute on an as-needed basis, may be the right answer. Again, it depends.

Inference is where AI really goes to work. And for organizations making big bets on accelerated servers, having a platform that is at once performant, versatile, programmable and efficient can be a hard balancing act to achieve. AI is not only inherently complex, it’s also a moving target, given its rapid rate of innovation. As such, it’s critical that any AI accelerator be able to not only accelerate today’s tough data-driven problems, but also serve as a laboratory for developers to work on the next set of challenges.

If your organization is currently thinking about how to build acceleration into your data center and deliver AI-powered services, join us for our upcoming Inference Webinar, Putting Deep Learning to Work: Data Center Inference, on April 17, where we’ll cover AI trends, deployment challenges, customer case studies and how to develop a strategic approach to building out your next-generation data center.