Original article was published by Davide Camera on Artificial Intelligence on Medium
NVIDIA Announced Maxine: Improve Video Conferencing with GANs & AI
A cloud-AI video-streaming platform to boost bandwidth performance
Thanks to the COVID-era “new normal” we are all experiencing, remote-communication tools such as Meet, Zoom, and Teams have become daily allies in the field of productivity.
These are undoubtedly advanced technologies, significantly improved in recent years to meet the needs of those who were more or less suddenly forced into remote work. Still, there is much room for improvement: a gap that NVIDIA intends to fill with the Maxine platform.
According to NVIDIA, Maxine is a fully accelerated platform for developers to build and deploy AI-powered features in video conferencing services using state-of-the-art models that run in the cloud.
Applications based on Maxine can reduce video bandwidth usage down to one-tenth of H.264 using AI video compression, dramatically reducing costs.
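As a rough illustration of what a tenfold reduction means in practice, here is a back-of-the-envelope sketch in Python. The H.264 bitrate used is a typical 720p video-call figure assumed for illustration, not an official NVIDIA number.

```python
# Illustrative arithmetic for the "one-tenth of H.264" bandwidth claim.
# The H.264 bitrate below is an assumed typical 720p video-call figure,
# not an official NVIDIA measurement.

H264_KBPS = 1500                # assumed 720p H.264 call bitrate (kilobits/s)
AI_COMPRESSION_FACTOR = 10      # Maxine's claimed reduction vs. H.264

def ai_stream_kbps(h264_kbps=H264_KBPS, factor=AI_COMPRESSION_FACTOR):
    """Bitrate of the AI-compressed stream under the tenfold claim."""
    return h264_kbps / factor

def monthly_gigabytes(kbps, hours_per_day=2, days=30):
    """Data transferred per month for a given bitrate and usage pattern."""
    seconds = hours_per_day * 3600 * days
    return kbps * 1000 / 8 * seconds / 1e9   # kilobits/s -> bytes -> GB

print(ai_stream_kbps())                       # 150.0 kbps
print(monthly_gigabytes(H264_KBPS))           # ~40.5 GB/month at H.264 rates
print(monthly_gigabytes(ai_stream_kbps()))    # ~4 GB/month with AI compression
```

Under these assumed figures, two hours of daily calls drop from roughly 40 GB to roughly 4 GB of traffic per month, which is where the cost savings for providers come from.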
“Video conferencing is now a part of everyday life, helping millions of people work, learn and play, and even see the doctor,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “NVIDIA Maxine integrates our most advanced video, audio and conversational AI capabilities to bring breakthrough efficiency and new capabilities to the platforms that are keeping us all connected.”
Maxine includes the latest innovations from NVIDIA research, such as:
- face alignment
- gaze correction
- face re-lighting
- real-time translation
- noise removal
- closed captioning
- virtual assistants
For example, face alignment enables faces to be automatically adjusted so that people appear to be facing each other during a call, while gaze correction helps simulate eye contact, even if the camera isn’t aligned with the user’s screen.
With video conferencing growing by 10x since the beginning of the year, these features help people stay engaged in the conversation rather than looking at their camera.
These capabilities are fully accelerated on NVIDIA GPUs to run in real-time video streaming applications in the cloud.
As Maxine-based applications run in the cloud, the same features can be offered to every user on any device, including computers, tablets, and phones.
And because NVIDIA Maxine is cloud native, applications can easily be deployed as microservices that scale to hundreds of thousands of streams in a Kubernetes environment.
The announcement was made at GTC (GPU Technology Conference), the NVIDIA digital event (October 5–9, 2020) for developers, engineers, researchers, and innovators looking to enhance their skills and gain a deeper understanding of how AI will transform their work.
How Maxine Works: NVIDIA wants to improve remote communication
Applied to video conferencing, a mix of cloud computing and neural networks (GANs, or Generative Adversarial Networks) can intervene in several aspects of video transmission. Among other things, it can improve image compression, meeting the needs of those without a high-performance connection; ensure that participants always have the feeling of looking each other in the eye; enable real-time transcription and translation of what is said; cancel noise automatically; improve resolution; replace the background with any image; or even replace the user's face with a three-dimensional avatar.
The mechanism behind Artificial Intelligence video calls is simple.
A sender first transmits a reference image of the caller, then, rather than sending a fat stream of pixel-packed images, it sends data on the locations of a few key points around the user’s eyes, nose and mouth.
A generative adversarial network on the receiver’s side uses the initial image and the facial key points to reconstruct subsequent images on a local GPU. As a result, much less data is sent over the network.
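To see why this saves so much data, compare the size of one raw video frame with a small set of facial key points. The sketch below uses assumed numbers (a 720p YUV frame and 68 landmarks, a common count in face-tracking work); it is not NVIDIA's actual wire format.

```python
# Back-of-the-envelope comparison: full frame vs. facial key points.
# All constants are illustrative assumptions, not Maxine's real protocol.

FRAME_W, FRAME_H = 1280, 720   # 720p frame
BYTES_PER_PIXEL = 1.5          # raw YUV 4:2:0 sampling
NUM_KEYPOINTS = 68             # a common facial-landmark count (assumption)
BYTES_PER_KEYPOINT = 4         # two 16-bit (x, y) coordinates

def raw_frame_bytes():
    """Uncompressed size of one video frame."""
    return int(FRAME_W * FRAME_H * BYTES_PER_PIXEL)

def keypoint_payload_bytes():
    """Per-frame payload when only key-point positions are sent."""
    return NUM_KEYPOINTS * BYTES_PER_KEYPOINT

ratio = raw_frame_bytes() / keypoint_payload_bytes()
print(f"{raw_frame_bytes()} bytes vs {keypoint_payload_bytes()} bytes "
      f"(~{ratio:.0f}x smaller per frame, before any codec)")
```

In practice the sender still transmits the one-time reference image, and video codecs already compress raw frames heavily, so the real-world gain is closer to the tenfold figure NVIDIA quotes; the point is simply that key points are orders of magnitude lighter than pixels.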
The approach is part of an industry trend of shifting network bottlenecks into computational tasks that can be more easily tackled with local or cloud resources.
“These days lots of companies want to turn bandwidth problems into compute problems because it’s often hard to add more bandwidth and easier to add more compute,” said Andrew Page, a director of advanced products in NVIDIA’s media group.
Some of these features are already implemented in existing tools (e.g., transcription and background replacement); what NVIDIA wants to do is bring them all together on a cloud platform, offered to developers and software houses, that takes full advantage of GPU power.
Developers can apply for early access to start working on integrating the platform with their services and apps.