Source: Deep Learning on Medium
Why AI assistants should become AI avatars
Welcome to Issue ✌️ of Embodied AI, your bi-weekly source of insights into the latest news, technology, and trends behind AI avatars, virtual beings, and digital humans.
Have you ever wondered why Alexa, Google Assistant, or Siri don’t come with a pair of eyes? This week we invite you to imagine a world where your virtual assistants have morphed into AI avatars with eyes 👀 and to see how they could serve you better.
🎤 “Oh (why) can’t you see?”
One reason AI assistants don’t see is that building a voice AI product is itself a daunting and resource-intensive task. Amazon currently has over 10,000 employees working on Alexa and Echo devices, and Facebook has flat out failed to build its own assistant, shipping its Portal with Alexa built in.
Another reason has to do with long-standing concerns over data privacy and surveillance. After all, it would be creepy to have a camera monitoring every breath you take and every move you make, which isn’t as romantic as the song makes it seem.
These two factors lower the “bar” for how people imagine the possibilities of virtual assistants, which might be why many of us are comfortable simply engaging in command-and-query interactions with Alexa. But as technologists and innovators, aren’t we supposed to be thinking a little bigger? How about building AI capable of two-way, interactive communication?
👀 Do you want eyes with that?
The creepiness associated with a watchful AI assistant is often a design issue, addressable through measures such as comprehensive legal regulation (like GDPR), more efficient edge computing (what happens at the edge stays at the edge), and anonymized, secure AI training mechanisms (such as OpenMined). The benefits of a seeing AI assistant, meanwhile, are crucial for humanlike interaction and an enriched user experience.
Eyes are the window to a digital being’s soul
- Another person’s gazing eyes immediately hold our attention and make us more conscious of their mind and perspective.
- We tend to perceive people who make more eye contact to be more intelligent, conscientious, and sincere, at least in Western cultures.
- We rate strangers with whom we’ve made eye contact as more similar to us in terms of personality.
Jarrett concludes that eye contact is perhaps the closest we will come to “touching souls”. If we strive to create natural, trust-building interactions between AI and humans, then embodying AI with eyes is essential.
So yes, we want fries *ahem* eyes with that!
Skill discovery is easier when AI assistants can see
Besides soul-touching, AI assistants that can see can help people discover the assistants’ own special skills. In a recent blog post on Alexa, a16z’s Benedict Evans notes that survey data shows people mostly use virtual assistants for audio activities like music, podcasts, weather forecasts, and kitchen timers, plus trivial questions and smart light control. But shouldn’t virtual assistants be able to do more than that?
They should (and could), but Evans also points out a paradox: the seemingly flexible and free-form audio-only interface is highly limited in functionality. Would you listen to Alexa list all 70,000 of her skills so that you can get the most out of your virtual assistant?
Now, imagine an interface or even an operating system that comes with computer vision. Why should an assistant recite its skills via audio when it can proactively offer its services by seeing and intuitively understanding what the person needs (paired with a screen)? AI assistants with advanced action understanding capabilities can actively interact with humans while making their lives more seamless and productive.
🤖 AI assistants envisioned as AI avatars
“In the mirrorworld, virtual bots will become embodied. Agents like Siri and Alexa will take on 3D forms that can see and be seen. They will be able not just to hear our voices but also see our gestures and pick up on our microexpressions.”
They will, essentially, become AI avatars. Powered by voice AI and computer vision, avatars will become the primary agents humans engage with when interacting with all interfaces. Our devices will turn on with a simple gaze. By understanding our actions, such as repeatedly scratching our skin, they will know to increase the humidity in the room. By understanding our mood changes, they will know which of our favorite songs to play on Spotify.
With eyes, our virtual assistants no longer serve as our virtual slaves but instead transform into intelligent avatars that engage with us in a “soulful” manner. They can be emotionally connected to us through gazing and effectively serve us through seeing, while opening a new chapter of human-machine interaction.