Original article can be found here (source): Artificial Intelligence on Medium
Video makes it tricky
The main reason is simple: a video is an evolving object in which each image is linked to the others. It is one thing to understand an image. It is another to make assertions about the causal links between millions of images in the same piece of content. The gap between the two is tremendous, and it explains why video understanding is no easy feat.
Video understanding can only exist if two conditions are met: automation and causality.
Causality can be summed up by a simple example: in a video you’re watching on your favorite social platform, two people are kissing. If you take single images of the scene, you’ll think that there’s nothing harmful. They’re kissing, duh. But now if you analyse the whole scene, you’ll notice that before the kiss, the man is rough with the woman, pushing her a bit and blocking her way, and that after the kiss, the emotions detected on the woman’s face show fear, disgust and sadness much more than love and happiness.
Instead of a “simple” kiss, you might be dealing with sexual harassment, which leads you to conclude that there is nothing romantic about the scene and that it should even be reported. Your perception of the scene has changed completely. Of course, at first you did not have the context. Neither did your algorithm.
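To make the difference concrete, here is a minimal sketch of the kissing example in Python. It is an illustration under stated assumptions, not how any real moderation system works: the label names, the `Frame` type, the 10-second window and both functions are hypothetical, and the per-frame labels are assumed to come from some upstream image model.

```python
from dataclasses import dataclass

# Hypothetical label sets; a real system would define its own taxonomy.
HARMFUL_ALONE = {"weapon", "nudity"}         # flagged from a single image
AGGRESSION = {"pushing", "blocking"}         # only meaningful in context
DISTRESS = {"fear", "disgust", "sadness"}    # emotions seen after the kiss


@dataclass
class Frame:
    t: float     # timestamp in seconds
    label: str   # assumed output of a per-image model


def flag_per_frame(frames):
    """Judge every frame in isolation, as image-level moderation would."""
    return any(f.label in HARMFUL_ALONE for f in frames)


def flag_with_context(frames, window=10.0):
    """Flag a kiss preceded by aggression and followed by distress,
    each within `window` seconds of the kiss."""
    for kiss in (f for f in frames if f.label == "kiss"):
        before = any(
            f.label in AGGRESSION and 0 < kiss.t - f.t <= window for f in frames
        )
        after = any(
            f.label in DISTRESS and 0 < f.t - kiss.t <= window for f in frames
        )
        if before and after:
            return True
    return False


# The scene described above, reduced to a handful of labelled frames.
scene = [
    Frame(1.0, "blocking"),
    Frame(3.0, "pushing"),
    Frame(5.0, "kiss"),
    Frame(7.0, "fear"),
    Frame(8.0, "sadness"),
]

print(flag_per_frame(scene))     # False: no single image looks harmful
print(flag_with_context(scene))  # True: the sequence around the kiss does
```

The hand-written rule is only there to show where the information lives: in the order of events, not in any one image. A real system would have to learn such patterns rather than hard-code them.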
That simple example shows why fully automated video moderation is a much bigger challenge for those that have so far invested little or nothing in such solutions. Glorification of terrorism might well be hidden in a “comedy” video. A sex scene might occur in the middle of an apparently safe program. And weapons could turn up in the middle of a tutorial.
Add to this the sheer volume of video content to process, and you have the reason why fully trusting their moderation AI is a leap of faith for the tech giants.