Source: Deep Learning on Medium
World's First 'Intelligent' Vision Created
And no, Intelligent Vision doesn't work at the level of pixels. And it doesn't use matrix operations or gradient descent.
For decades we tried to build vision systems in the image of man, and we took the Neural Networks route. We made some unprecedented breakthroughs in what we could do. Yet we haven't achieved 'Intelligent' Vision. What we have is something very mechanical, devoid of logic and common sense, and very unreliable: it fails randomly and severely.
It is true that there is no single definition of Intelligent Vision. But we know that Intelligent Vision is…
- Translation Invariant
- Rotation Invariant (In 3D Space)
- Scale Invariant
- Deformation Invariant
- Color Invariant
- Texture Invariant
- Luminosity Invariant
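These invariances can be read as properties any recognizer should satisfy: its answer must not change when the object moves, grows, or shrinks. A minimal sketch of such a property check in Python, using a toy stand-in recognizer (the `recognize` rule and the images are invented for illustration, not part of any real system):

```python
import numpy as np

# Toy stand-in recognizer: labels an image by the fraction of "on" pixels.
# This is purely illustrative; a real vision system would be far richer.
def recognize(image: np.ndarray) -> str:
    density = image.mean()
    return "ball" if density > 0.05 else "background"

def translate(image: np.ndarray, dx: int, dy: int) -> np.ndarray:
    # Shift the image contents along both axes (with wrap-around).
    return np.roll(np.roll(image, dy, axis=0), dx, axis=1)

# A toy 'ball': a filled blob on an empty canvas.
img = np.zeros((32, 32))
img[4:12, 4:12] = 1.0

# Translation invariance: the label must not change when the object moves.
assert recognize(img) == recognize(translate(img, 10, 10))

# Scale invariance: the label must not change when the object is larger.
big = np.zeros((32, 32))
big[4:20, 4:20] = 1.0
assert recognize(img) == recognize(big)
```

The point is not this particular recognizer but the form of the test: each invariance in the list above becomes an assertion that the output label is unchanged under the corresponding transformation.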
We have tried to combine all sorts of tricks with our Neural Networks: structure, capsules, multi-resolutions, color palettes, all sorts of filters. But all these can only go so far.
We have stuck to our Neural Networks, and our efforts to improve them by mixing in various techniques have been in vain.
There is a reason the Eye is a part of the Brain.
Contrary to what everyone believes, the brain uses intelligence for vision. It uses logic. That's how it sees things.
First of all, the brain collects pixels together into lumps/shapes/regions. This is exactly like segmentation.
And then it starts ‘logically’ tying them up together.
It is round, red, soft… must be the kid's football. It is long, like threads, in something that looks like a plate, next to a fork. Must be noodles.
It looks purple in color, soft like cotton. Must be the kid's handkerchief.
That's how the brain sees and recognizes things.
This is also how we hear things: recognize voices, understand speech, etc.
This is the absolute opposite of our current approaches. Only the first layer deals with pixels, and once it segments them, nothing beyond it in the brain sees pixels or anything based on pixels. In neural networks, by contrast, everything works with pixels and higher-dimensional pixel features.
So what have we achieved?
Something very basic. Yet completely different from The Vision systems we have today.
Our Front End is a Black Box (Eye) which lumps pixels together.
The Second Stage is basically where a lot of hypotheses are generated based on our existing knowledge. Round, grey, soft, hard, light, heavy… Is it a table? A spoon? Is it…?
The Last Stage (the Brain) is the solver, which logically ties everything together by solving the puzzle.
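The three stages can be sketched as a toy pipeline. Everything below is our illustrative reading of the description above, not Automatski's actual system: a segmenter lumps observations into regions, a hypothesis stage proposes candidate labels from a small knowledge base of attributes, and a solver keeps the candidate whose required attributes all fit.

```python
# Toy three-stage pipeline: segment -> hypothesize -> solve.
# The knowledge base and attributes are invented for illustration.
KNOWLEDGE = {
    "football": {"round", "red", "soft"},
    "handkerchief": {"purple", "soft"},
    "spoon": {"grey", "hard"},
}

def segment(scene):
    # Stage 1 (the Eye): lump raw input into regions. Here the 'scene'
    # is already a list of regions, each described by a set of attributes.
    return scene

def hypothesize(region):
    # Stage 2: propose every label whose known attributes overlap
    # with what was observed in the region.
    return [label for label, attrs in KNOWLEDGE.items() if attrs & region]

def solve(region, candidates):
    # Stage 3 (the Brain): keep only the hypotheses whose required
    # attributes are ALL present, i.e. the ones that logically fit.
    fits = [c for c in candidates if KNOWLEDGE[c] <= region]
    return fits[0] if fits else "unknown"

scene = [{"round", "red", "soft", "small"}, {"purple", "soft", "square"}]
labels = [solve(r, hypothesize(r)) for r in segment(scene)]
print(labels)  # -> ['football', 'handkerchief']
```

The design point this illustrates: after segmentation, nothing downstream touches pixels; the solver reasons only over symbolic attributes, which is the contrast with pixel-based neural networks drawn above.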
And no, it doesn't recognize 1bn citizens with 95% accuracy yet. And it cannot be used to drive an autonomous car yet. But what we have created is Intelligent Vision.
This is our Website http://automatski.com