Original article was published on Artificial Intelligence on Medium
What might be required to build an Artificial General Intelligence
What sort of knowledge and breakthroughs do we need in order to reach it? How intelligent is an AGI?
Don’t worry, this post is not about how AGI is going to take over the world or how humanity will merge with AGI. It is about what I think is needed to achieve it. This post is a thought experiment I had after reading some books and watching some fascinating videos on AI. This is my take on AGI; it definitely won’t be entirely correct, but hopefully it will spark some thoughts in my readers.
Imagine we are going to build an AGI. What sort of knowledge and breakthroughs do we need in order to reach it? How intelligent is an AGI? Why is this important, you might ask? Simply because a highly intelligent being far superior to a human will alter humanity greatly. Be it good or bad, our lives and society will change and never be the same again. So how do we measure an AGI? Can we place this intelligent being on a comparison scale? Maybe 100 times smarter than Albert Einstein or Stephen Hawking?
How smart or intelligent is an AGI?
One of the most mind-blowing videos I have watched is Neil deGrasse Tyson talking about what an intelligent species might be like. I recommend watching his video shown below.
Our closest relatives are the chimpanzees. We share ~99% of our DNA with chimps. Anyone would consider a human much smarter than a chimp, as we can build rockets to Mars and smash atoms. Chimpanzees, on the other hand, cannot do any of that. Yet our DNA differs by only ~1%, so whatever distinguishes us from chimps lies in that 1% difference. Now imagine an intelligent species that is another 1% beyond us. We humans would not even be able to comprehend it. The simplest work of this highly intelligent species (its equivalent of multiplication or a kids’ poem) might be beyond the understanding of all of humanity, even given a thousand years.
Imagine teaching chimps that 15 * 0 = 0, or to understand Twinkle Twinkle Little Star. Chimpanzees have been shown to understand and perform simple single-digit addition. But they have a hard time understanding the concept of 0, and multiplying numbers into the hundreds is just too much for their brains to process. What about a kids’ poem? It may look very easy to us, but it is an NLP problem. They would need to understand sentence structure and the meaning of words, and words relate to things or behaviours in our world. Chimps can use hand signs and recognize a set of symbols, but using symbols to represent flashing lights in the sky, and pondering the stars while gazing at the night sky, is probably still a long stretch of evolution away. Their brains are just not evolved enough to grasp these pieces of information. Likewise, if an AGI were 1% beyond us intellectually, it would probably take us many centuries of evolution to understand it.
A basic introduction to the neocortex
The neocortex accounts for about 76% of the volume of the human brain. It continues to develop from infancy until we are around 26 years old. It is the outermost layer of the brain, about 2-4 mm thick; it contains 6 layers and, stretched out, is about the size of a large table napkin. Surprisingly, the neocortex is found only in mammals and not in other species. The neocortex holds our most advanced functions; we could also say it is what makes us intelligent and what makes you, you. It provides us with sensory perception, spatial reasoning, social-emotional processing, memory, learning capability and more.
If we take slices of neocortex from random parts of the brain and put them under a microscope, the cellular structure looks nearly identical no matter where we cut. From this finding, we infer that everything we do (vision, hearing, language and so on) uses the same cellular structure. This tells us that all the complex tasks humans can perform can be processed and solved using one common cellular structure.
The neocortex is divided into several regions, connected by bundles of nerve fibres that run through the white matter. This creates logical links and, at the same time, forms a hierarchical structure that allows sensory input to be processed at different levels. As sensory data comes in from the eyes and ears, this hierarchy generates increasingly general, abstract representations as the data moves up towards the top of the hierarchy.
Duplicating human intelligence
You might ask: why not duplicate the brain exactly on silicon chips to get an intelligent machine? I would argue that this will not work, and I believe we do not need to duplicate every function of the human brain to create AGI. What we need is the underlying mechanism, the neural dynamics of the brain, in the same way that aeroplanes rely on aerodynamics. We do not need flapping wings and feathers to build a flying machine. What we need are thrust and lift to get a plane into the air.
How learning should be for machines
1. Temporal and lifelong learning
Our learning journey starts very early in our lives. Humans can already hear during the last trimester, the final 3 months of pregnancy. Babies can hear in their mother’s womb and respond to their mother’s voice and to conversations outside the womb. Tests have shown that by the third trimester a baby can already recognize its mother’s voice and respond to it with an increased heart rate, suggesting that the baby is more alert when the mother speaks.
Our brain constantly receives input from our senses: sight, hearing, touch, smell, motor feedback from all our limbs, and so on. These are all continuous streams of data; our brain processes them, makes predictions and learns from trial and error, just like a baby learning to crawl and walk.
With the examples above, I am pretty convinced that our brain learns in a temporal manner, from a continuous stream of actions. It is definitely not learning in batches, nor in two separate phases of training and prediction. We are constantly training and predicting at the same time, repeatedly strengthening the activation of our synapses until we can do something subconsciously, without actually thinking about it. (An example from language: a bilingual speaker often has the word in their native language pop up first, and then thinks of its translation.)
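To make "training and predicting at the same time" concrete, here is a minimal sketch of online learning in Python. It is only an illustration under simple assumptions: a single weight `w`, a made-up stream where the true relation is y = 3x, and a hypothetical function name `online_learn`. It is not a model of how the brain actually learns.

```python
# Minimal sketch of online (streaming) learning: predict first, then learn
# from the error, one observation at a time. All names are illustrative.

def online_learn(stream, lr=0.1):
    """Fit y ~ w * x from a stream, interleaving prediction and training."""
    w = 0.0
    errors = []
    for x, y in stream:
        prediction = w * x      # predict with what we know so far
        error = y - prediction  # compare against what actually happened
        w += lr * error * x     # immediately update from the mistake
        errors.append(abs(error))
    return w, errors


# A hypothetical stream where the true relation is y = 3x.
stream = [(x, 3.0 * x) for x in [1.0, 2.0, 1.5, 0.5, 2.0, 1.0] * 20]
w, errors = online_learn(stream)
print(f"learned w = {w:.4f}, first error = {errors[0]:.2f}, last error = {errors[-1]:.2e}")
```

There is no separate training phase here: every observation is both a test of the current prediction and a training example.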
What about current deep learning methods? We do have algorithms that learn over time, such as RNNs and reinforcement learning (RL) methods like DeepMind’s DQN; they learn through a somewhat similar temporal representation. But they still need hundreds of hours, or even years, of training experience to perform as well as a human. DeepMind’s Deep Q-Network (DQN) took about 900 hours of game time to learn the Atari game Frostbite, and AlphaStar needed up to 200 years’ worth of gameplay to defeat elite human StarCraft II players. If only these AIs could learn like a human, without needing to drive a car off a cliff thousands of times before knowing that it is a bad idea. So for a machine to integrate with our environment, we need an RL-style (temporal) learning algorithm that can also learn in a few-shot manner, so that it can react to ever-changing events in the real world.
To make a model converge quickly on the problem above, we may need to look at lifelong learning. Lifelong learning is the idea of a machine that learns continuously, accumulates the knowledge it learned in the past, and uses it to help future learning and problem-solving. I believe this is an important milestone that we need to cross in order to achieve AGI. Once lifelong learning is achieved, we could potentially reduce the time required to train a model on a new task. The model would also be flexible enough to learn new things as and when they appear in real life, without the need to retrain the entire model to fit each new situation.
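As a toy illustration of why accumulated knowledge can shorten training, the sketch below compares learning a new task from scratch with starting from what was learned on a related task. Everything here is an assumption made for illustration: the one-weight model, the two slopes (2.0 and 2.2) and the helper `steps_to_fit` are hypothetical, and real lifelong learning is far more involved than reusing a weight.

```python
def steps_to_fit(target_slope, w_start, lr=0.1, tol=1e-3, max_steps=10_000):
    """Gradient descent on a one-weight model until w is within tol of the target."""
    w = w_start
    for step in range(max_steps):
        if abs(target_slope - w) < tol:
            return step, w
        w += lr * (target_slope - w)  # move a fraction of the way to the target
    return max_steps, w


# Task A: learn slope 2.0 from scratch.
steps_a, w_a = steps_to_fit(2.0, w_start=0.0)

# Task B is related (slope 2.2). Compare starting over with reusing w_a.
steps_b_scratch, _ = steps_to_fit(2.2, w_start=0.0)
steps_b_reuse, _ = steps_to_fit(2.2, w_start=w_a)

print(f"Task B from scratch: {steps_b_scratch} steps")
print(f"Task B reusing Task A knowledge: {steps_b_reuse} steps")
```

The saving only appears because the two tasks are related; reused knowledge from an unrelated task could just as easily slow learning down.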
2. Having memory
For a human to retain long-term memory, we need to constantly repeat three processes: attention, encoding and retrieval. Attention refers to maintaining focus on a task in order to plan and regulate our thoughts and actions. Once we have the task in our working memory (short-term memory), we can encode it into long-term memory. Finally, retrieval simply means recalling information that we previously took the time to encode. The more often we retrieve a piece of information, the more strongly we remember it; in other words, it takes less effort to recall.
There are times when we memorize an event immediately, without being aware of it and without any deliberate effort. This kind of episodic memory usually forms in association with unique events, like our first kiss, a romantic moment with someone special, or even a visit to the dentist to have a wisdom tooth extracted. This type of memory does not require repetition to be encoded. Do you ever feel that time passes differently at such moments?
In current machine learning, LSTMs use gates to control what is memorised, and all of this information is stored as weights. Such models work really well, especially for language: ELMo and ULMFiT are built on LSTMs, while BERT uses the Transformer architecture. But storing memory as weights is very different from how our own memory works. A new memory does not corrupt other memories stored in our brain; at most we forget them, or they become blurry when recalled. We do not corrupt a memory into something totally different. For example, we do not start seeing a bus as an ostrich just because we learned a different task. For machine learning, we need a way to store a newly learned task without affecting the other weights. It should form a new “synaptic connection” and leave the others untouched.
This problem is called catastrophic forgetting: it happens when a deep learning system forgets what it previously learned after being trained on a new task. For example, if you train a model on Task A and then use the same weights to learn a new Task B, the model forgets what it learned about Task A.
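Catastrophic forgetting can be reproduced even in a one-weight model. The sketch below is a deliberately tiny illustration, assuming two made-up tasks that conflict with each other (y = 2x and y = -2x); the helper names `mse` and `train` are hypothetical. Real networks forget for the same underlying reason: the weights that encoded Task A are overwritten by training on Task B.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)


def train(w, data, lr=0.05, epochs=50):
    """Plain stochastic gradient descent on a single shared weight."""
    for _ in range(epochs):
        for x, y in data:
            w += lr * (y - w * x) * x
    return w


xs = [0.5, 1.0, 1.5, 2.0]
task_a = [(x, 2.0 * x) for x in xs]   # Task A: y = 2x
task_b = [(x, -2.0 * x) for x in xs]  # Task B: y = -2x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)        # Task A is learned well

w = train(w, task_b)                  # the same weight is reused for Task B
loss_a_after = mse(w, task_a)         # Task A performance collapses

print(f"Task A loss before Task B: {loss_a_before:.6f}")
print(f"Task A loss after Task B:  {loss_a_after:.2f}")
```

Nothing in the training loop protects the old knowledge: the single weight simply moves to wherever the most recent task pulls it.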
Why did we forget?
Richard Morris gave a TED talk that touches on the topic of forgetting. Forgetting is a necessary mechanism of our brain: it allows the brain to hold information for a period of time while it works out what is worth keeping and what is not. Unimportant information is then offloaded from the brain, which prevents us from overloading our memory with unnecessary information. In terms of machine learning, we would probably need an architecture designed with both working memory and long-term memory in mind. Important information should be stored in long-term memory, while training should take place in working memory.
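One way to picture such an architecture is the sketch below: a small working memory that forgets its oldest items, plus a long-term store that keeps only what was rehearsed. This is a hypothetical toy design, not an established method; the class name `TwoStoreMemory`, the capacity of 3 and the "seen twice" consolidation rule are all assumptions chosen for illustration.

```python
class TwoStoreMemory:
    """Toy memory with a small working store and a permanent long-term store."""

    def __init__(self, working_capacity=3, consolidate_after=2):
        self.working = {}        # item -> times seen while in working memory
        self.long_term = set()
        self.working_capacity = working_capacity
        self.consolidate_after = consolidate_after

    def observe(self, item):
        # Re-inserting the item marks it as the most recent entry.
        count = self.working.pop(item, 0) + 1
        self.working[item] = count
        if count >= self.consolidate_after:
            self.long_term.add(item)           # rehearsed items are kept
        if len(self.working) > self.working_capacity:
            oldest = next(iter(self.working))  # dicts preserve insertion order
            del self.working[oldest]           # unrehearsed items are forgotten

    def recall(self, item):
        return item in self.working or item in self.long_term


memory = TwoStoreMemory()
events = ["keys", "anniversary", "ad jingle", "anniversary",
          "weather", "news", "traffic"]
for event in events:
    memory.observe(event)

print(memory.recall("anniversary"))  # rehearsed, so it was consolidated
print(memory.recall("keys"))         # seen once, then pushed out
```

Here "anniversary" survives because it was repeated while still in working memory, while "keys" is offloaded once newer events arrive.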
(Richard Morris Ted Talk)
If you are interested in the above experiment, you can click on the link below to watch Mind Field S3.E1 by Vsauce.
3. Modelling the world
From the moment babies are born, they start to learn about the world and to form references from the things they see. They start to learn about gravity at around 9 months old. A baby will be shocked if you trick them by making a toy float in the air instead of dropping. They have formed expectation patterns about the world and about the relations between objects. These expectation patterns and object relationships become part and parcel of our lives: our brain subconsciously predicts the expected outcome, and we are shocked and become alert when those expectations are broken.
You will pick up not only big differences from what you expect but also slight changes in expected patterns. Imagine you are opening the door of your house with the same strength as usual, but this time the door has somehow become lighter. As you open it, you immediately notice the difference and subconsciously predict that the door is going to slam backwards given the strength you applied.
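The door example can be sketched as a model that keeps a running expectation and raises a flag when reality deviates from it. This is a minimal illustration under assumed numbers: the class `ExpectationModel`, the 20% tolerance and the force readings are all made up for the example.

```python
class ExpectationModel:
    """Tracks an expected value and flags observations that violate it."""

    def __init__(self, tolerance=0.2, rate=0.1):
        self.expected = None
        self.tolerance = tolerance  # relative deviation that counts as a surprise
        self.rate = rate            # how quickly expectations adapt

    def observe(self, value):
        if self.expected is None:
            self.expected = value   # first experience sets the expectation
            return False
        surprised = abs(value - self.expected) > self.tolerance * abs(self.expected)
        # Expectations always drift towards what was actually observed.
        self.expected += self.rate * (value - self.expected)
        return surprised


door = ExpectationModel()
forces = [10.0, 10.2, 9.9, 10.1, 6.0]  # hypothetical force needed to open the door
surprises = [door.observe(force) for force in forces]
print(surprises)  # [False, False, False, False, True]
```

Small day-to-day variations pass unnoticed, while the suddenly lighter door (6.0) breaks the expectation and triggers the alert.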
If you do not know what you don’t know, you are not able to learn.
Current machine learning does not form relationships between objects, and models have a hard time identifying abnormalities in their own predictions. Take an image classification task: a model that learns about cats and dogs. When this classifier is given an image of a hamster, it cannot output “I don’t know what this is”. Yes, you can argue that we can use anomaly detection techniques to identify hamsters. But these techniques are “hand-coded”; the model does not learn for itself what an abnormality is. Only with some reference, something to compare against, could a machine pick up the difference and ask why it differs from its model’s prediction.
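To show what such a “hand-coded” fix looks like, here is a minimal sketch of a classifier that refuses to answer when its confidence is low. The labels, the logits and the 0.9 threshold are assumptions picked for illustration, and `classify` is a hypothetical helper, not a library function. Note that the threshold is set by a human, which is exactly the limitation described above; softmax confidence is also known to be unreliable on inputs far from the training data, so a hamster could still be labelled a cat with high confidence.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels, threshold=0.9):
    """Return the most likely label, or refuse when confidence is too low."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return "I don't know"
    return labels[best]


labels = ["cat", "dog"]
print(classify([4.0, 0.5], labels))  # one class clearly dominates
print(classify([1.1, 0.9], labels))  # scores are close, so the model abstains
```

The model itself has learned nothing about what is abnormal here; the rule for saying “I don’t know” lives entirely in the hand-picked threshold.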
4. General learning structure
Another thing I think we might need in order to reach AGI is a general learning structure for artificial neurons. This idea comes from the book “On Intelligence”, which explains that our neocortex has a similar structure throughout, no matter where you cut it. This might be an important property of the brain: a single structure with the ability to learn from different types of input such as images, audio, text, temporal information, object relationships and reasoning.
Rather than having a separate structure for each task, could we design a system that lets the machine learn what is needed for the given problem? Imagine a general neuron design that can handle all of these kinds of tasks and form whatever new connections are required to complete them. This would be very close to our brain’s flexibility in handling difficult situations. It would allow the machine to create potentially novel designs from generic building blocks through learning, for example by forming connections for NLP, for images and for the relationships between them.
5. Evolution based species
What is evolution? It is the Darwinian explanation of the origin and variation of species, and the selection of the fittest. It is a powerful explanatory force which, acting over long periods of time. (Explanation from GPT2)
Evolution is not an efficient process. In fact, it is wasteful if you think about it: evolution is based on trial and error, selecting the fittest to survive while filtering away a portion of the population. Quoted from Richard Dawkins: “Evolution is deplorable wasteful, it’s fundamentally based on waste but on the other hand it does produce magnificent results.”
Richard Dawkins gives an example of this inefficiency that can be found in our own body. The laryngeal nerve is a particularly bad design by evolution. It is a nerve that connects to our larynx (voice box). It starts from the brain, but instead of going straight to the voice box it goes down into the chest, loops around an artery and comes back up to the larynx. We are not the only ones with this flawed design: the giraffe’s laryngeal nerve also loops down to the chest, a detour of ~1.8 meters. Although evolution is not efficient, it is the only process we know of that is proven to lead to an intelligent entity.
Unlike biological evolution, digital evolution can supercharge the evolutionary process. In OpenAI’s hide-and-seek project, for example, agents were run in many parallel instances of a simulated hide-and-seek environment and achieved complex, intelligent behaviours. What if an agent could rewrite its own code or rewire its own neural connections to improve itself? It could grow exponentially; in contrast to its biological counterpart, digital intelligence could easily outgrow human intelligence in mere months of effort.
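The basic loop of digital evolution (mutate, evaluate, keep the fittest) fits in a few lines. The sketch below is a generic toy evolutionary algorithm, not the method used in the OpenAI hide-and-seek project; the function `evolve`, the fitness function (be as close to 3 as possible) and all parameters are assumptions for illustration.

```python
import random


def evolve(fitness, generations=50, population_size=20, mutation_scale=0.5, seed=0):
    """Toy evolutionary loop: keep the fitter half, refill with mutated copies."""
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))          # best of this generation
        survivors = population[: population_size // 2]  # selection
        offspring = [parent + rng.gauss(0, mutation_scale)
                     for parent in survivors]           # mutation
        population = survivors + offspring
    best = max(population, key=fitness)
    return best, history


def fitness(x):
    """Hypothetical goal: be as close to 3 as possible."""
    return -(x - 3.0) ** 2


best, history = evolve(fitness)
print(f"best individual: {best:.3f}")
print(f"fitness, first vs last generation: {history[0]:.3f} vs {history[-1]:.6f}")
```

Half of every generation is thrown away, which mirrors the wastefulness described above, but a computer can run thousands of such generations in the time biology needs for one.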
Evolution might offer a glimpse of an answer to Moravec’s paradox
Moravec’s paradox describes a phenomenon surrounding all AI development: it is much easier to build specialised computers that mimic expert humans on “hard” tasks (Chess, Go, painting and music) than to build a machine with the skills of a 1-year-old child. High-level reasoning requires less computation than low-level unconscious cognition.
“It is comparatively easy to make computers exhibit adult level performance […] and difficult or impossible to give them the skills of a one-year-old.” (Hans Moravec)
Using evolution through reinforcement learning (RL) in AI simulation, it is possible to build a machine that walks and stacks toy blocks, much like what a 1-year-old child is learning to do. Another OpenAI project shows that the dexterity needed to solve a Rubik’s cube, learned in a simulated virtual environment, can be transferred to the real world. This shows that Moravec’s paradox could possibly be overcome, but that said, the computation required has risen exponentially as well. From AlexNet in 2012 to today, the computation used in the largest training runs has increased by about 300,000x.
It is counter-intuitive that high-level reasoning requires less computation. But compare this with our own brain: the neocortex, where higher cognition happens, has fewer neurons than our more primitive cerebellum (an estimated 20 billion vs 120 billion neurons). This may imply that the main bottleneck in creating AGI is not computational but algorithmic.
6. Affective computing
Affective computing is the study of the development of systems that can recognise, interpret, process and simulate human affects.
Finally, I believe this is the finishing touch on what an AGI should have: the ability to compute feelings and affection. I believe this is critical because we do not want a machine that is cold to interact with, one that neither feels nor expresses feelings. Imagine you are talking to Alexa, but this version of Alexa is different: it can detect your stress level from your tone of voice, and it knows that you are asking to buy a present from Amazon. From your unusually high stress level, it works out that you have forgotten your wedding anniversary. Rather than simply showing you what you are looking for, it suggests that you buy flowers instead, from a nearby local florist that can deliver within the next 2 hours. It would feel like a real, useful personal assistant.
Affective computing could provide emotionally contextual services that cater to your current needs. As scary as it may seem, affective computing could potentially save a life when done right. The words we speak may be the same, but the tone that surrounds them makes a huge difference. This could also be an important piece of information for the machine to consider before it takes action. Even if the machine does not truly have feelings, a simulated response alone can provide comfort to its user.
7. Intelligence vs Consciousness
There is a lot of debate about the relationship between intelligence and consciousness. In my opinion, we do not need consciousness to create an intelligent machine. We have living examples like dogs, dolphins and monkeys: they certainly have self-awareness and consciousness, but they are definitely not as intelligent as humans. Since the two do not go hand in hand, we could probably create an intelligent machine that is not conscious. Consciousness might not be an emergent property of intelligence.
Maybe some problems fall under the category of the Heisenberg uncertainty principle: the more we try to measure or pinpoint them, the more uncertain they become. Consciousness may fall within this category, tied to quantum mechanics in a way that prevents us from getting to the truth and escaping the matrix.