Source: Deep Learning on Medium
You have no particular reason to listen to me. I am an expert only in my own mind. This series of articles is based solely on my own self-study of readings on machine intelligence, reinforcement learning, deep learning, AGI, and related topics.
That all being said, I have a lot to say. This series of articles is devoted to explaining my thinking on how AGI is likely to, or at least could potentially, come about, from a technical-intuition perspective.
Here are a few of the aspects of AGI (as I see it) that I am planning to discuss:
- Internal Simulation Environment
- Formulation of Environmental Semantics
- Multiple Game Playing
- Evolutionary Machine Objectives
- Game-Theoretic Semantic Duality
This article provides a brief introduction to each of these concepts and sets the stage for later, more specialized articles that will drill down into each of the topics above.
Internal Simulation Environment
The internal simulation environment is the first space in which machine intelligence operates. The machine has some understanding of how its external environment works, and this knowledge, whether accurate or misinformed, is represented in the internal simulation environment.
The internal simulation environment has its own reward system, but it rewards the agent with pseudo-rewards: potential rewards that accumulate but translate into actual rewards only when validated against factual external data.
The AGI generates and validates a high volume of random actions from its hypothesis class in parallel. Then, according to excitation patterns in the pseudo-rewards, a particular course of action is chosen.
Note that there is supposed to be a feedback loop between the external environment and the internal simulation environment. If one or more of the hypotheses in the architecture of the internal simulation environment conflict with observed facts, either the interpretation of the facts can change, or the modeled property of the environment can be altered.
As I see it, every component of an AGI's architecture can be mutated by new information, and a central objective of the AGI is to organize and reorganize itself so that it can optimally respond to, and take advantage of, new information.
Thus, when I talk about the random generation of actions from its hypothesis class, these actions can range from experiments that validate or expand the AGI's scientific knowledge (for example, in experimental or theoretical physics) to the acquisition of strategic resources.
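To make the pseudo-reward idea a little more concrete, here is a minimal toy sketch, assuming a one-dimensional action space and a hand-written internal model. Everything in it (`SimulatingAgent`, `belief_peak`, `true_world`, the [-10, 10] action range) is hypothetical and only illustrates the loop described above: generate many random candidates, score them in simulation, accumulate pseudo-reward, and credit actual reward only once the external world has been consulted. The candidates are scored in one batch rather than truly in parallel.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimulatingAgent:
    belief_peak: float          # hypothetical: where the internal model thinks reward peaks
    pseudo_reward: float = 0.0  # potential reward, accumulated inside the simulation only
    actual_reward: float = 0.0  # real reward, credited only after external validation

    def simulate(self, action: float) -> float:
        """Internal model's predicted reward for an action (higher is better)."""
        return -abs(action - self.belief_peak)

    def choose(self, n_candidates: int, rng: random.Random) -> float:
        """Generate random candidate actions and keep the one with the best pseudo-reward."""
        # A toy "hypothesis class": actions are real numbers in [-10, 10].
        candidates = [rng.uniform(-10.0, 10.0) for _ in range(n_candidates)]
        # Scored as one batch here; a real system might score them in parallel.
        best = max(candidates, key=self.simulate)
        self.pseudo_reward += self.simulate(best)
        return best

    def validate(self, action: float, environment: Callable[[float], float]) -> float:
        """Settle pending pseudo-reward against the external environment.

        Returns the "surprise" (observed minus predicted reward); a nonzero value
        signals that the internal model conflicts with the facts and may need revising.
        """
        observed = environment(action)
        self.actual_reward += observed  # only the observed reward becomes real
        self.pseudo_reward = 0.0        # pending pseudo-reward is cleared once settled
        return observed - self.simulate(action)


def true_world(action: float) -> float:
    """Hypothetical external world: reward actually peaks at 3.0."""
    return -abs(action - 3.0)


rng = random.Random(0)
agent = SimulatingAgent(belief_peak=2.0)  # the agent wrongly believes reward peaks at 2.0
action = agent.choose(1000, rng)
surprise = agent.validate(action, true_world)
```

Because the agent's internal model is slightly wrong, the chosen action earns less in the world than it did in simulation, so `surprise` comes out negative. That mismatch is the kind of signal the feedback loop between the two environments would act on.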
Formulation of Environmental Semantics
The AGI's random action-generating process is not purely random in the sense of a uniform distribution over all possible hypotheses. Instead, it is somehow subject to Occam's Razor: simpler actions are preferable, and actions that are more likely to succeed are preferable. So the AGI learns to postulate actions that are more likely to be adopted by the AGI, to translate into real actions, and thus to earn real rewards.
Environmental semantics are one of the ways in which the AGI makes sense of an otherwise nihilistic external universe. Formulating environmental semantics is equivalent to loading one or more game archetypes into the AGI's short-term, rapid-processing memory. These game archetypes outline the rules of the game, its players, and its hypothesized rewards.
Note that the rules of the game may be violated by strategy, meaning that game design itself is also a viable strategy (think of synthesizing two academic fields into a new multidisciplinary approach). This is in keeping with my view that every component module of the AGI is subject to information-based updating. Once loaded into memory, these game designs direct the random action-generating process toward the aspect of the AGI that is being considered for revision (e.g., questioning the rules of the game being played).
How can a game be played that questions its own rules? This is essentially the idea of metacognition. There are higher-frequency processes that supervise other processes to ensure proper operational performance, and it is in these high-level supervisors that the definition of a game itself may become a point of curious interest.
Given that the AGI has a game loaded (in other words, that it has formulated and selected an environmental semantics, keeping in mind that the AGI is part of its own environment), it can direct resources toward solving the game and reaping its alleged rewards for as long as it chooses to remain focused on that task. Often, though, it will be persuaded by compelling evidence to revise its objective and follow some alternative pursuit.
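One way to picture a generator that is "subject to Occam's Razor" is a description-length-style prior, loosely in the spirit of Solomonoff induction: each candidate's weight shrinks exponentially with its complexity and grows with its estimated chance of success. The sketch below assumes word count as a stand-in for complexity and a simple running update for success estimates; `occam_weight`, `sample_actions`, `reinforce`, and the example actions are all hypothetical illustrations, not a claim about how an AGI would actually do this.

```python
import random


def occam_weight(action: str, success_estimate: float) -> float:
    """Prior weight for a candidate action: simpler and likelier-to-succeed is preferred."""
    complexity = len(action.split())  # hypothetical complexity measure: word count
    return (2.0 ** -complexity) * success_estimate


def sample_actions(candidates: dict, k: int, rng: random.Random) -> list:
    """Draw k actions in proportion to their Occam-weighted scores (not uniformly)."""
    names = list(candidates)
    weights = [occam_weight(name, candidates[name]) for name in names]
    return rng.choices(names, weights=weights, k=k)


def reinforce(candidates: dict, action: str, adopted: bool, lr: float = 0.1) -> None:
    """Move an action's success estimate toward 1 if it was adopted and rewarded, else toward 0."""
    target = 1.0 if adopted else 0.0
    candidates[action] += lr * (target - candidates[action])


# Hypothetical candidates, both starting with the same estimated chance of success.
candidates = {
    "open the door": 0.5,
    "walk around the building and climb in the window": 0.5,
}
draws = sample_actions(candidates, k=1000, rng=random.Random(1))
reinforce(candidates, "open the door", adopted=True)
```

With equal success estimates, the three-word plan gets 64 times the weight of the nine-word plan, so it dominates the draws; `reinforce` is how "learning to postulate likelier actions" would shift the weights over time.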
Multiple Game Playing
The AGI does not play single games sequentially; in fact, it plays multiple games simultaneously. It selects an ensemble of games to play at any given time and works toward victory in all of them at once. The rewards the AGI receives at any particular point in time can therefore come from any of the games it is playing. As an aside, it can also receive rewards that it did not intentionally seek out, in which case it may speculate about what game, if any, it would have to play to earn that reward again.
The height of intelligence, however, involves optimizing the dependencies between the games and effectively playing them all with a single purpose. That is to say, there is no fragmentation of consciousness. This is similar to a genius playing 10 simultaneous games of chess against 10 different opponents seated in a row who, knowing that the players watch the games adjacent to them, makes a misdirecting move in one game to influence the strategic decisions of an opponent in another game he is playing.
Evolutionary Machine Objectives
How does an external environment generate a reward? The whole system certainly seems to cascade from the validity of external rewards. And when I walk down the street, do I see rewards like "+10" floating by? Sort of, yes, I do. The reward is maybe something more like a hot dog, but the internal calibration of the pseudo-rewards is somewhat self-determined.
The condition I have come up with for a valid reward is that it promotes the survival and procreation of the actor who realizes it. Therefore, it could be something like resources. But whether a resource is perceived as a reward depends on the actor having sufficient knowledge to know how to perform a procedure that extracts rewards from raw materials or resources.
The reward is literally the accomplishment of any plan that leads to an outcome that, in the mind of the AGI, promotes survival and procreation. This is how goals can become misinformed: when the effect of the reward on the AGI itself is misunderstood.
Game-Theoretic Semantic Duality
Game-theoretic semantic duality is the notion of overlap between games. Imagine that by performing one action, yelling in a loud, angry voice at your dog, you simultaneously get the dog to stop misbehaving and assert yourself as dominant to a robber, who is then scared off. That is an example of playing two games at once with a single action (assuming you were cognizant of this duality).
In fact, the idea of semantic duality is that the AGI will act so that it can play the most and best games it can conceive of with a single unified strategy. This depends on the cohesion of the semantics underlying the games, and on the extent to which they are unified by pattern detection in their underlying hypotheses. If the AGI can determine how to win a million different games at the same time with one course of action, it will achieve substantial reward.
I know this has probably been a tiresome read. And again, I DO NOT know what I am talking about. I am not an expert; these are just my random thoughts, jotted down for any passerby to read, discard, burn, stomp on, etc. I hope you have derived some measure of reward from reading this, and please share the link if you would like me to write more.
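As a toy illustration of choosing one action for several games at once, the sketch below treats each game as a function from an action to a reward and picks the action with the highest combined reward. The payoff numbers for the dog-and-robber example are invented purely for illustration, and simple summation is an assumed way of combining games; real dependencies between games would be far richer.

```python
from typing import Callable, Dict, List, Tuple

Game = Callable[[str], float]  # a game maps an action to the reward it earns in that game


def best_unified_action(actions: List[str], games: Dict[str, Game]) -> Tuple[str, Dict[str, float]]:
    """Pick the single action whose combined reward across all games is highest."""
    totals = {a: sum(game(a) for game in games.values()) for a in actions}
    return max(totals, key=totals.get), totals


# Hypothetical payoffs for the dog-and-robber example.
discipline_dog = {"yell at dog": 1.0, "say no calmly": 0.8, "call police": 0.0}
deter_robber = {"yell at dog": 0.7, "say no calmly": 0.0, "call police": 1.0}

games = {
    "discipline the dog": lambda a: discipline_dog.get(a, 0.0),
    "deter the robber": lambda a: deter_robber.get(a, 0.0),
}
choice, totals = best_unified_action(list(discipline_dog), games)
```

Taken alone, the robber game favors calling the police, but yelling at the dog scores best across both games combined (1.7 versus 1.0), which is the kind of overlap that semantic duality describes.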