Original article was published on Artificial Intelligence on Medium
Some areas of research connecting minds and machines are weighty indeed, yet largely out of public view.
Values and decision theory.
Our ongoing attempts to make artificial minds are starting to teach us about ourselves. When we give machines the ability to make judgments we run into now-familiar moral dilemmas. The self-driving car as executioner. Pernicious biases.
From a higher-level view, AI theorists see the above as questions about values. Before they get too powerful, we need our intelligent machine agents to act as if they had values like ours. Researchers currently call this the AI alignment problem, meaning ‘aligned with our values.’
Alignment researchers prefer to work with formal, math-like systems, and a favorite is a framework called decision theory. It concerns how to make decisions in pursuit of goals when you assume a set of values. The values are encoded in a utility function that rates the preference for every possible outcome of a decision.
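The core loop of decision theory can be sketched in a few lines. This is a minimal illustration with made-up outcomes, probabilities, and utilities (none of them come from the alignment literature): actions lead to outcomes with some probability, a utility function scores each outcome, and the agent picks the action with the highest expected utility.

```python
# Hypothetical utilities for three outcomes of a medical decision.
utility = {"cured": 1.0, "no_change": 0.0, "side_effect": -0.5}

# Hypothetical action -> {outcome: probability} model of the world.
actions = {
    "treat": {"cured": 0.6, "side_effect": 0.4},
    "wait":  {"no_change": 1.0},
}

def expected_utility(outcome_probs):
    """Probability-weighted sum of the utilities of possible outcomes."""
    return sum(p * utility[o] for o, p in outcome_probs.items())

# The decision rule: take the action whose expected utility is highest.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # "treat": 0.6*1.0 + 0.4*(-0.5) = 0.4, versus 0.0 for "wait"
```

Everything an agent like this “cares about” lives in the utility table, which is exactly why getting that table right matters so much.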
Decision theories let researchers tackle general questions like: how do you formalize values so that pursuing one value doesn’t violate another? So far, decision theory shows how hard it can be to use values to guide even the simplest behavior of a powerful agent. Suppose you tell the AI to cure cancer, but not to cause any other side effects in the world. It might conclude that “no side effects” means continuing to let people die of their cancer. Instead of stopping after filling the bucket, the sorcerer’s apprentice might indeed flood the castle, or even the whole world.
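A toy calculation, with entirely made-up numbers, shows why a naive formalization of “no side effects” backfires. If we score an action as task reward minus a penalty on *any* change to the world, a large enough penalty makes inaction the optimal policy, even though inaction lets the disease run its course:

```python
def score(task_reward, changes_to_world, penalty_per_change):
    """Naive objective: reward for the task minus a penalty on all change."""
    return task_reward - penalty_per_change * changes_to_world

# "Cure cancer" achieves the goal but changes many things in the world;
# "do nothing" changes none of them.
cure = score(task_reward=10, changes_to_world=100, penalty_per_change=1.0)
nothing = score(task_reward=0, changes_to_world=0, penalty_per_change=1.0)

print(cure, nothing)  # -90.0 vs 0.0: the agent prefers to do nothing
```

The hard research problem is distinguishing side effects we care about from the ordinary changes any useful action must cause, and this one-penalty-fits-all sketch shows the crudest attempt failing.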
Humans have developed ethical systems without gaining any consensus. Ethicist William Frankena compiled a well-known list of things that many people value, nearly all of which can conflict with one another.
How do we resolve the fact that human values are incomplete, contradictory, and can change with circumstances?
2004: Eliezer Yudkowsky defines ideal human values to be our “Coherent Extrapolated Volition”
Some have proposed having the machines themselves help us discover more coherent, long-term values. Others have found that the value learning problem, at least for machines, is fiendishly complex. They have also found that, if you somehow can specify the values, it is hard to see how to make a machine retain them, especially if it is smart enough to be capable of modifying itself.
Consciousness and self-modeling.
Researchers are trying to work out how to make machines that behave adaptively. The crux is that to improve at some task, you need to model the situation in which it occurs and then use the model to evaluate (decision theory again) different approaches. For a machine operating in the real world, the situation includes the machine itself. So a flexibly adaptive machine must model its very self as part of its operating model.
2007: “Consciousness stems from the structure of the self-models that intelligent systems use to reason about themselves.” — Drew McDermott, AI researcher
But wait — there’s a strong theory that human beings do the same thing, and that our mental models of ourselves-in-the-world cause the experience of consciousness. So will we, intentionally or not, make machines that are phenomenally conscious? Would their conscious experience resemble ours in some ways (could they suffer? be happy? trustworthy? devious?) or be completely alien? Would we believe what they said to us about their internal conscious states?
1995: “The really hard problem of consciousness is the problem of [having] experience.” — David Chalmers
We all know that living a good life — and for many of us, living any life — is fraught with difficulty. We get a fresh look at that bedrock fact when we try to imagine safely getting help from our intelligent machines in realistic settings: settings where they are choice-making agents that must model their world, and themselves embedded in it. The theorists call the issue “embedded agency.”
2018: “This is Emmy. Emmy is playing real life … Emmy is within the environment that she is trying to optimize.” Garrabrant, Embedded Agents
To decide on future courses of action in a changing world, you have to predict what new situations will arise to which you must respond. But your own previous behavior will affect what those new situations will be. And likewise for your behavior before those previous situations, and so on, all the way back to Start. The more you chain together these predictions over time, the less accurate they will be. It also gets rapidly more difficult to compute longer chains of actions. This prediction issue alone might paralyze a literal-minded AI.
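A back-of-the-envelope calculation, with assumed numbers, makes both costs in the paragraph above concrete: the accuracy of a chain of predictions decays geometrically with each step, while the number of action sequences to evaluate grows exponentially.

```python
per_step_accuracy = 0.9  # assumed chance each one-step prediction is right
branching = 5            # assumed number of actions available at each step

for steps in (1, 5, 10, 20):
    accuracy = per_step_accuracy ** steps   # chance the whole chain is right
    sequences = branching ** steps          # action sequences to evaluate
    print(f"{steps:2d} steps: chain accuracy ~{accuracy:.2f}, "
          f"{sequences} sequences to consider")
```

At twenty steps the chain is right barely one time in eight, and there are roughly 10^14 sequences to weigh, which is why long-horizon planning is punishing for any literal-minded optimizer.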
In the same situation, people just “go for it.” We make a decision, telling ourselves we’ll work it out somehow. We certainly don’t want a powerful machine doing that.
Furthermore, if you are an agent who makes choices in the world, then when you see the outcome of a choice you also know that you could have done otherwise. In people, this can cause feelings of justification (I made the right choice) or regret (oh, if only I hadn’t done that). These feelings help integrate us into our society, and so one theory (Frith & Metzinger, What’s the Use of Consciousness?) is that we evolved these feelings to get along with others.
Any problem-solving machine should also respond to feedback on its choices in ways that improve future ones. The first thing it should do is something we humans can do but usually don’t: adjust the world model based on how the choice turned out. We could also think harder about the next decision, and for “big” decisions we often do. But research says that most of our decisions are made unconsciously, and our conscious thinking merely justifies them.
A machine could improve a decision by thinking more, but there are three issues, which we can call the think-more problems. One is the chaining prediction issue mentioned above. Another is knowing when it has thought enough before taking action. The third is logical twists that can arise when trying to predict its own behavior in order to plan that same behavior.
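The second think-more problem, knowing when it has thought enough, can be framed as a stopping rule. This is a sketch under an assumed model (all numbers are invented): each extra round of deliberation improves the decision’s value by a shrinking amount but costs something, so the agent stops when the expected improvement no longer pays for the thinking.

```python
def deliberate(initial_value, improvement, decay, cost_per_round,
               max_rounds=100):
    """Keep thinking while one more round of thought is worth its cost."""
    value, rounds, gain = initial_value, 0, improvement
    while rounds < max_rounds and gain > cost_per_round:
        value += gain - cost_per_round  # net benefit of one more round
        gain *= decay                   # diminishing returns on thought
        rounds += 1
    return value, rounds

value, rounds = deliberate(initial_value=1.0, improvement=0.5,
                           decay=0.5, cost_per_round=0.1)
print(rounds, value)  # stops after 3 rounds with value 1.575
```

The catch, of course, is that this rule only works because we handed the agent the improvement curve in advance; estimating that curve is itself more thinking, which is the problem all over again.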
A very advanced machine might decide to push this further by trying to improve its basic intelligence. That idea leads to a paradox. For a successful improvement, it would need to predict what its smarter self would do. But if it could predict that, then it would have to already be the smarter self! The think-more problems and this improve-me problem are similar from a computer science point of view, and researchers are trying to figure them out.
Again, faced with the issue of possible and unknowable improvement, people eventually decide to just do or not do. We can’t trust an AI with that decision.
The problem just described is named Vingean Reflection in honor of sci-fi author Vernor Vinge, who said that a writer is unable to predict the behavior of a character smarter than the writer. Other versions are: a child can’t write (or reason) realistically about an adult; humans can’t predict what a superintelligent AI would do.
We know that people can find it hard to trust their experts, for exactly Vinge’s reason. If we find a way to think usefully about AIs smarter than us, maybe it would apply to our own trust issues as well.
Agency and mentalizing.
Our minds are primed to explain events as if they were the result of intentions. We have our own intentions, we believe that other people have theirs, and we even assign intentions to random natural events. To apply such a mentalizing analysis to another person is called using a Theory of Mind. Our predictions are weak, but we use them to muddle through.
An intelligent machine interacting with humans would need a theory of others’ minds as part of its world model. But would an AI model a human differently than it would model a car, or a tornado? You could say that, sure, a human has goals, so the AI tries to infer the goals and use that to predict behavior. Maybe we would learn something about our own goals and behavior if a really smart AI modeled how we use them. And that’s an encouraging idea.
First, consider how an AI might model things that aren’t alive.
The AI could say that the car has goals, which are quite simple: when the piston goes down, the goal is to draw an optimal fuel/air mixture into the cylinder. So the fuel injector implements that goal, given input from various places, like current road speed and accelerator position. When the brake pedal is depressed, the goal is to stop smoothly.
In the case of a tornado, the “goals” would be things like absorbing and dispersing heat energy from the air. Or, conservation of angular momentum. And maybe transfer of energy and momentum to whatever stands in the tornado’s way, like buildings and trees. So, what makes people different from “things” like cars and tornados?
2007: In order to explain how an information processing system can have a model of something, there must be a prior notion of intentionality, that explains why and how symbols inside the system can refer to things. — Drew McDermott
The pursuit of goals is evidence that we have intentions. But, to current philosophers and psychologists (and, originally, Franz Brentano) “intentionality” means that all our mental states are directed towards something: either other mental states or things in the outside world. In our mentalizing, we imagine elaborate “levels of intentionality”. For example (count the mental-state verbs to get the level): “Sue believes that Ralph thinks that Alan wants to join his book club, but Alan doesn’t even like books.”
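Nested intentionality has a natural computational shape, which is one reason it interests AI modelers. Here is a small sketch (the class and its fields are my own invention, not any standard representation): each mental state points an agent’s verb — believes, thinks, wants — at either another mental state or a plain fact about the world, and the nesting depth is the level of intentionality.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class MentalState:
    agent: str
    verb: str
    content: Union["MentalState", str]  # another state, or a plain fact

def level(state):
    """Count how deeply mental states are nested inside one another."""
    if isinstance(state, MentalState):
        return 1 + level(state.content)
    return 0  # reached a plain fact about the world

# "Sue believes that Ralph thinks that Alan wants to join his book club."
sentence = MentalState("Sue", "believes",
            MentalState("Ralph", "thinks",
             MentalState("Alan", "wants", "to join Ralph's book club")))

print(level(sentence))  # 3 levels of intentionality
```

An AI with a theory of mind would need some representation with this recursive character, whatever the details, since people routinely reason three or four levels deep.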
A lot of philosophy and science has been written about the difference between animate and inanimate things, or things that have agency and intentionality versus those that don’t. We might clear some of that up if we build an AI that models intentionality correctly.
Another implication of such modeling takes us back to the consciousness issue. Maybe an AI that successfully models the intentions of human agents would do as we do. It would apply the same modeling principles to itself and wind up being, or at least acting like it was, conscious. Hence an ethical can o’ worms: how do we treat the AI, and how would it treat us in return?
Our attempts to understand what it would take to have aligned AIs are also a fresh look at some issues of being human. Old problems of philosophy and psychology are being approached using new tools such as decision theory, game theory, logic, and probability.
The AI researchers who study decision theory know that the extreme rationality of their theorizing could seem cold and alien, as well as hard to follow. They also know that our behavior is riddled with unconscious cognitive biases, learned prejudices, and narrow ideologies. We don’t want our lousy thinking skills transferred to our machines. When we research how they can do a better job, we might improve ourselves.
2049: First melding of multiple humans into a multi-mind. Predicted in 2020 by an obscure informaticist.
Even if advanced AI never happens, the effort might be helpful with human problems. Nick Bostrom and others have noted the parallel between advanced AIs and human organizations like governments and corporations. Both exceed the capabilities of individual humans. Both tend towards the amoral. We need a better consensus on the values for aligning them. Then we need ways to sustain that alignment, so we can trust them when their successors gain new powers.