Avoiding the AI singularity

The AI singularity is by definition a hypothetical situation in which one or more AIs are no longer controllable by humans. Some bright minds, some of them actually working with AI, have already warned us that it can happen. My approach to the situation is the following: I cannot say whether or when it will happen, but I see at least a small probability of AIs becoming intelligent enough for this. Given that risk is probability times damage, and the damage could be very high (up to the collapse of civilization, including the extinction of the human race), I think this is a possibility we have to take care of.

When you work on AI and see a however remote possibility of your creation becoming intelligent, you have to stop and think about it. Exactly this happened to me. I am working on the design of a knowledge representation database right now. This database has a language interface so deeply embedded in it that at startup it executes queries expressed in this language to check its own consistency. The parser of the language includes a goal-seeking algorithm which executes arbitrary code referred to in the database. The represented knowledge can span multiple such databases, so the goal-seeking algorithm could be built in a way that invokes the same algorithm in multiple databases, expanding its computing power considerably. I am not saying it can become intelligent. I honestly don't know. But I cannot exclude the possibility. So I started thinking.
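
To make the architecture a bit more concrete, here is a minimal sketch of a goal-seeking query that can fan out to peer databases. It is written in Python with made-up names (`KnowledgeDB`, `seek`, `peers`); it illustrates the mechanism, not the actual implementation.

```python
# Hypothetical sketch of a goal-seeking query that can delegate to peer databases.
# Class and method names (KnowledgeDB, seek, peers) are illustrative, not a real API.

class KnowledgeDB:
    def __init__(self, facts, rules, peers=None):
        self.facts = set(facts)    # known facts, e.g. ("is_a", "cat", "animal")
        self.rules = rules         # callables that try to derive a goal from known facts
        self.peers = peers or []   # other KnowledgeDB instances reachable over the network

    def seek(self, goal, depth=3):
        """Try to satisfy `goal` from local facts, then via rules, then via peers."""
        if goal in self.facts:
            return True
        if depth == 0:
            return False
        for rule in self.rules:    # rules are arbitrary code referred to in the database
            if rule(self, goal):
                self.facts.add(goal)
                return True
        for peer in self.peers:    # fanning out to peers multiplies the available
            if peer.seek(goal, depth - 1):  # computing power, the capability discussed below
                return True
        return False
```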

Before you think that this happens only to the greatest AI researchers, I would like to make something clear: I am just combining well-known, mature technologies in a straightforward way. Probably others have done similar things before, and probably their goal was to actually break the barrier defined by the Turing test, which they failed to do, because for that you need both the algorithm and the data. My goal does not include breaking this barrier. However, the use case for this database is to describe facts, relationships and algorithms about a lot of different areas of real life, and with any luck it will contain a lot of them, carefully curated by humans according to their own goals, none of which are related to nurturing an AI. But that could have side effects.

To understand how to control an AI, let's take a look at what makes software intelligent. You need algorithms, data, and computing power. In our case the basic algorithm is simple, but it is meta: more data means more algorithms. The whole point of the database is to have the data, so limiting that does not seem feasible. So you want to limit computing power. In our case it seems easy: if we do not allow different instances of the database to run queries against each other using the peer's resources, we are all set. It is easy to omit such a capability from the database itself. But I would like to make the database open source, in which case anyone can put it back. As open source licenses by definition cannot limit the usage of the software, you cannot forbid it. It is a surmountable issue: I can take an open source license and add a clause. This way the license is no longer open source, which has a lot of drawbacks, but they can be handled. In theory even the definition of open source could be changed to allow such a clause.
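
For illustration, the restriction amounts to leaving the peer fan-out out of the goal-seeking loop entirely, so an instance can only recruit its own resources. Again a hypothetical sketch with invented names, not the real code:

```python
# Hypothetical sketch: the same goal-seeking loop as above, with peer delegation
# omitted entirely rather than hidden behind a flag. A single instance can only
# use its own computing power.

class LocalKnowledgeDB:
    def __init__(self, facts, rules):
        self.facts = set(facts)
        self.rules = rules          # note: no peer list at all

    def seek(self, goal, depth=3):
        if goal in self.facts:
            return True
        if depth == 0:
            return False
        for rule in self.rules:
            if rule(self, goal):
                self.facts.add(goal)
                return True
        # No peer delegation here: the capability is absent, not just disabled,
        # which is what the license clause would require downstream users to preserve.
        return False
```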

The above is about making sure that our software is not intelligent, but some do think that having intelligent software is useful, cool, or both. So let's think about how to make sure that an intelligent software behaves in a civilized manner. Think about it as a second line of defense. The first relevant thing which comes to mind is Asimov's Three Laws of Robotics. Those are abstract rules that prohibit doing harm, prioritized by the subject of the harm. The rules are:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

Of course, Asimov at that time did not know that the essence of what he called a “robot” is what we call “AI” today.

These rules are, at first read, clear to an intelligent human being. But on a second read there are some problems with the rules themselves. Let's just take a look at the first one: do not harm a human. What is harm, and what is not? If I eat a 100 g chocolate bar, it harms me (as my wife points out every day as I do it), so should an AI deny me my chocolate? Some decisions are very painful in the short run but give you advantages in the long run (think of COVID lockdown rules as an example). Which time scale should be considered? And even if we could sort these issues out: how could we implement such a ruleset?

We humans normally have such a ruleset, carefully implanted and maintained by our parents, called ethics. And it is implemented by feelings: if you feel bad about something, you give more consideration to the situation, and you choose the course of action that gives you the best feeling. As we all know, this system is far from flawless, and it is actually responsible for some of the miseries of humankind, as it sometimes prohibits us from taking the course which is logically the best and most ethical. This is caused by the fact that some of us (with respect to some things, like chocolate, all of us) choose not to give enough thought to some problems, primarily because the “settings” of our feeling system evolved to cater for a situation we left behind many thousands of years ago. So if we could design such a system for an AI, making sure that the logical part is followed to the needed depth and that the settings are adequate for the situation, we could probably keep the frequency and damage of violations of the rules of robotics low.

Now, how could such a feeling system be implemented? With a neural network, you could dedicate some neurons to certain “feelings”, like “that hurts me in this way” or “that hurts some humans in that way”. An overall “feeling good/bad about it” value could be computed from them with a hand-tuned network. The values of the feeling neurons could be determined by a network trained on feedback about the decision alternatives. In a neural-network-based AI the input could be the values of the neurons involved in the decision, while in a goal-seeking engine the input neurons could be fed by signals representing the rules invoked and their parameters. (In a rule-based system this also gives the possibility to short-circuit the algorithm in a “never ever think of this” way.) This needs the capability to record the decision process leading to the different alternatives, get human feedback on them, and train the “feeling system” based on that. To identify problems early, the parameters of the feeling system should be accessible and tunable by “AI psychologists”. And of course there should be kill switches which make sure the AI can be turned off when things go awry, either by identifying problems in the feeling system itself (the equivalent of human psychological illnesses) or by human intervention, making sure that neither the AI nor other AIs can interfere with the kill switch, including the communication channels used to activate it.
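
As a thought experiment only, here is a minimal sketch of how such a layer could score decision alternatives, hard-veto some of them, and record the process for human review. Everything in it (names like `FEELING_WEIGHTS` and `choose`, the numbers, the kill-switch flag) is an illustrative assumption, not a real design.

```python
# Hypothetical sketch of a "feeling system" scoring decision alternatives.
# All names and numbers are illustrative assumptions, not a real design.

KILL_SWITCH_ENGAGED = False  # would be set over a secure channel the AI cannot reach

# Hand-tuned weights combining dedicated "feeling" signals into one good/bad value.
FEELING_WEIGHTS = {
    "harms_human": -10.0,    # Asimov's First Law dominates
    "disobeys_order": -3.0,  # Second Law
    "harms_self": -1.0,      # Third Law
}

def overall_feeling(feelings):
    """Combine individual feeling signals (0..1) into one score; lower is worse."""
    return sum(FEELING_WEIGHTS[name] * value for name, value in feelings.items())

def choose(alternatives, decision_log):
    """Pick the alternative that 'feels' best, recording the process for human review."""
    if KILL_SWITCH_ENGAGED:
        raise SystemExit("kill switch engaged, refusing to act")
    scored = []
    for alt in alternatives:
        score = overall_feeling(alt["feelings"])
        decision_log.append({"action": alt["action"], "score": score})
        # "Never ever think of this": a hard veto instead of mere scoring.
        if alt["feelings"].get("harms_human", 0.0) > 0.9:
            continue
        scored.append((score, alt["action"]))
    return max(scored)[1] if scored else None

# The recorded decision_log can later be shown to humans ("AI psychologists"),
# and their feedback used to retrain the network that produces the feeling values.
log = []
best = choose(
    [
        {"action": "deny chocolate", "feelings": {"harms_human": 0.2, "disobeys_order": 0.8, "harms_self": 0.0}},
        {"action": "hand over chocolate", "feelings": {"harms_human": 0.3, "disobeys_order": 0.0, "harms_self": 0.0}},
    ],
    log,
)
print(best, log)
```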

The above system is of course just a proposal which may or may not work. But it shows that there could be ways to achieve the goal, and also that some kind of security subsystem needs to be implemented for AIs above a given complexity. The nature and parameters of such a system should ideally be agreed upon by the professionals working in the field, and it should be tuned by professionals dedicated to that task. To make sure that it is there and adequately tuned, the correct use of such a system should be mandated by law. I do not think this will happen in the short term, in spite of the fact that some already existing decision-making algorithms (think of just the ranking algorithm of any social media platform) are in dire need of such controls right now, to minimize the harm they are already doing to the social structures of the human race.

I can do the following about the situation:

  • release our repo-server software with a license which includes the controls: prohibiting controlling other AI instances, prohibiting removal of the security features, and, above a certain size of resources, mandating the use of a security subsystem with enough humans employed to tune it and a secure protocol to activate the kill switch.
  • design it in a way which makes implementing these measures possible, by having a kill switch and creating an interface to track the decision process and inhibit decisions (see the sketch below).
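
The second point could look something like the following sketch: a kill switch plus an interface that lets external observers trace and veto decisions. The names (`DecisionMonitor`, `allow`, `vetoes`) are hypothetical; this only shows the shape of the hooks, not the software itself.

```python
# Hypothetical sketch of the hooks described above: a kill switch and an
# interface to observe and inhibit decisions. Names are illustrative only.

from typing import Callable, List

class DecisionMonitor:
    """Lets external observers track the decision process and inhibit decisions."""

    def __init__(self):
        self.killed = False
        self.vetoes: List[Callable[[dict], bool]] = []  # each returns True to block
        self.trace: List[dict] = []                     # audit log for humans

    def kill(self):
        """Would be activated over a secure channel the AI itself cannot reach."""
        self.killed = True

    def allow(self, decision: dict) -> bool:
        self.trace.append(decision)
        if self.killed:
            return False
        return not any(veto(decision) for veto in self.vetoes)

# Usage: the goal-seeking engine would call monitor.allow(...) before executing
# any step, and refuse to proceed when it returns False.
monitor = DecisionMonitor()
monitor.vetoes.append(lambda d: d.get("uses_peer_resources", False))
print(monitor.allow({"step": "run local rule"}))                         # True
print(monitor.allow({"step": "delegate to peer", "uses_peer_resources": True}))  # False
```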

Of course these controls may seem unnecessary for such a feeble piece of software, which does not even aim to be AI-complete. My main goal with this is to spark a discussion about the right set of controls and the possible ways to implement them, both technically and in the social sense.