Original article was published on Artificial Intelligence on Medium
This raises three more questions:
Does the person inside the room understand Chinese?
Do the manuals of instructions and books understand Chinese?
Does the whole system understand Chinese?
The Chinese room can pass the Turing test, as we have said. Does this mean it can think? This is the inference the experiment tries to dismantle: it argues that thinking and understanding go hand in hand, and that a system which passes the Turing test without understanding anything is therefore not doing anything similar to thinking.
The Legacy of the Turing Test
To finish off, let’s explore some of the most recent versions of the Turing test, along with when and how it has been passed, and the conclusions from such experiments.
ELIZA was a conversational bot developed at MIT in 1966 by Joseph Weizenbaum, and one of the first programs capable of processing natural language. This program, which people could chat with through a messenger service, mimicked a psychologist, encouraging the person at the other end to talk more and more about their problems.
It used sentences like ‘I see’ or ‘tell me more about this’ (see the last line of the following conversation) and repeatedly bombarded the human at the other end with questions, as shown in the following interaction.
Person: Do you think I can use this computer tomorrow?
ELIZA: Why do you ask?
Person: I should make more on-line sales.
ELIZA: Why are you not sure?
Person: My potential clients have never seen a demonstration of this system.
ELIZA: That does mean much for you?
ELIZA: You seem very enthusiastic about it
Person: This is ridiculous
ELIZA: What do you mean this is ridiculous?
Person: Is there any reason for which I can not use the computer now?
ELIZA: Tell me more about your potential clients.
In this case, the person speaking to ELIZA had been told beforehand that he was speaking to another human, and after the conversation he did not suspect at all that he had instead been talking to a machine.
ELIZA worked using a dictionary or keyword approach: it looked for specific words within the input sentence, and each word found triggered a pre-made response, usually a question. If none of the keywords was found, a generic response was given. Nowadays, much more sophisticated systems exist.
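The keyword approach described above can be illustrated with a few lines of Python. This is only a minimal sketch, not Weizenbaum's actual program: the `RULES` table, the `GENERIC_RESPONSE` string and the `respond` function are hypothetical names made up for this example, and the canned replies are simply borrowed from the conversation shown earlier.

```python
import re

# Hypothetical keyword -> canned response table (an assumption for this
# sketch, not ELIZA's real script). Rules are checked in order.
RULES = [
    ("clients", "Tell me more about your potential clients."),
    ("computer", "Why do you ask?"),
    ("ridiculous", "What do you mean this is ridiculous?"),
]

# Fallback used when no keyword appears in the input.
GENERIC_RESPONSE = "Tell me more about this."


def respond(sentence: str) -> str:
    """Return the response of the first rule whose keyword is in the sentence."""
    # Lowercase and split into words so matching ignores case and punctuation.
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    for keyword, response in RULES:
        if keyword in words:
            return response
    return GENERIC_RESPONSE


if __name__ == "__main__":
    print(respond("Do you think I can use this computer tomorrow?"))
    print(respond("I feel a bit tired today."))
```

Nothing in this sketch models the meaning of the sentence. The reply depends only on which keyword happens to appear, which is why such a system can hold a plausible conversation without understanding it.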
The Loebner Prize
The Loebner Prize is a competition that has been hosted every year since 1990. It has had many different hosting locations, such as MIT, Cambridge University and the Science Museum in London.
Its goal is to evaluate the state of the art of conversational machines aspiring to pass the Turing Test and to promote Artificial Intelligence and Natural Language Processing research.
The procedure for this competition is the same as for a normal Turing test: 30 different judges each sit with two screens and hold two separate conversations, one with a computer program and one with a real person.
The goal is to correctly assess which screen belongs to the machine and which belongs to the person. Judges have various interactions with machine/human counterparts, and at the end of the day, the artificial system that has fooled the highest percentage of judges is crowned victorious.
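The scoring idea in this procedure can be sketched in Python as follows. This is an illustrative assumption about how such a tally could be computed, not the competition's official scoring code: the function names `fool_rates` and `winner`, the input format and the system names are all made up for the example.

```python
from collections import defaultdict


def fool_rates(verdicts):
    """Compute, per system, the fraction of conversations where the judge was fooled.

    `verdicts` is an iterable of (system_name, judge_was_fooled) pairs,
    one pair per judge/system conversation (a hypothetical input format).
    """
    fooled = defaultdict(int)
    total = defaultdict(int)
    for system, was_fooled in verdicts:
        total[system] += 1
        fooled[system] += int(was_fooled)
    return {system: fooled[system] / total[system] for system in total}


def winner(verdicts):
    """Return the system with the highest fraction of fooled judges."""
    rates = fool_rates(verdicts)
    return max(rates, key=rates.get)


if __name__ == "__main__":
    # Made-up verdicts for two made-up systems, four judges each.
    example = [
        ("bot_a", True), ("bot_a", False), ("bot_a", False), ("bot_a", False),
        ("bot_b", True), ("bot_b", True), ("bot_b", False), ("bot_b", False),
    ]
    print(fool_rates(example))
    print(winner(example))
```

With these invented verdicts, `bot_a` fools one judge out of four and `bot_b` fools two out of four, so `bot_b` would be the winner.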
An improved version of ELIZA, known as the PC Therapist, won this competition in its first three editions.
Despite offering some way to assess the Turing test performance of state-of-the-art Artificial Intelligence systems, this competition has created some controversy regarding the role of the judges, the way they can be tricked by approaches like ELIZA’s or other sorts of mimicry, and the short and sometimes whimsical conversations that take place between judges and machines.
One of the biggest criticisms of this trial is the following by Hayes and Ford in 1995:
“the imitation game conditions say nothing about the judge, but the success of the game depends crucially on how clever, knowledgeable and insightful the judge is”
Also, the competing systems have very rarely come close to fooling 1/3 of the judges, the threshold commonly accepted as the mark for passing the Turing test. We will finish the article by exploring one of the systems that has reached this threshold: Eugene.
Eugene Goostman is a conversational chatbot designed in 2001 that came second in the Loebner Prize in 2005 and 2008. In 2012, in a separate competition, it obtained what was then the highest-ever score, fooling 29% of the judges and coming very close to the Turing test threshold.
Eugene works by imitating a 13-year-old Ukrainian boy, with grammatical mistakes, gaps in knowledge that can plausibly be attributed to a kid, and a very particular way of speaking.
Again, we can see the trickery in these kinds of systems. They avoid facing a natural conversation and instead try to exploit the weaknesses of the test by imitating a very specific character, thus driving the conversation to narrow domains in which they can comfortably have a chat.
In 2014, with this strategy, Eugene passed the Turing test, fooling 33% of the judges it spoke to at an event organised to mark the 60th anniversary of Turing’s death. Let’s see a transcript of a conversation with Eugene.
Judge: what music do you like?
Eugene: I like Eminem — because he doesn’t have a thing to do with “polite-correctness”.
Judge: OMG i love eminem, he’s my favourite artist!
Eugene: I wonder what Eminem’s nick mean. I associate it only with a word “enema”. But he is a cool guy, I like his “The Real Slim Shady”.
Judge: i know what his nick name means, his real name is Marshall Mathers. From the initials M&M phynetically sounding like Eminem.
Eugene: Eminem is a famous crapper. No, raper… ripper… Yeah, I mean — rapper. I like his “Stan” and “The Real Slim Shady”.
Judge: I like “Hailies song”
Eugene: Interesting tastes. I like science fiction books, by the way. Where do you came from, by the way? Could you tell me about the place where you live?
Judge: Epsom, home of the derby. yourself?
Eugene: My guinea pig says that name Derby sounds very nice.
Judge: i shopuld talk with him sometime
Eugene: I think you should take some typing lessons. Or just type more accurately.
Judge: that’s very rude?
Eugene: Ummm… Frankly, I didn’t get your question:-(
Judge: “Ummm” isnt a word. i think you need typing lessons
Eugene: My thoughts are same. By the way, I still don’t know your specialty — or, possibly, I’ve missed it?
This article has explored the origin, nature, and evolution of the Turing test, one of the first attempts to measure progress towards Artificial Intelligence.
I hope it has not only helped you understand the origins of AI, but has also raised several questions and sparked some sort of internal discussion.
If not, I will give you one last nudge by inviting you to answer the following question:
Can this test really be a proxy for answering the question of whether machines can think?
There are various other factors that should be considered when trying to answer this question, aside from those already discussed.
First, is it important to know whether one of the two entities which we are interacting with is a machine?
As we saw in the transcript of the conversation with ELIZA, the human did not know beforehand that a machine was a possibility, so for him it was unquestionably an interaction with a person.
Would this have changed if he had known that he could have been speaking to either a machine or a human?
Secondly, should the time limits or the unrestricted nature of the conversations be addressed in some way? Also, what should we do about the previously discussed problem with the judges?
All of these considerations only further stress the last sentence written by Alan Turing in Computing Machinery and Intelligence:
“We can only see a short distance ahead, but we can see plenty there that needs to be done.”