Did a Computer Program Pass the Turing Test?

If the majority of media reports are to be believed, a computer program has finally and officially passed the famous Turing Test by fooling enough people into thinking it is human. When I first heard about this from otherwise trustworthy and reputable news sources, I was very excited. Upon looking into the story more closely, however, I was disappointed to find that the claim does not hold up.

Allow me to provide some background on the origins of the test before getting into the current news.

What is the Turing Test?

The Turing Test was devised by Alan Mathison Turing in 1950. Turing was a legendary genius in the field of computer science and mathematics. In addition to pioneering artificial intelligence studies, he developed the fundamental formal model of computation and algorithms with the Turing machine, a hypothetical device still used in theoretical computer science today. During World War II, Turing helped crack the Nazi Enigma cypher. In the final years of his short life, Turing was heavily persecuted for being a homosexual, a criminal offense at that time in the UK. After suffering a great deal of this injustice, he died in 1954 at the age of 41 from cyanide poisoning. Although an official inquest determined his death to be a suicide, there is some controversy about whether he actually intended to take his own life.

In 1950, Turing wrote a seminal paper titled “Computing Machinery and Intelligence,” which was published in the peer-reviewed journal Mind. His goal in writing the paper was to tackle the question, “Can machines think?” While this question is a very commonplace subject of discussion in mainstream culture today, it was bold new territory to explore 64 years ago. Turing opens the paper with a discussion on the importance of carefully and unambiguously defining the words “machine” and “think”:

The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words “machine” and “think” are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, “Can machines think?” is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it [. . .]

Turing then proceeds to express the alternatively framed question in the context of a proposed experiment he called the “imitation game”:

It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either “X is A and Y is B” or “X is B and Y is A.” The interrogator is allowed to put questions to A and B [. . .]

In order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten. The ideal arrangement is to have a teleprinter communicating between the two rooms. Alternatively the question and answers can be repeated by an intermediary. The object of the game for the third player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as “I am the woman, don’t listen to him!” to her answers, but it will avail nothing as the man can make similar remarks.

We now ask the question, “What will happen when a machine takes the part of A in this game?” Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, “Can machines think?”

This initial formulation of the “imitation game” eventually evolved into the formal Turing Test. This in turn has evolved into the modern conception of the test (the so-called “standard interpretation”), wherein human subjects interact with machines as well as with other humans, an exchange mediated by technicians who ensure the subjects are separated from each other. According to most modern versions of the test, a machine passes the Turing Test if it convinces 30 percent of the human judges that it is human.
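The pass criterion of this standard interpretation can be sketched in a few lines of Python. The function name and the 30 percent threshold here are just illustrations of the convention described above, not part of any official protocol:

```python
def passes_turing_test(verdicts, threshold=0.30):
    """verdicts: one boolean per judge, True meaning 'judged the machine human'.

    The machine passes if the fraction of fooled judges meets the threshold.
    """
    return sum(verdicts) / len(verdicts) >= threshold

# Example: 10 of 30 judges are fooled -> 33%, which clears the 30% bar.
verdicts = [True] * 10 + [False] * 20
print(passes_turing_test(verdicts))  # True
```

Note that everything hinges on the single `threshold` parameter, which is exactly the point of contention discussed later in this article.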

Turing Test 2014

Turing tests for artificial intelligence have been conducted many times over the last 50 years, but no software program has yet succeeded in passing the test. Researchers have learned some good lessons from these past failures, and the design and protocol of the latest test to be conducted looked very promising.

The test was organized by Kevin Warwick of the University of Reading as part of a Turing Test competition held at the Royal Society in London on June 7, 2014, the 60th anniversary of Alan Turing’s death. Two features of this test distinguished it from others done in the past. First, this event used more simultaneous comparison tests than ever before. Second, and most importantly, the 30 judges who participated were allowed to ask the computer program any question they wanted, without restriction. In the past, the majority of Turing tests placed strict limitations on what human judges could say to the computer program, and they were restricted to a finite set of questions that could be asked. This is one of the few aspects of the latest test that is praiseworthy. As Warwick correctly remarked, “A true Turing Test does not set the questions or topics prior to the conversations.” The most one could say about any computer program that passed a test with such restrictions is that it is an “expert system,” and not an A.I.

The subject of Warwick’s test was a computer program named Eugene Goostman. Taking on the persona of a 13-year-old Ukrainian boy, Eugene convinced 33 percent of the judges at the Royal Society event that it was a real boy. This was not the first time Eugene competed in a Turing Test contest. In June 2012, Eugene won a Turing Test competition held at Bletchley Park, near Milton Keynes in England. In that competition, Eugene fell just short of passing the modern conception of Turing’s test, convincing 29 percent of the 30 judges that it was a human.
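As a back-of-envelope check, taking the reported figure of 30 judges at face value, the 33 percent result corresponds to 10 judges; the count of 10 is inferred here, since 10 out of 30 is the whole-judge fraction that rounds to 33 percent:

```python
# Hypothetical reconstruction of the 2014 result from the reported numbers.
judges = 30
judged_human = 10  # inferred: the only whole-judge count rounding to 33%
print(f"{judged_human / judges:.0%}")  # prints "33%"
```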

But did the Eugene Goostman program really pass the Turing Test on June 7, 2014? The answer largely depends on definitions. One could reasonably argue that Eugene passed the modern “standard interpretation” version of the test, considering the unprecedented number of simultaneous comparison tests used, as well as the fact that the range of questions open to the judges was unlimited.

But the latest test has come under strong criticism for very good reasons, making the unfortunate conclusion that Eugene failed to pass the Turing Test a hard one to bypass. There are three main objections to consider:

1. Designing a program to take on the persona of a 13-year-old boy who is not a native speaker of the judges’ language is a very underhanded way to go about attempting a Turing Test win. The use of language is one of the few pieces of data we can call to our aid in attempting to tell the difference between a human and a machine. When a computer program informs the judges right from the start that it is not only a young and possibly naïve teenager but also a non-native English speaker, this confers an unfair advantage on the program. In the words of Mike Masnick in his highly critical TechDirt article on the event, it was a suspiciously convenient way to “mentally explain away odd responses.”

2. The 30 percent figure commonly used as the standard for determining whether a program has passed the Turing Test by fooling that number of human judges is ridiculously low. Furthermore, it is a gross misunderstanding and misinterpretation of what Turing said. Turing did not intend 30 percent to be used as a measure of how many human judges need to be fooled by the machine. This 30 percent figure was not a part of his test protocol, but rather comes up in the context of the following prediction Turing made in his 1950 paper:

I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10⁹, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning. The original question, “Can machines think?” I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.
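The popular 30 percent threshold is simply the complement of the 70 percent figure in this passage, as a quick sanity check shows. Turing describes the interrogator’s chance of a correct identification; he does not prescribe a pass rate for the machine:

```python
# Turing's prediction bounds the interrogator's chance of a *correct*
# identification; the popular "30 percent fooled" figure is its complement.
p_correct_id = 0.70          # "not have more than 70 per cent chance"
p_misidentified = 1 - p_correct_id
print(round(p_misidentified, 2))  # 0.3
```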

Aside from the misinterpretation of what Turing’s 30 percent signified, the number is much too low to be used as a reliable indicator of artificial intelligence. Two years after this paper was published, Turing revised his original “fifty years” prediction, saying that he did not anticipate a machine passing his test for at least 100 years. Turing made this more cautious prediction in a 1952 conversation with British mathematician and codebreaker Maxwell Newman on a BBC radio broadcast entitled “Can Automatic Calculating Machines Be Said to Think?”

Newman: I should like to be there when your match between a man and a machine takes place, and perhaps to try my hand at making up some of the questions. But that will be a long time from now, if the machine is to stand any chance with no questions barred?

Turing: Oh yes, at least 100 years, I should say.

Satisfying the demands of an unrestricted version of Turing’s test means more than just allowing human interrogators an unlimited range of questions and talking points, though that is an important step in the right direction. Test designers should also make the passing criterion more rigorous than the conventional 30 percent of judges who must be convinced the computer is human. A much better standard would place the winning threshold at 60 or 70 percent.

3. Eugene is nothing more than a chatterbot, not a sophisticated software program. As such it cannot be said to really understand the questions asked of it or to comprehend what the conversation is about. Eugene is not capable of any significant level of thinking, not even an animal level of consciousness. Eugene is designed to steer conversations toward interactions that are scripted and pre-programmed. Even a cursory reading of Eugene’s typical exchanges reveals only a minimal level of question analysis and parsing ability. In short, Eugene Goostman is not a technological breakthrough of the kind one would expect from a program capable of passing a Turing Test. The program is instead little more than a sideshow.

Is the Turing Test a Joke?

The Turing Test has been tweaked and revised many times over the past six decades. But the fact that no program has convincingly passed it indicates that the standard interpretation paradigm of Turing’s test needs to be discarded or shifted in a new direction, one that reflects our more subtle understanding of mind and intelligence and their intersections with computer science. Many prominent artificial intelligence researchers, including cognitive scientist Marvin Minsky, have concluded for good reasons that the modern application of the Turing Test is a joke. Minsky made the following statement in a recent interview with Nikola Danaylov on the Singularity 1 on 1 podcast:

The Turing Test is a joke, sort of, about saying a machine would be intelligent if it does things that an observer would say must be being done by a human. It was suggested by Alan Turing as one way to evaluate a machine, but he had never intended it as being the way to decide whether a machine was really intelligent. So it’s not a serious question.

The main reason many A.I. researchers do not take the modern application of the Turing Test seriously is that it is a misinterpretation of Turing’s original thesis. The Turing Test cannot determine whether a machine is really abstracting or understanding. The test judges only the machine’s output: it was designed to determine whether that output can be made indistinguishable from a human’s, nothing more. We cannot infer from the quality of the output whether a machine has gained any level of actual abstract thought and intelligence.

This is not to say that the Turing Test is necessarily useless as an indicator of A.I. It is important to distinguish between types of intelligence, of which there are many. Self-awareness and conscious thought constitute only one type, the one most familiar to us as humans who possess it. To assume that A.I. must conform to our human-level intelligence is a category mistake born of hubris. The types of intelligence that are not human-like should not be dismissed as unimportant or somehow less valuable, and I contend that Turing Test enthusiasts can make progress in their efforts by moving beyond the anthropocentric paradigm that is not likely to lead anywhere.

This is not to say that machines may not one day acquire self-awareness and sentience analogous to human intelligence. I think someday they will. But this revolutionary development will not be indicated or confirmed by the Turing Test as popularly conceived.
