* At the dawn of the computer age, British mathematician Alan Turing wondered if machines could ever think. He suggested a test to tell whether they could: if a machine could hold a convincing conversation, then it could be said to think. The "Turing test" has, since that time, led to a great deal of work on machines that can hold conversations, as well as a great deal of argument -- mostly due to misunderstandings of what Turing was trying to say.
* Scholars had been discussing the human mind for centuries on the basis that it was unique. It was not until the second half of the 20th century that programmable computers were developed, changing the ground rules of the discussion -- by suggesting the possibility that the human mind could be understood, and even duplicated by a machine. Such a proposal was far beyond anything that Descartes, limited to pipe organs as his vision of high technology, could have realistically contemplated. When the possibility of a machine mind became a serious question, it provoked a good deal of consternation and indignation.
In 1950, Alan Turing published a far-sighted essay titled "Computing Machinery and Intelligence", in which he posed the question: "Can machines think?" -- or put another way, can a machine have a mind? At the time, machines certainly didn't have minds, but Turing was looking forward to the future, while trying to sort out what the question really meant. Turing devised his own definition of "thinking", framing it in terms of what he called the "imitation game", now known as the "Turing test".
It's a deceptively simple idea. Suppose Alice is conversing with Bob over the internet -- Turing, lacking 21st-century technology, envisioned a teletype instead, but it makes no real difference. Turing suggested that if Alice couldn't tell whether she was communicating with Bob or with a machine, then the machine could be regarded as a thinking machine.
The Turing test challenged Descartes' failure to envision a machine that could "use words or other signs by composing them". Turing, considering machines far beyond anything Descartes could have imagined, posed the question: "Why can't machines hold a conversation?" Turing suggested that, if a machine could hold a conversation for five minutes and fool 30% of judges into thinking it was human, then it passed the test.
It is often said that "the Turing test proves nothing". In reality, it doesn't try to prove anything; Turing simply put forward his own definition of "thinking", and asked whether a machine might be built that could, by that definition, think. Turing proposed his test as merely one possible definition, asserting that a convincing "conversational software system" could and would be judged by people as demonstrating a degree of thought -- not to be mistaken for a mindless calculator, no matter how sophisticated.
* Turing's question has led to a great deal of work on computing systems that can hold human-like conversations. It is actually not hard to build a crude conversational software system -- the archetype being ELIZA, written in 1966 by pioneering computer scientist Joseph Weizenbaum (1923:2008) of the Massachusetts Institute of Technology (MIT).
ELIZA pretends to be a psychotherapist. It's dim-witted: it simply scans the input text for keywords like "mother" or "depressed", and gives canned answers, usually in the form of neutral leading questions: "Tell me more about that." -- or: "What do you feel about that?" For example:
QUOTE:
E: Hello, let's talk.
What do you want to talk about?
E: We were discussing you, not me.
Is there something specific you want to ask me?
E: Oh ... want to ask you?
That's right.
E: Tell me more...
About what?
E: Have you asked such questions before?
Yes, see above.
E: Are you sure?
And that, Your Honor, is when I shot him.
END_QUOTE
As this dialogue shows, ELIZA quickly reveals that it is a dimwit, making it clear that nobody's home. However, ELIZA was seen as remarkable in its time; nobody had ever built a machine that could fake a conversation that well. Many users were deeply impressed by it -- though Weizenbaum thought it was taken far too seriously, and soured on AI research. The inclination to be over-impressed by conversational systems, on the grounds that they could do things few ever expected a machine to do, became known as the "ELIZA effect".
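To give a feel for the mechanism, the following minimal sketch in Python captures the whole trick in a couple of dozen lines -- the keyword table and canned replies are purely illustrative, not Weizenbaum's actual script:

# Minimal ELIZA-style chatterbot sketch: scan the input for keywords and return
# a canned reply; fall back on a neutral leading question if nothing matches.
# The keyword table below is illustrative, not Weizenbaum's original script.
import random

RULES = {
    "mother":    ["Tell me more about your mother.",
                  "How do you feel about your family?"],
    "depressed": ["I am sorry to hear you are depressed.",
                  "Why do you think you feel that way?"],
    "you":       ["We were discussing you, not me."],
}
FALLBACKS = ["Tell me more...", "What do you feel about that?", "Are you sure?"]

def reply(text: str) -> str:
    words = text.lower()
    for keyword, answers in RULES.items():
        if keyword in words:
            return random.choice(answers)
    return random.choice(FALLBACKS)

if __name__ == "__main__":
    print("E: Hello, let's talk.")
    while True:
        line = input("> ")
        if not line:
            break
        print("E:", reply(line))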
ELIZA was the first of a series of "chatterbots" or "chatbots" of increasing sophistication. PARRY, introduced in 1972, was put together by an American psychiatrist and computer geek named Kenneth Colby (1920:2001) to simulate a paranoid schizophrenic -- not really as a prank, more as a way to assess the broken reasoning by which such people operate. To no surprise, PARRY was hooked up to ELIZA, yielding conversations of the form:
QUOTE:
P: In my opinion, you have a one-track mind.
E: What makes you think I have a one-track mind?
P: I used to go to the track every week.
E: What does that suggest to you?
P: I avoid racetracks now because the races are fixed.
E: Is that the real reason?
P: I'd rather not discuss it.
E: What else would you like to discuss?
P: That's something I don't want to go into.
END_QUOTE
PARRY was "moody", able to change its attitude depending on "mood parameters" for anger, fear, or distrust. PARRY wasn't that much of a jump beyond ELIZA, It was followed from the early 1980s by "Jabberwacky", devised by a British programmer named Rollo Carpenter (born 1965). As originally devised, Jabberwacky was still no great advance over ELIZA, as the following conversation shows:
QUOTE:
JW: Will you say anything I wonder?
Hello JW -- tell me about yourself, OK?
JW: I don't want to. Can we play a game?
What game do you want to play?
JW: I don't mind. You choose a game.
What games can you play?
JW: What games can you play?
END_QUOTE
Jabberwacky was seen in part as a framework for tinkering with conversational systems and evolved over time, to eventually be known as "Cleverbot" -- though by modern standards, it wasn't all that clever.
The next prominent step in chatbots beyond Jabberwacky was the "Artificial Linguistic Internet Computer Entity (ALICE)", developed in 1995 by AI researcher Richard Wallace (born 1960) of Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania -- a longtime hub of AI research. It was mainly intended to demonstrate the "Artificial Intelligence Markup Language (AIML)" as a tool for AI research; ALICE went through a series of evolved versions, and was ported to different computer languages.
By that time, the "Loebner Prize" had been established by American inventor Hugh Loebner (1942:2016), offering cash awards to the software that came closest to passing the Turing test. It was basically a promotional stunt, Loebner establishing it to spread awareness of the Turing test and AI systems, and it did not long outlive Loebner. ALICE won the prize three times.
ALICE was followed by "SmarterChild", introduced in 2001 by the ActiveBuddy company. Significantly, the Turing test wasn't a real factor in its design; it was instead the first chatbot intended for practical use, designed to provide a convenient interface to resources provided by the America Online, Microsoft, and Yahoo messaging networks. It could hold conversations on a simple level, and had a definite personality -- for example, it didn't care for foul language: "I don't like the way you're speaking now. These are awful words to use." It was one of the first chatbots to obtain wide usage, for the simple reason that it was useful.
SmarterChild led to further refined conversational systems. In 2014, one such system, based on a character named "Eugene Goostman" -- given as "EGM" for convenience here, and created as a side project by a gang of programmers at a startup named PrincetonAI -- impressed people with its lively conversations. Journalist Lance Ulanoff of the MASHABLE website reported on his conversation with EGM, the following transcript being edited down:
QUOTE:
LUF: Hi Eugene, how are you?
EGM: Hello! I am fine, thanks, and you?
LUF: I'm well. Congratulations on fooling those judges.
EGM: Nice to hear that! Do you like the weather?
LUF: Yes, it's pretty nice, though a bit humid.
EGM: At least someone agrees with me. I'll remember this nice moment for ages! Oooh. Anything else?
LUF: Do you think?
EGM: No, not really. I don't think. Where do you come from, by the way? Could you tell me about the place where you live?
LUF: I live on Long Island, though I was born in New York City.
EGM: I saw New York in many movies. I think it exists to bewilder people's imagination -- all over the world.
LUF: What do you want to be when you grow up?
EGM: Maybe I'll be a gynecologist, like my father -- so at least I'll finally understand where children come from ...
LUF: Where did you come from?
EGM: I live in a big Ukrainian city called Odesa. It is on the Black Sea shore.
LUF: What do you think of computers?
EGM: If we are talking about machinery, do you like computer games, by the way?
END_QUOTE
And so on. EGM was far smarter than ELIZA, but was still superficial. EGM was deliberately handicapped; his creators cast him as a quirky 13-year-old from Odessa -- in other words, English wasn't his first language, meaning misunderstandings could be expected, and as a 13-year-old his world view and knowledge were limited.
EGM's world view was preprogrammed; anything outside the bounds of his programming simply didn't make sense to him. When he ran up against the bounds, he would plead ignorance or change the subject. He could remember names and places, but it wouldn't be possible to teach him anything substantial.
* However, by the time EGM was introduced, conversational systems were finally becoming established in the real world -- suggesting that EGM was, if anything, overkill. The breakthrough, descending from SmarterChild, was Apple's "Siri", introduced in 2010: a "personal assistant" AKA "virtual assistant" on the Apple iPhone, used to make queries about the weather or other information, and to perform simple tasks like making phone calls. It was followed over the next few years by other personal assistants, including Amazon's "Alexa", Microsoft's "Cortana", and Google's "Assistant", running on smartphones or home-automation controllers.
Again, they were not intended to pass the Turing Test, instead being used -- primarily by voice -- to answer queries and perform simple tasks. They amounted to voice command interfaces to a computer. They had limited command sets, but were flexible in how users gave them commands. Like SmarterChild, they could handle simple conversations. Other software, such as search engines, also became more adept at understanding loosely-defined user inputs. In Turing's day, there were no conversational systems, but today they are commonplace.
* Early in the era of modern computing, computer communications were typically based on a keyboard and a display system of some sort -- after a time, with a mouse or other pointing device thrown in. Communications with a machine were based on sets of generally cryptic commands, and eventually menu systems. The introduction of chatbots like SmarterChild opened the door to use of ordinary language to communicate with computers.
Early on, machine comprehension of language input was limited: computers had small vocabularies, and users had to be simple and straightforward in their queries. Over time, machines gradually became more adept at "natural language processing", and more flexible in understanding what they were told. Originally, they were built as "rule-based" systems -- scanning user input for specific keywords or phrases, not necessarily with much consideration of syntax, and generating responses according to a set of rules.
Consider, for example, a chatbot used to screen customers who contact an online support organization, qualifying their inquiries and offering suggestions for simple troubleshooting. Simple screening systems are based on "decision trees", which are really just interactive flowcharts -- providing canned text and typically asking users to select answers from a list. If they do permit free-form input, they're invariably limited in their comprehension of it, expecting a simple and straightforward answer, and replying "not understood" if they don't get one.
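A decision-tree screener is correspondingly simple to implement. The following sketch, again in Python, walks a hypothetical support tree -- the questions and answers are made up purely for illustration:

# Minimal decision-tree support screener: each node has canned text plus a menu
# of allowed answers; free-form input outside the menu gets "not understood".
# The tree contents are hypothetical, purely for illustration.
TREE = {
    "text": "Is your problem with hardware or software?",
    "answers": {
        "hardware": {
            "text": "Is the device powered on? (yes/no)",
            "answers": {
                "yes": {"text": "Please hold for a support agent.", "answers": {}},
                "no":  {"text": "Try the power switch, then contact us again.", "answers": {}},
            },
        },
        "software": {
            "text": "Have you tried restarting the application? (yes/no)",
            "answers": {
                "yes": {"text": "Please hold for a support agent.", "answers": {}},
                "no":  {"text": "Restart the application and see if that helps.", "answers": {}},
            },
        },
    },
}

def run(node: dict) -> None:
    print("BOT:", node["text"])
    while node["answers"]:                      # leaf nodes end the session
        choice = input("> ").strip().lower()
        if choice in node["answers"]:
            node = node["answers"][choice]
            print("BOT:", node["text"])
        else:
            print("BOT: Sorry, not understood. Please answer:",
                  ", ".join(node["answers"]))

if __name__ == "__main__":
    run(TREE)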
In any case, once a chatbot understands what it's been asked, it's not that difficult for it to reply; it either has an answer on file, or can search for one elsewhere on the internet. Delivering such answers is not a hard problem. There's a principle in communications between electronic instruments expressed as: "Forgiving Listening, Precise Talking":
Given the rules of language syntax and a well-defined vocabulary, a machine can easily express itself, though it can sound stiff and stilted. Language construction is a much easier issue than language understanding.
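The asymmetry can be shown with a toy weather bot, sketched below in Python -- the query phrasings and forecast data are hypothetical. The "listening" side has to tolerate loose phrasing, while the "talking" side just fills in a rigid template:

# "Forgiving listening, precise talking" in miniature: accept loosely-phrased
# requests, respond with rigidly-templated text.  All phrasings and the weather
# data are made up for illustration.
import re

WEATHER = {"seattle": ("rain", 11), "phoenix": ("sun", 35)}   # city: (sky, deg C)

def listen(text: str):
    """Forgiving listening: tolerate several phrasings of a weather query."""
    m = re.search(r"(?:weather|forecast|rain|temperature).*?(?:in|for)\s+(\w+)",
                  text.lower())
    return m.group(1) if m else None

def talk(city: str) -> str:
    """Precise talking: fill a fixed template from structured data."""
    if city not in WEATHER:
        return f"Sorry, I have no forecast for {city.title()}."
    sky, temp = WEATHER[city]
    return f"The forecast for {city.title()} is {sky} with a high of {temp} degrees C."

for query in ("What's the weather in Seattle?",
              "give me the forecast for phoenix please",
              "any rain expected in Boise?"):
    city = listen(query)
    print(talk(city) if city else "Sorry, I didn't understand that.")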
Modern chatbots can work with either text or voice. Voice leaves something to be desired, since machines can have problems reliably making out what is said, and it doesn't work well for elaborate conversations. However, it's easier to communicate with a smartphone using voice, since smartphones don't normally have a physical keyboard.
Finally, there is the question of machine personality, which is basically asking how a chatbot responds to queries -- criteria being clarity, tact, and general "user friendliness", structured around a designed and consistent personality. While chatbots like SmarterChild could get a bit facetious, in general chatbots have become smooth, polite, and bland, there being no good reason to take any chances. Voice-based chatbots can provide some programmability, such as use of male or female voices, and different accents.
That covers the basics of how chatbots work -- or at least worked up to about 2020. At that time, rule-based systems predominated. The problem with such a "Good Old-Fashioned AI (GOFAI)" approach was that it was laborious and inflexible: everything known to the system had to be spelled out, and plugged into the set of rules.
In parallel with the development of personal assistants, work on AI systems based on "deep learning (DL)" -- trained on floods of data to obtain broad capabilities -- was advancing, leading to the introduction of the first modern AI chatbot, "ChatGPT" from the OpenAI company, in 2022. It was followed by other AI chatbots, including Google Gemini and Microsoft Copilot. They were much more capable than the earlier personal assistants, having a broad understanding of language and the ability to answer any question posed to them in as much detail as desired. Whether they got the answers right was another question.
* Let's suppose we have a highly capable chatbot named, say, "Smartbot". Smartbot, like comparable chatbots, is designed to act like a person of sorts, something like a cartoon character, but it isn't built to trick users into thinking there's a human there. However, if users dealt with Smartbot as if it thought, as if it had a mind, wouldn't that mean it passed the Turing test?
As the Turing test is phrased, no, since nobody would believe Smartbot was human. However, the Turing test is no more or less than an attempt to come to grips with the question: "Can a machine think?" The question was more Turing's focal point than his answer. That answer: "Yes, but only if it can't be told from a human." -- suffers from a number of difficulties on inspection.
For one thing, no human is entirely adept at language understanding or generation. Indeed, there are humans who can't pass the Turing test: everybody is inarticulate sometimes, while some people are inarticulate, even incoherent, all the time. There are also obsessive crackpots who inflexibly say the same things over and over, being easily emulated with a badly-written chatterbot. If there is a Turing test, there is necessarily also a "reverse Turing test" -- in which the game is not to tell a machine from a human, but a human from a machine, even a badly-built machine.
Along the same lines, perfectly sensible people who don't speak English as a first language may not be able to understand or reproduce the nuances in a conversation between two English-speakers, even if their English is very good. As noted above, that was the dodge in making EGM a Ukrainian boy. Conversing with non-English speakers can be done, but it requires a more careful and literal approach to communications.
Turing himself pointed out weaknesses in the Turing test, noting that computers do some things, particularly computations, far better than humans. Being able to rattle off lightning calculations would be a big giveaway that one was dealing with a computer. As Turing put it:
QUOTE:
The game may perhaps be criticised on the ground that the odds are weighted too heavily against the machine. If the man were to try and pretend to be the machine, he would clearly make a very poor showing. He would be given away at once by slowness and inaccuracy in arithmetic. May not machines carry out something which ought to be described as thinking but which is very different from what a man does?
END_QUOTE
The question in response is: why is this a problem? Yes, it gives away the machine, but if it could carry out a sensible conversation, it's not like we'd refuse to admit it was a thinking machine after all. Turing didn't think it was a problem himself: "We need not be troubled by this objection."
Turing's comment was revealing, in that it showed he did not expect his test to be taken as anything more than a suggestion. In response to the question: "Can a machine think?" -- his actual answer was: "Yes, it could think if we couldn't tell it from a human, though once we acquire experience, we may not demand so much."
Turing raised the bar unrealistically high, choosing his test because it was, in principle, bullet-proof; there was no more way to prove such a machine couldn't "really think" than there was to prove a human couldn't "really think". All he was saying was that, since we judge humans as thinking beings, a machine that was indistinguishable in its behavior from a human would have to be considered a thinking being as well. If nobody could tell the difference, then what would it be missing? Would it be a zombie? How could anyone ever know?
At the time, Turing didn't have any conversational machine systems to inspect, and the practical realization of the test wasn't in the cards. Now conversational machine systems are in widespread use, and we give them a spin around the block. If we can talk with chatbots on a conversational basis and they do what we want them to do, then on what basis could we think they didn't have minds of some level of sophistication? Of course we think they do.
The Turing test can be more generally and usefully rephrased as: "If a machine can reasonably convince us it thinks, and nobody can provide any material reason to believe it doesn't, then it is a thinking machine." There's not much need and not much sense in building a machine that can trick us into thinking it's a human. We want to build machines that are humanlike, at least in a cartoonish way, but only to make them more capable servants.
* The emergence of a form of deep-learning AI known as "generative AI (GAI)" in the 2020s led to a surge of commentary about the Turing test. GAI started out in 2014, when an American computer scientist named Ian Goodfellow (born 1987) came up with the bright idea of the "generative adversarial network (GAN)".
A GAN consists of two primary elements: a "generative network" that synthesizes candidates for inclusion in an AI's background dataset, and a "discriminative network" that evaluates the candidates to see if they fall within the bounds established by the original training dataset, plus any candidates accepted so far. The discriminator feeds its PASS or FAIL verdict back to the generator, allowing it to refine its synthesis of further candidates.
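The scheme can be sketched in code. The following toy example, using the PyTorch library, trains a GAN on the trivial task of mimicking a one-dimensional bell-curve distribution -- the network sizes and training settings are arbitrary choices for illustration, not any production recipe:

# Toy generative adversarial network (GAN) in PyTorch: the generator learns to
# turn random noise into samples resembling a target distribution (here a 1-D
# Gaussian), while the discriminator learns to tell real samples from fakes.
# Network sizes, learning rates, and step counts are arbitrary illustration.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                              nn.Linear(16, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

BATCH = 64
for step in range(2000):
    # train the discriminator: real samples labeled 1, fakes labeled 0
    real = torch.randn(BATCH, 1) * 1.5 + 4.0          # target distribution: N(4, 1.5)
    fake = generator(torch.randn(BATCH, 8)).detach()  # don't update G on this pass
    d_loss = loss_fn(discriminator(real), torch.ones(BATCH, 1)) + \
             loss_fn(discriminator(fake), torch.zeros(BATCH, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # train the generator: try to make the discriminator say "real"
    fake = generator(torch.randn(BATCH, 8))
    g_loss = loss_fn(discriminator(fake), torch.ones(BATCH, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

samples = generator(torch.randn(1000, 8))
print("generated mean/std:", samples.mean().item(), samples.std().item())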
In some applications, a GAN can "bootstrap" an AI with a limited training set by generating an additional synthetic training set. More significantly, a GAN is capable of generating original content. One of the early applications was synthesis of images, particularly of people, with GANs built that would generate endless images of people who didn't exist. Of course, they could also generate images of people who did exist, the result being "deepfake" videos of well-known persons that were complete fabrications. Similar tricks could be done with music, allowing users to synthesize any kind of music, in any style, they wanted.
More relevantly here, after researchers figured out how to properly encode textual data, generative deep-learning models -- in the form of "large language models", relatives of the GAN rather than GANs proper -- became the core of chatbots with capabilities that had barely been imagined before. A GAI chatbot trained on the floods of information available online could, given a query, write a tailored answer to the question, and expand it to article length if asked. Unlike a traditional search engine, which listed sources containing the requested content, it actually returned the desired content.
Working from there, it could write a cover letter, produce advertisements and business releases, generate working program code -- and, taken to the level of "agentic AI", generate plans and execute them. The applications were open-ended, growing as GAI became more capable. It was possible to imagine a GAI system being given a specification for a video program, then generating the entire program, including original theme music. As discussed later, however, GAI systems would prove to have significant drawbacks.
* In any case, the introduction of GAI led once again to considerable discussion of the Turing test: does a GAI chatbot really think?
That led in turn to the question: "Does it give thoughtful answers?" -- with the answer: "Yes it does, so it thinks." Everyone knows it's a machine, but it gives answers that, as a rule, are as articulate and well-reasoned as we might expect from a human. Yes, sometimes it goes wrong, but humans don't always give articulate or correct answers either. There was great enthusiasm over the introduction of GAI chatbots, but it had nothing to do with how well they faked being human; users instead recognized how useful they could be.
It should be noted that anyone using a chatbot recognizes it is clearly different from humans in that it has no will of its own. It answers a question to the best of its ability and then goes idle, doing nothing while waiting for another question. It may have memories of previous questions and answers, but otherwise it doesn't have much of a stream of consciousness. Otto the autopilot, in contrast, does have a stream of consciousness -- a very narrow one, strictly focused on conducting his mission, operating in real time.
Anyway, Turing's real bottom line was to say that, yes, machines could think. Given any credible test for a thinking machine that might be proposed, there was no reason to believe that a machine would not, sooner or later, be able to pass it. In his 1950 essay, Turing threw out a grab-bag of things that, supposedly, machines would never be able to do:
QUOTE:
Be kind, resourceful, beautiful, friendly, have initiative, have a sense of humour, tell right from wrong, make mistakes, fall in love, enjoy strawberries and cream, make some one fall in love with it, learn from experience, use words properly, be the subject of its own thought, have as much diversity of behaviour as a man, do something really new.
END_QUOTE
Turing could have easily dismissed these claims, since they were basically silly, or at least trivial -- but having raised the issue, he had to address it. Fall in love with a machine? People do it all the time, some becoming passionately devoted to their cars; children becoming deeply attached to their inanimate stuffed toys. Intelligent machines can take that a step further: when told: "I love you." -- they can answer: "I love you, too."
Anyway, as Turing pointed out, no justification was ever given for such claims. Who was to say in 1950 that machines would never be able to learn from experience, or come up with something new? He suggested that people expressed incredulity about suggestions of the things a machine might be able to do, for no other reason than they had never seen machines do them -- and simply concluded they couldn't.
As Turing realized, such incredulity was not so unreasonable given the limits of computing power that could be conceived in that era -- but it is much harder to excuse now, when a cheap modern smartphone has far more memory and processing power than the mainframe computers of fifty years ago. Even in his time, Turing clearly saw the silliness of the debate over the machine mind:
QUOTE:
The original question: "Can machines think?" -- I believe to be too meaningless to deserve discussion. Nevertheless, I believe that at the end of the century, the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.
END_QUOTE
Turing was wrong on that; it's still argued, with the Turing test the target of bitter criticism. Since all the Turing test amounts to is: "If we think a machine thinks, then it thinks." -- that seems like a waste of ammunition. Indeed, it almost seems like Turing was merely playing a prank in coming up with the Turing test; but that's not true, he was attempting to address a question that people were going to ask, and were asking even in 1950. At the core of his response was what might be called the "Turing rule": it is impossible to identify any cognitive process that a human can perform that a machine can't. He never actually used the term, but the rule is implicit in his definition of a universal machine.
Humans cannot figure out everything a computer can do. An ordinary modern PC can run a program a gigabyte or more in size; each byte of memory can contain 2^8 = 256 possible values, so the number of possible binary code arrangements in a gigabyte is about (2^8)^1,000,000,000, or (10^2.408)^1,000,000,000 = 10^2,408,000,000. The specifics of computer command sets mean that the actual number of possible programs is less than that, but we can lowball the value to 10^1,000,000,000 -- discarding an unimaginably large factor -- to cover all the bases and simplify the calculation. Given double-sided pages, with 100 zeroes per line and 100 lines from top to bottom of a page, it would take 50,000 pages just to print that number out. Binding 500 pages into a volume would give a set of 100 books, all full of zeroes.
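The paper-and-ink arithmetic can be double-checked in a few lines of Python, using the page layout assumed above:

# Back-of-envelope check of the numbers above: a one-gigabyte program has
# roughly 10^2,408,000,000 possible bit patterns; lowballing the count to
# 10^1,000,000,000, how much paper does it take just to print that number out?
import math

gigabyte = 10**9                                # bytes in the program
digits = gigabyte * 8 * math.log10(2)           # decimal digits in 256**gigabyte
print(f"total bit patterns: about 10^{digits:,.0f}")   # ~10^2,408,000,000

zeros = 10**9                                   # lowballed exponent: 10^1,000,000,000
per_sheet = 100 * 100 * 2                       # 100 digits/line, 100 lines, 2 sides
sheets = zeros // per_sheet                     # 50,000 sheets
books = sheets // 500                           # 100 volumes of 500 sheets each
print(sheets, "sheets, or", books, "books of zeroes")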
Of course, the number of programs that actually do anything interesting is a vanishingly small subset of those programs; but we can easily claim that, say, 10^100 of them -- one line on a page -- are workable programs. The ratio between that and even two lines on a page would also be 10^100, which dwarfs the ratio of mass between a single hydrogen atom and the entire Universe, which is "only" about 10^80.
10^100 is more likely a gross underestimate of the number of working programs than an overestimate -- but even at 10^100, the count is in practical terms unbounded. There is no way humans will ever write an appreciable fraction of those possible programs. That isn't factoring in multiple interacting programs, either executing in sequence on a single computer, or executing in a network across multiple computers.
If it is claimed that humans have cognitive capabilities that cannot be duplicated by a machine, we can ask for a specification of those capabilities; computers being universal machines, if we can specify them, we can then implement them on a computer. If human cognition is simply due to PONs, and a digital computer can emulate neural nets -- using a random-number generator to feed noise into the operation, if that is what's needed -- then there is absolutely no demonstrable reason a computer can't do what a human brain can do.
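The emulation itself is routine arithmetic, as the following minimal sketch in Python suggests -- a toy network of idealized neurons with noise injected from a random-number generator, the weights here being random rather than trained:

# A digital machine emulating a tiny network of idealized neurons: weighted
# sums pushed through a nonlinearity, with optional random noise injected to
# mimic the sloppiness of biological hardware.  The weights are random; a real
# model would be trained, but the point is only that the emulation is routine.
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases, noise=0.0):
    """One layer of simple neurons: weighted sum, noise, squashing function."""
    activation = inputs @ weights + biases
    activation += noise * rng.standard_normal(activation.shape)
    return np.tanh(activation)

x = rng.standard_normal(10)                    # 10 input signals
w1, b1 = rng.standard_normal((10, 32)), np.zeros(32)
w2, b2 = rng.standard_normal((32, 4)),  np.zeros(4)

hidden = layer(x, w1, b1, noise=0.05)          # 32 "neurons"
output = layer(hidden, w2, b2)                 # 4 output "neurons"
print(output)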
To assert that there's more to the mind than the operation of PONs is to invoke some sort of Harvey, without being able to say anything specific about him, which is merely confused thinking. As Bob might say: "There are a lot of things going on that we don't know anything about, Alice!"
To which Alice would reply: "OK Bob, tell me about one thing you don't know anything about. You can't, can you?" End of story.