“Can machines think?” That’s the core question legendary mathematician and computer scientist Alan Turing posed in October 1950. Turing wanted to assess whether machines could imitate or exhibit human-level intelligent behavior, so he devised a test he called the “imitation game.” This later became known as the Turing test, which is commonly used to assess how well a machine can mimic human behavior.

The genesis of Turing’s test lay in the inherent difficulty of establishing objective criteria that distinguish original thought from the imitation of it. The challenge is that any evidence of original thought could be denied with the argument that a machine was simply programmed to seem intelligent. Essentially, the crux of proving whether machines can think is defining what thinking is.

Turing wanted to challenge the idea that the mechanical nature of computers means they cannot, in principle, think. He posited that if a computer appears indistinguishable from a human, why should it not be considered a thinking entity?

How does the Turing test work?

Turing proposed a three-party game. He first outlined a test in which a man and a woman go into separate rooms and party guests use typewritten answers to try to determine which person is which, while the man and woman try to convince the guests that they are the opposite sex.

From there, Turing proposed a test in which a remote interrogator poses questions to a computer and a human subject, both unseen, for five minutes, in order to determine which is the human. A computer’s success at “thinking” could then be measured by how likely it is to be misidentified as a human.

A later iteration of the imitation game, proposed by Turing in 1952 in a BBC broadcast, would see a computer try to convince a jury of people that it was human.

The Turing test was created as more of a philosophical thought experiment than a practical means of defining machine intelligence. However, it grew to be seen as an ultimate target for machine learning and artificial intelligence (AI) systems to pass in order to demonstrate artificial general intelligence.

Turing predicted that by the early 2000s, a programmed computer would be able to “play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.”

That prediction did not come to pass on schedule. However, the rise of ChatGPT and other artificial intelligence systems built on large language models (LLMs) has reignited the conversation around the Turing test.

In June 2024, researchers reported that the LLM GPT-4 was judged to be human 54% of the time after five minutes of questioning in a Turing test. That resoundingly beats Turing’s predicted 30% misidentification rate, albeit two decades after the date he forecast. However, this research, from the University of California, San Diego, involved only two players in the test rather than Turing’s original three-player game, so GPT-4 did not pass the Turing test under the specific conditions he defined.
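Turing’s criterion is quantitative, so the comparison can be sketched in a few lines of code. This is purely illustrative (the threshold and figures come from the numbers quoted above, not from the study’s actual methodology):

```python
# Illustrative sketch of Turing's pass criterion, not the 2024 study's code.
# Turing predicted that after five minutes an average interrogator would have
# no more than a 70% chance of a correct identification -- equivalently, the
# machine is judged human at least 30% of the time.

TURING_THRESHOLD = 0.30  # minimum "judged human" rate implied by Turing's prediction


def beats_turing_prediction(judged_human: int, total_trials: int) -> bool:
    """Return True if the machine was judged human often enough to meet
    Turing's predicted 30% misidentification rate."""
    return judged_human / total_trials >= TURING_THRESHOLD


# Using the headline figure cited above: GPT-4 judged human 54% of the time.
print(beats_turing_prediction(judged_human=54, total_trials=100))  # True
```

On these numbers, a 54% “judged human” rate comfortably clears the 30% bar, which is why the result is described as beating Turing’s prediction even though the two-player setup differs from his original game.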

Nevertheless, this research still shows how such AIs can at least imitate humans with some success.

Challenges and limitations of the Turing test

While passing the Turing test is often held up as the ultimate goal for proving that AI systems can think, the test has its limitations and detractors.

Turing himself detailed and addressed nine objections to his test and to the theory that machines could think. These range from the theological conception of thought and the idea that machines can’t feel emotions or have a sense of humor, to logical and mathematical limitations that would simply prevent a machine from answering a question or answering it correctly.

But perhaps the most relevant objection comes from mathematician Ada Lovelace, who, commenting on computing pioneer Charles Babbage’s Analytical Engine, suggested that a machine cannot “originate anything” and can only do whatever we order it to perform. Turing’s retort in his paper was to ask whether humans themselves can ever do anything really new in a deterministic world bound by the laws of nature and the boundaries of the universe. He also noted that computers, though constrained, could still do unexpected things, in the same way that humans can despite being constrained by our genetic makeup and biology.

Beyond this, the Turing test does not, per se, indicate consciousness or intelligence; rather, it works as a critique of what is understood as thought and what could constitute a thinking machine. The test also relies on the judgment of the interrogator, on comparison with humans, and on observed behavior alone.

Then there’s the argument that the Turing test is designed around how a subject acts, meaning a machine can merely simulate human consciousness or thought rather than actually having its own equivalent. This can lead to the so-called Turing trap, in which AI systems are excessively focused on imitating humans rather than being designed with functions that allow humans to do more, or that boost human cognition beyond the limits of the human mind.

Is the Turing test still relevant?

While the Turing test might be held as a benchmark for AI systems to surpass, Eleanor Watson, an expert in AI ethics and member of the Institute of Electrical and Electronics Engineers (IEEE), told Live Science that “The Turing Test is becoming increasingly obsolete as a meaningful benchmark for artificial intelligence (AI) capability.”

Watson explained that LLMs are evolving from simply mimicking humans to being agentic systems that are able to autonomously pursue goals via programming “scaffolding” — similar to how human brains build new functions as information flows through layers of neurons.

“These systems can engage in complex reasoning, generate creative content and assist in scientific discovery. However, the real challenge isn’t whether AI can fool humans in conversation, but whether it can develop genuine common sense, reasoning and goal alignment that matches human values and intentions,” Watson said. “Without this deeper alignment, passing the Turing Test becomes merely a sophisticated form of mimicry rather than true intelligence.”

Essentially, the Turing test may be assessing the wrong things for modern AI systems.

As such, scientists “need to develop new frameworks for evaluating AI that go beyond simple human imitation in order to assess capabilities, limitations, potential risks and, most importantly, alignment with human values and goals,” Watson said.

Unlike the Turing test, these frameworks will need to account for the strengths of AI systems and their fundamental differences from human intelligence, with the goal of ensuring AIs “enhance, rather than diminish, human agency and wellbeing,” Watson added.

“The true measure of AI will not be how well it can act human,” Watson concluded, “but how well it can complement and augment humanity, lifting us to greater heights.”
