The Illusion of Thinking AI

Human beings are a special creation of the Supreme Creator, ALLAH, who has blessed Man with special capabilities, capacities and attributes. Man has achieved many remarkable successes over a long period of time, and during the last century astounding inventions have been made. One such invention is Artificial Intelligence, through which man intends to reproduce his own clone as a fully functioning replica. This write-up is an opinion on a research paper about AI, shared for wider audience discussion.

بِسۡمِ ٱللهِ ٱلرَّحۡمَـٰنِ ٱلرَّحِيمِ

In the name of ALLAH, the Most Gracious, the Most Merciful


Apple has just published a paper with a devastating title: *The Illusion of Thinking*. The paper is available at https://machinelearning.apple.com/research/illusion-of-thinking

Apple’s "The Illusion of Thinking" (2025) research reveals that Large Reasoning Models (LRMs) like OpenAI’s o1 often fail at complex, multi-step logic puzzles, exhibiting "accuracy cliffs" where performance drops to zero as problems increase in difficulty. Instead of truly reasoning, these models often rely on sophisticated pattern matching and, when faced with high complexity, actually reduce their reasoning effort.

Key Findings from "The Illusion of Thinking":

Accuracy Collapses: When tested with puzzles requiring deeper reasoning (e.g., increased disk counts in the Tower of Hanoi), AI models that initially show promise suddenly fail completely.

Pattern Matching vs. Logic: The study suggests that while models appear to "think" via generated chain-of-thought traces, they are primarily mimicking familiar patterns rather than solving novel, complex problems.

Reduced Effort on Complex Problems: Contrary to human behavior, these models sometimes produce shorter, less detailed reasoning traces for harder problems, indicating a tendency to abandon complex reasoning for a quick (and often wrong) guess.

Limitations in Generalization: The models fail to generalize, suggesting they struggle to apply logic outside of training distributions.

An opinion on Apple’s paper "The Illusion of Thinking" (shared on x.com)

Apple has just published a paper with a devastating title: *The Illusion of Thinking*. And it's not a metaphor. What it demonstrates is that the AI models we use every day (yes, ones like ChatGPT) don't think. Not one bit. They just imitate doing so.

The paper argues that these models, no matter how brilliant they may seem, do not understand what they are doing. They do not solve problems. They do not reason. They merely generate text word by word, trying to sound coherent. Real thought: zero.

To demonstrate this, Apple designed a series of experiments with logic puzzles: Tower of Hanoi, the river-crossing problem, stacked blocks, etc. These are the same kinds of puzzles we use to see whether a human, or even a child, can reason step by step.

In the first one, for example, they set the AI to solve the Tower of Hanoi. With 3 disks, it solves it perfectly. But as soon as you add difficulty (more disks), the model starts to get confused. It repeats moves. It skips steps. It contradicts itself. It fails.

Was the problem too difficult? No: in many cases, the researchers gave it *the correct algorithm*, step by step, as a helping hand. And you know what happened? It still couldn't follow it. Not even by copying the homework.
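For reference, the standard recursive solution to the Tower of Hanoi fits in a few lines. The Python sketch below is a textbook version, not necessarily the exact algorithm Apple supplied in its prompts; the peg names `A`, `B`, `C` and the helper `is_valid` are hypothetical choices for illustration. `hanoi` yields moves as `(disk, from_peg, to_peg)` tuples, and `is_valid` replays a move list to check it against the puzzle's rules.

```python
from typing import Iterator, List, Tuple

# A move is (disk number, from peg, to peg); disk 1 is the smallest.
Move = Tuple[int, str, str]


def hanoi(n: int, source: str = "A", target: str = "C", spare: str = "B") -> Iterator[Move]:
    """Yield the optimal sequence of moves that transfers n disks from source to target."""
    if n == 0:
        return
    # 1. Park the n-1 smaller disks on the spare peg.
    yield from hanoi(n - 1, source, spare, target)
    # 2. Move the largest remaining disk directly to the target peg.
    yield (n, source, target)
    # 3. Bring the n-1 smaller disks from the spare peg onto the target.
    yield from hanoi(n - 1, spare, target, source)


def is_valid(n: int, moves: List[Move]) -> bool:
    """Replay the moves from the start position and check every rule of the puzzle."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # bottom-to-top stacks
    for disk, src, dst in moves:
        if not pegs[src] or pegs[src][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may never sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    # Solved only if every disk ended up on the target peg in order.
    return pegs["C"] == list(range(n, 0, -1))


if __name__ == "__main__":
    moves = list(hanoi(3))
    print(len(moves), moves)
    print(is_valid(3, moves))
```

For example, `list(hanoi(3))` produces a 7-move sequence beginning `(1, 'A', 'C'), (2, 'A', 'B'), (1, 'C', 'B')`, and `is_valid(3, ...)` accepts it. Executing the procedure is pure bookkeeping: the same three steps repeated at every level of the recursion.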

Second example: the classic river problem. You have to ferry a wolf, a goat, and a cabbage across a river without ever leaving alone any two of them where one would eat the other. The AI does it well… until you add one more restriction. That's when it starts doing exactly what it shouldn't do.
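The river puzzle is a small search problem: each state records which items are still on the starting bank and which side the farmer is on, and a breadth-first search finds the shortest safe sequence of crossings. The Python sketch below assumes the classic rules (the farmer rows and carries at most one item; the wolf eats the goat and the goat eats the cabbage when left unattended); all function and variable names are illustrative.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

ITEMS = frozenset({"wolf", "goat", "cabbage"})
# Pairs that must never be left together without the farmer.
CONFLICTS = [frozenset({"wolf", "goat"}), frozenset({"goat", "cabbage"})]

# A state is (items still on the left bank, side the farmer is on).
State = Tuple[FrozenSet[str], str]


def safe(unattended: FrozenSet[str]) -> bool:
    """A bank without the farmer is safe if it holds no conflicting pair."""
    return not any(pair <= unattended for pair in CONFLICTS)


def solve() -> Optional[List[str]]:
    """Breadth-first search for the shortest safe plan; returns the cargo of each crossing."""
    start: State = (ITEMS, "left")
    goal: State = (frozenset(), "right")
    parent: Dict[State, Optional[Tuple[State, str]]] = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        left, side = state
        here = left if side == "left" else ITEMS - left
        # The farmer crosses alone (None) or with one item from his bank.
        for cargo in [None, *sorted(here)]:
            moved = frozenset() if cargo is None else frozenset({cargo})
            new_left = left - moved if side == "left" else left | moved
            new_side = "right" if side == "left" else "left"
            # The bank the farmer has just left must be safe on its own.
            left_behind = new_left if new_side == "right" else ITEMS - new_left
            if not safe(left_behind):
                continue
            nxt: State = (new_left, new_side)
            if nxt not in parent:
                parent[nxt] = (state, cargo or "nothing")
                queue.append(nxt)
    if goal not in parent:
        return None
    # Walk back from the goal to recover the crossings in order.
    plan: List[str] = []
    node = goal
    while parent[node] is not None:
        node, carried = parent[node]
        plan.append(carried)
    return plan[::-1]


if __name__ == "__main__":
    print(solve())
```

`solve()` returns a 7-crossing plan that begins and ends with the goat. Adding a restriction means editing `CONFLICTS` or the move rules; the search itself does not change.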

But the most unsettling thing isn't that it makes mistakes. It's that when the problem becomes more complex… the AI "thinks" less.

Literally: it uses fewer tokens, takes fewer steps, and explores fewer solutions. As if it were silently giving up.

Apple measured how many tokens the model dedicated to reasoning. It found a very marked curve: when the problem gets difficult, the model starts to generate *less* reasoning. Exactly the opposite of what a human would do. Why does this happen?

Because the AI doesn't know whether it's doing well or poorly. It has no sense of an objective. It doesn't correct. It doesn't compare. It doesn't evaluate. It just completes text, as if it were writing without knowing what for.

This breaks a very widespread idea:

“If we keep giving it more data, more parameters, and more power, AI will become super intelligent.” Apple's paper says: probably not. Because *there is no real thinking to scale*. What these models do is seem intelligent. And that’s the most dangerous thing.

Because when they sound convincing, we believe they understand. When they reason out loud, we believe they’re thinking. But it’s pure theater. What you see as reasoning is just an act.

The AI says: “first I do this, then that other thing…”

but it doesn’t *understand* the logic behind it. It’s only imitating structures it saw in its training. And when it doesn’t recognize them, it improvises poorly.

This does not mean that AI is useless.

But it does mean that we cannot treat it as if it had human capabilities:

it does not plan, it does not get frustrated, it does not improve its strategy. It has no will, nor purpose, nor even awareness of error.

The real risk is not that it thinks too much. It’s that it thinks *nothing*… and yet we still give it power. Because the more convincing it sounds, the more likely we are to mistake it for something it’s not.

So the next time ChatGPT, Claude, or Gemini says to you:

“Let me think…” Stop. And remember:

They’re not thinking. They’re guessing.

The Debate and Context:

While the paper published by Apple highlights significant limitations in current reasoning models, the debate continues over whether this constitutes a fundamental failure of AI reasoning or merely an indication that more advanced training or prompting techniques are needed. Another scholar examined the paper's arguments in the same spirit and responded with counterarguments titled "The Illusion of the Illusion of Thinking: A Comment on Shojaee et al. (2025)". The comment is available at https://arxiv.org/html/2506.09250v1 and summarizes its argument as follows:

Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit “accuracy collapse” on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues:

(1) Tower of Hanoi experiments systematically exceed model output token limits at reported failure points, with models explicitly acknowledging these constraints in their outputs;

(2) The authors’ automated evaluation framework fails to distinguish between reasoning failures and practical constraints, leading to misclassification of model capabilities;

(3) Most concerningly, their River Crossing benchmarks include mathematically impossible instances for N≥6 due to insufficient boat capacity, yet models are scored as failures for not solving these unsolvable problems. When we control for these experimental artifacts, by requesting generating functions instead of exhaustive move lists, preliminary experiments across multiple models indicate high accuracy on Tower of Hanoi instances previously reported as complete failures.
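The first point comes down to arithmetic: an optimal Tower of Hanoi solution for n disks has 2^n - 1 moves, so a complete move list doubles in length with every added disk, while a recursive program that generates those moves stays the same size. The Python sketch below makes the comparison concrete. The tokens-per-move figure and the output limit are hypothetical placeholders, not values reported in either paper.

```python
# Hypothetical figures for illustration only; neither is taken from either paper.
TOKENS_PER_MOVE = 10          # assumed cost of writing out one move
OUTPUT_TOKEN_LIMIT = 64_000   # assumed maximum output length of a model


def moves_needed(n_disks: int) -> int:
    """Length of the optimal Tower of Hanoi solution: 2**n - 1 moves."""
    return 2 ** n_disks - 1


def largest_listable(tokens_per_move: int, limit: int) -> int:
    """Largest disk count whose complete move list still fits within the token limit."""
    if tokens_per_move <= 0:
        raise ValueError("tokens_per_move must be positive")
    n = 0
    while moves_needed(n + 1) * tokens_per_move <= limit:
        n += 1
    return n


if __name__ == "__main__":
    for n in (5, 10, 15, 20):
        moves = moves_needed(n)
        print(f"{n:>2} disks: {moves:>9,} moves, about {moves * TOKENS_PER_MOVE:>10,} tokens")
    print("largest listable:", largest_listable(TOKENS_PER_MOVE, OUTPUT_TOKEN_LIMIT))
```

With these assumed figures, `largest_listable(10, 64_000)` returns 12: a 12-disk list (4,095 moves) fits, but a 13-disk list (8,191 moves) does not. Different assumptions shift the cutoff, but not the exponential growth behind it.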

These findings highlight the importance of careful experimental design when evaluating AI reasoning capabilities.
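The third point can be checked mechanically. The sketch below runs a breadth-first search over river-crossing states under the classic "jealous couples" rules, which we assume approximate the benchmark's actor/agent constraint: no actor may be with another pair's agent unless the actor's own agent is present, whether on either bank or in the boat. The encoding of people as `("A", i)` and `("G", i)` and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import FrozenSet, Tuple

# Hypothetical encoding: ("A", i) is actor i and ("G", i) is agent i.
Person = Tuple[str, int]


def safe(group: FrozenSet[Person]) -> bool:
    """An actor may not be with another pair's agent unless the actor's own agent is present."""
    agents = {i for kind, i in group if kind == "G"}
    actors = {i for kind, i in group if kind == "A"}
    # With no agents present nobody is at risk; otherwise every actor needs their own agent.
    return not agents or actors <= agents


def solvable(n_pairs: int, boat_capacity: int) -> bool:
    """Breadth-first search over (people on the start bank, side of the boat)."""
    everyone = frozenset((kind, i) for i in range(n_pairs) for kind in ("A", "G"))
    start = (everyone, 0)  # side 0 is the start bank, side 1 the far bank
    seen = {start}
    queue = deque([start])
    while queue:
        on_start_bank, boat_side = queue.popleft()
        if not on_start_bank:
            return True  # everyone has crossed
        here = on_start_bank if boat_side == 0 else everyone - on_start_bank
        for size in range(1, boat_capacity + 1):
            for combo in combinations(sorted(here), size):
                crew = frozenset(combo)
                if boat_side == 0:
                    new_start_bank = on_start_bank - crew
                else:
                    new_start_bank = on_start_bank | crew
                # The boat and both banks must be safe after the crossing.
                if not (safe(crew) and safe(new_start_bank) and safe(everyone - new_start_bank)):
                    continue
                state = (new_start_bank, 1 - boat_side)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return False


if __name__ == "__main__":
    for n, k in [(3, 2), (4, 2), (5, 3), (6, 3)]:
        print(f"N={n}, boat={k}: {'solvable' if solvable(n, k) else 'no solution'}")
```

Under these assumptions, the search should report five pairs with a three-person boat as solvable and six pairs as having no solution, consistent with the comment's claim about instances with N≥6. Exhausting every reachable state is what distinguishes "no solution exists" from "the model failed to find one".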

Concluding Analysis and Comments

Apple Research argues that current AI, despite its impressive appearance, hits a ceiling with true reasoning.

Counterarguments ("The Illusion of the Illusion"): Some critics, such as those discussed in this LinkedIn post by Arize AI and in this Reddit thread, argue that the tasks were excessively tedious and that the failures reflect limitations in evaluation design rather than fundamental failures of reasoning. This carries a message for the industry: there is considerable room for improvement, because this research raises questions about the long-term potential of scaling up current LLM architectures to achieve artificial general intelligence (AGI).

Philosophically, is it possible for humans to develop AI more powerful than themselves?

According to most AI scientists, this will certainly happen; the key question is not IF it will happen, but WHEN (e.g., Müller and Bostrom, 2016). At a system level, however, multiple narrow AI applications are likely to surpass human intelligence to some degree in an increasingly wide range of areas, which is also logically possible. At the same time, this view suggests that AI scientists have a limited understanding of human intelligence, probably because they do not consider human biology and the psychological sciences. The following blog sheds light on certain aspects of psychology that could support better development of AI:

https://blogs.bangboxonline.com/posts/self-memory-and-ai-memory

So, what is intelligence?

Daniel Dennett (a philosopher) defines intelligence as a fundamentally biological, evolved phenomenon centered on the capacity to learn, anticipate the future, and generate "competence without comprehension." Dennett often emphasized that while humans have high-level comprehension, much of what we call intelligence is a form of advanced, specialized "tricks" or "competence" built up through evolution and cultural learning. The key points of his argument are as follows:

Competence without Comprehension: Dennett argues that intelligence does not require understanding why something works. Rather, it is the ability to produce functional, successful results through accumulated, evolved, or learned processes.

Information Processing: He views intelligence as a system’s ability to extract information from the past and use it to anticipate or navigate the future.

The Intentional Stance: He suggests that intelligence is often best understood by adopting an "intentional stance"—treating a system (human, animal, or AI) as an agent with beliefs and desires, predicting its actions based on what it is "rational" for it to do.

Evolutionary Perspective: Intelligence is a gradual, tiered process (from Darwinian to Skinnerian to Popperian to Gregorian creatures) that has developed over billions of years.

Conclusion

The "Man" is a special creation in this universe and has exhibited a certain "Intelligence" in making his own life comfortable by discovering scientific principles and inventing many tools in recent times. For centuries, human intelligence remained captive to an agrarian economy, and only during the last half century has progress been made in the digital space. The debate about developing AI that surpasses human intelligence is therefore essentially a digital-age discussion. So, if "Intelligence" is a "specific measurement" of how well a species accomplishes a "particular type of task", then AI may outperform a large number of the humans populating the Earth; but logically, real, biologically intelligent human beings will remain more capable than AI. AI may not be more intelligent than real biological humans, but it will be different, performing certain tasks differently from the average human.

AI will keep developing with each passing day, provided remarkable progress is made in certain areas of LLMs and their supporting architectures and processing systems. However, human intelligence is not a digital science; it is a product of biology, psychology, culture and civilization. Every child born is a combined product of the parents' intelligence, with a divine architecture and programming that can never be cloned or replicated. AI cannot replicate the human conscience, nous, resilience, tolerance, patience and performance under life-threatening situations, which are all products of human intelligence.