I Hallucinate More Than LLMs
Why your brain – not your model – may be the riskiest generative system in the room
Our minds are generative engines. All day long we combine ideas in new ways, imagine futures that do not yet exist, fill gaps when information is missing, and reconstruct memories that are always slightly edited. Psychology gives these processes respectable names – creativity, intuition, sense-making – but from an information-theory perspective they look suspiciously like what we criticise in AI as “hallucinations”.
Three ingredients are always running in the background:
We create: we recombine concepts to generate novel strategies and narratives.
We misremember: we store compressed stories, not exact transcripts.
We fill gaps: when data are incomplete, we infer motives, extrapolate trends and construct meaning.
In other words, our internal model of reality is constantly being generated, not simply retrieved. Large language models, by contrast, do something much narrower: they predict the next token in a sequence using patterns learned from training data. They have no inner sense of truth, no lived experience, no fear of being wrong. Yet the effect can look similar: fluent, confident output that may or may not be grounded in reality.
For leaders deciding how far and how fast to adopt GenAI, this distinction matters. The real question is not “How do we stop AI from hallucinating?” It is: How do we manage two different kinds of hallucination – human and machine – in the same organisation?
1. Brains Are Built to “Make Things Up”
1.1 Creativity: Structured Hallucination
Human creativity is essentially controlled hallucination. We generate ideas that do not yet exist by combining existing concepts in new configurations. That is how strategies, products and business models are born.
Neuroscience and cognitive psychology describe this as generative cognition: the brain simulates scenarios, plays with counterfactuals (“what if we…”), and explores possibilities far beyond the data immediately in front of us. This is a feature of human intelligence, not a bug.
LLMs simulate something superficially similar – recombining patterns from their training data – but they do not intend to innovate. They are not trying to change the world; they are solving a probability problem.
1.2 Memory: Confident but Wrong
Human memory is not a hard drive. It is reconstructive.
Decades of research show how easy it is to distort or even implant entirely false memories through suggestion and misleading information. People will later recall those invented events with full confidence. In corporate life this looks like:
“We tried that in 2018 and it didn’t work” (when the context was actually different).
“The customer asked for X” (when the email trail shows something else).
“Legal blocked this last year” (when no such decision was recorded).
We rarely label these episodes as hallucinations. We call them “misunderstandings” and move on.
1.3 Context: Filling the Gaps
When information is incomplete, the brain fills gaps automatically. We infer motives, extrapolate from a small sample, and project past experience onto new situations.
The heuristics and biases programme showed that people rely on mental shortcuts – availability, representativeness, anchoring – which are efficient but systematically distort judgement under uncertainty. These fast-and-frugal rules are exactly what makes experienced leaders effective in noisy environments – and exactly what makes them prone to misreading weak signals, overreacting to vivid anecdotes, or underestimating low-probability risks.
In short: humans “hallucinate” because our cognition is built for meaning, not perfect accuracy. In leadership, that is often an asset. But it is not error-free.
2. What Models Actually Do When They Hallucinate
LLM hallucinations are different, and we should be precise about it.
2.1 Statistical Text, Not Understanding
An LLM generates text by predicting the most likely next token given the previous tokens. It does this using statistical patterns in large training corpora. It has no internal model of “truth versus falsehood” beyond those patterns.
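To make that mechanism concrete, here is a toy sketch. The probabilities are invented for illustration and this is not how any production model is implemented; the point is that the only thing the system “sees” is a distribution over possible next tokens, and nothing in that distribution marks a continuation as true or false.

```python
import random

# Toy illustration only: invented probabilities, not a real model.
# All the "model" has is a distribution over candidate next tokens;
# nothing here encodes whether a continuation is true or false.
next_token_probs = {
    "2019": 0.41,   # plausible and happens to be correct
    "2021": 0.33,   # equally plausible, factually wrong
    "March": 0.26,  # vaguer, still fluent
}

tokens, weights = zip(*next_token_probs.items())
choice = random.choices(tokens, weights=weights, k=1)[0]

# Whichever token is sampled, the sentence reads just as confidently.
print(f"The product launched in {choice}.")
```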
IBM’s technical documentation defines AI hallucinations as outputs where a model “perceives patterns or objects that are nonexistent… creating nonsensical or inaccurate outputs”. In practice, this means:
Fabricated citations and URLs
Non-existent laws or standards
Plausible but wrong numbers, dates or attributions
The key is factual ungroundedness – the output is not supported by any reliable source, but the text looks convincing.
2.2 The “Never Say I Don’t Know” Problem
Most models are trained and evaluated in ways that penalise abstaining and reward fluent output. Benchmarks usually ask them to answer every question. UX patterns often invite the same.
This pushes models into “compulsory answering” mode. When they are uncertain – for example, because the query is niche, ambiguous or outside training data – they still generate something. That something is often what we call hallucination.
The parallel with human behaviour is uncomfortable: many organisations reward confident speech over calibrated uncertainty. The culture encourages leaders and experts to speak even when the evidence is thin.
2.3 Hallucination Is a Design Variable, Not a Constant
Hallucination rates vary dramatically with how you use the model:
Closed vs open tasks. Summarising a specific document with strict instructions to stay within the text produces far fewer hallucinations than answering open-ended questions from general knowledge.
Grounding with retrieval. Retrieval-augmented generation (RAG) mitigates hallucinations by forcing the model to base answers on relevant documents or databases, though it does not eliminate them – especially if the retrieved content is wrong or conflicting.
Allowing abstention. Systems that let models answer “I don’t know” or “No evidence found” can significantly reduce hallucinations, at the cost of leaving more questions unresolved (both levers are sketched in the example below).
So saying “LLMs hallucinate” without specifying task, data, and guardrails is analytically meaningless. The same would be true if we said “employees make mistakes” and stopped there.
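To make the design-variable point concrete, here is a minimal sketch of grounding plus abstention. The `retrieve` and `generate` functions are placeholders for whatever search layer and model-calling layer you already have, not any specific library’s API.

```python
# Minimal sketch of retrieval grounding plus an explicit abstention path.
# `retrieve` and `generate` are assumed placeholders, not a real library API.
ABSTAIN = "No evidence found in the provided sources."

def grounded_answer(question: str, retrieve, generate) -> str:
    passages = retrieve(question, top_k=5)   # e.g. vector search over curated docs
    if not passages:
        return ABSTAIN                       # abstain instead of improvising
    context = "\n\n".join(passages)
    prompt = (
        "Answer ONLY from the sources below. "
        f"If the sources do not contain the answer, reply exactly: '{ABSTAIN}'\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```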
3. Your Organisation Already Runs on Unlogged Hallucinations
Once you see both kinds of hallucination clearly, a more uncomfortable picture appears: your business already depends heavily on human hallucinations – you simply do not call them that, and you do not measure them.
3.1 Strategy as Narrative Generation
Corporate strategy is, in practice, a story about where the world is going and what your organisation should do about it. That story is built from partial data, selective benchmarks and extrapolations from previous cycles.
Very few companies systematically backtest their strategic assumptions. Forecast accuracy, scenario probabilities and risk assessments are often handled informally. Yet these untested narratives drive billions in capital allocation.
3.2 Expert Judgement in High-stakes Domains
In fields such as healthcare, law or finance, human experts make high-consequence decisions under uncertainty. Research on diagnostic error, for example, has consistently found non-trivial rates of missed or delayed diagnoses, some of them avoidable. These are not minor slips; they are serious divergences between perceived reality and actual conditions.
If we held human expert performance to the same audit standards we now demand for LLMs, we would discover that the baseline is nowhere near zero.
4. From Zero-Error Fantasy to Error Budgets
The biggest governance mistake you can make as a C-level leader is to compare GenAI against a fantasy world in which humans never hallucinate.
A more professional approach is to think in terms of error budgets:
What is our current human error profile in this process? Even an approximate estimate is better than assuming perfection.
What error profile can we tolerate, for whom? Distinguish between internal brainstorming, customer-facing content, and regulated decisions. The acceptable risk is not the same.
How will an LLM change that profile? It may reduce some errors (e.g., inconsistency, missing information) while introducing new ones (e.g., fabricated details). Quantify both.
What controls and monitoring will we use?
Grounding via curated data (RAG, structured knowledge bases)
Confidence thresholds and “I don’t know” outputs
Human review on high-risk use cases
Logging and sampling for post-hoc audits and continuous fine-tuning
This is how safety-critical industries already operate. Aviation does not demand zero mechanical failures; it designs systems where failures are anticipated, contained and learned from. GenAI strategy should copy that mindset.
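As a rough illustration of what an error budget can look like in practice, here is a minimal sketch. Every number is a placeholder; the point is the structure – each use case gets an explicit tolerance, and both the human baseline and the AI-assisted rate are compared against it.

```python
# Illustrative error budgets. All rates are invented placeholders.
error_budgets = {
    # use case:            (tolerated rate, human baseline, AI-assisted rate)
    "internal_brainstorm": (0.20, 0.15, 0.08),
    "customer_content":    (0.02, 0.03, 0.01),
    "regulated_decisions": (0.001, 0.004, 0.002),
}

for use_case, (budget, human, ai_assisted) in error_budgets.items():
    status = "within budget" if ai_assisted <= budget else "OVER BUDGET - add controls"
    print(f"{use_case}: human {human:.1%}, AI-assisted {ai_assisted:.1%} -> {status}")
```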
5. Designing Human + AI Truth Pipelines
If you accept that both brains and models hallucinate, the goal becomes designing workflows in which each one’s strengths compensate for the other’s weaknesses.
5.1 Make Humans Show Their Working
We already demand that from models (“show sources”, “cite the policy”). Do the same with key human decisions:
Require links to data, documents and assumptions in strategy papers.
Encourage ranges and confidence levels instead of single-point forecasts.
Log major decisions with their rationale so you can review them later.
This will surface human hallucinations that today go unchallenged.
5.2 Force Models to Respect Reality
Treat LLMs as interfaces to your data, not as free-running oracles:
Use retrieval to constrain answers to your documentation, contracts, guidelines and curated external sources.
Implement automatic checks for fabricated URLs, citations, package names or product codes, especially in code and legal text (a minimal check of this kind is sketched after this list).
Define where the model must abstain and escalate to a human.
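The URL check mentioned above can start very simply. This is a sketch under obvious assumptions (network access, public URLs, no internal allowlist); it catches only one narrow class of fabrication and is meant to route suspicious output to a human, not to certify it.

```python
import re
import urllib.request

# Sketch of one automatic check: flag URLs in model output that do not
# resolve. Unreachable is not proof of fabrication, but it is a cheap
# signal for routing the output to human review.
URL_PATTERN = re.compile(r"https?://[^\s)\]\"']+")

def flag_unreachable_urls(text: str, timeout: float = 5.0) -> list[str]:
    suspicious = []
    for url in URL_PATTERN.findall(text):
        try:
            req = urllib.request.Request(url, method="HEAD")
            urllib.request.urlopen(req, timeout=timeout)
        except Exception:
            suspicious.append(url)   # unreachable: send to human review
    return suspicious
```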
5.3 Normalise “I Don’t Know”
For both humans and machines:
Change UX so that “no answer” is a valid and visible outcome, not an error.
In meetings, actively reward people who flag uncertainty instead of improvising.
Track abstentions and escalations as valuable signals, not as failure.
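On the machine side, tracking abstentions can start with labelling each response before it is logged. A minimal sketch, with invented category names and a generic `log_event` sink:

```python
from datetime import datetime, timezone

# Sketch of abstention tracking. The labels and the `log_event` sink are
# invented for illustration; the point is that "no answer" is recorded as
# a first-class outcome rather than discarded as an error.
def classify_outcome(answer: str) -> str:
    lowered = answer.lower()
    if "no evidence found" in lowered or "i don't know" in lowered:
        return "abstained"
    return "answered"

def record(question: str, answer: str, log_event) -> None:
    log_event({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "outcome": classify_outcome(answer),   # abstentions become a trackable rate
    })
```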
The paradox is that systems which admit their ignorance are usually safer and, over time, more trusted.
6. Closing Thought: The Most Dangerous Hallucination
Before you approve or reject the next GenAI initiative, run through five questions:
Baseline – How often is the current human process wrong, and in what ways?
Impact – Who gets hurt when we are wrong (customers, patients, regulators, employees), and what is the cost?
Shift – How exactly would an LLM-supported workflow change that risk profile? Be specific.
Controls – What grounding, guardrails, human reviews and monitoring will we put in place?
Learning loop – How will we log and analyse both human and model errors so the system improves over time?
The issue is that you are already making decisions in an environment where hallucinations – human ones – go unmeasured. The difference is whether you now treat that as a design constraint instead of a badge of infallibility.
The most dangerous hallucination in the GenAI debate is not produced by a model. It is the comforting human belief that we do not hallucinate – that our memories are accurate, our narratives objective, and our expertise infallible.
Once you drop that illusion, a more realistic – and more productive – stance emerges:
Humans and LLMs both produce fluent, confident errors.
The mechanisms are different, so the mitigation strategies must be different.
The winning organisations will be those that deliberately design human + AI systems where hallucinations are anticipated, measured and contained, instead of denied.


