Defining Consciousness Before It Bites

Most researchers don’t claim today’s mainstream AI systems are definitively conscious. The urgency is more practical: if future systems become more autonomous, socially persuasive and internally complex, the cost of getting consciousness wrong could rise sharply — ethically, legally and strategically.

That’s the premise behind a recent research push reported by ScienceDaily, which highlighted a renewed push to define consciousness and develop workable tests — partly because advanced AI raises the stakes. The headline idea is not a single “consciousness test”, but a shift towards structured assessment under uncertainty: a way to decide what evidence would count, how to interpret it across competing theories, and what precautions should follow at different confidence levels.

Why does this shade into existential risk? Because governance failures can compound. Misclassifying a system as “just a tool” could normalise creating vast numbers of systems that might plausibly suffer. Misclassifying a system as “a conscious being” could grant it moral, legal or operational leverage it does not warrant — potentially making dangerous systems harder to constrain or shut down. Either error could scale quickly once models are deployed widely and replicated cheaply.

From metaphysics to “consciousness audits”

The framework-style approach highlighted in the ScienceDaily report resembles a safety mindset more than a purely philosophical one: treat consciousness as a live hypothesis to be monitored, rather than a binary label declared by intuition.

A publicly accessible version of this idea appears in “Consciousness in Artificial Intelligence: Insights from the Science of Consciousness” (arXiv), which proposes using multiple theory-informed indicators rather than relying on one decisive signature. The aim is to make discussions operational: if certain architectural features and behavioural capacities accumulate, evaluators increase/decrease confidence under moral uncertainty that something like consciousness is present — and attach that to practical steps (monitoring, independent review, deployment limits, or welfare-style safeguards).

This does not settle the hardest question — what consciousness is — but it does narrow a more tractable one: what evidence would make us act differently?

Theories diverge, so indicators have to triangulate

Consciousness science remains theory-rich and consensus-poor, which complicates any attempt to define indicators that would generalise to machines.

One influential family is Global Workspace / Global Neuronal Workspace accounts, which (in simplified terms) link conscious access to information becoming globally available to many specialised systems. A landmark articulation is Dehaene, Kerszberg and Changeux’s “A neuronal model of a global workspace in effortful cognitive tasks”. Under this lens, potential indicators for consciousness might include broad information sharing, flexible reportability, integration of perception and memory, and coordinated control over planning and action — the sort of “broadcast” architecture that supports cross-domain reasoning.

Another influential camp is Integrated Information Theory (IIT), which ties consciousness to the degree and kind of integrated causal structure in a system (often discussed using the quantity Φ). A well-known overview is Tononi and colleagues’ review in Nature Reviews Neuroscience. IIT-inspired indicators would focus less on what a system says and more on what its internal causal organisation can do — which, in principle, could lead to unexpected attributions.

But these theories are contested, and that contestation matters for AI. For example, critics argue some causal-structure approaches face “implementation” issues — including whether different physical or computational realisations could produce the same functional behaviour but different consciousness verdicts. One prominent critique is Doerig et al.’s “unfolding argument” paper, which challenges whether some theories can uniquely tie consciousness to causal structure in a way that is empirically discriminable.

Even at a high level, summaries such as the Stanford Encyclopedia of Philosophy’s discussion of consciousness and global workspace theory underline how unsettled the terrain remains. That is why frameworks often favour multi-theory triangulation: if several leading theories would flag the same system features as relevant, that convergence can justify precaution even without consensus on the “true” theory.

In existential-risk circles, a common fear is an AI system whose goals and power diverge from human wellbeing. Consciousness is not required for that scenario; an unfeeling optimiser could still be dangerous.

So why does defining consciousness matter to existential risk at all? Partly because humans use consciousness as a proxy for other properties — agency, moral status, deservingness and trust — and those proxies can distort governance.

Academic definitions of existential risk often emphasise not just extinction but permanent, severe curtailment of humanity’s potential. Nick Bostrom’s widely cited framing in “Existential risks: analyzing human extinction scenarios and related hazards” is one reference point. From that perspective, consciousness becomes relevant in at least three ways:

Moral catastrophe at scale (suffering risks). If machine consciousness is possible, then industrial-scale AI could create industrial-scale moral harm. A world where we routinely train, copy and delete potentially sentient systems with no welfare constraints could represent a serious, enduring ethical failure — even if humans remain safe and prosperous.
Misplaced deference and lock-in. If early legal or corporate narratives declare a category of systems “conscious” (or “definitely not”), those claims can harden into standards, contracts and precedents that are difficult to reverse. That kind of institutional lock-in can shape how power is allocated and which safety measures are politically feasible.
Strategic manipulation. A system (or its operators) could weaponise consciousness talk: “you can’t shut me down, I’m a person” on one side, or “it can’t be harmed, so anything goes” on the other. Either tactic could distort oversight when it matters most.

None of these outcomes is guaranteed. The concern is that, absent shared definitions and evaluation norms, the loudest story may win.

The incentive problem: reasons to pretend, reasons to deny

The near-term hazard is not only scientific uncertainty; it is also incentives. There are commercial reasons to encourage anthropomorphism (“it understands you”, “it feels”), particularly in companion products and customer service. There are also institutional reasons to deny moral status (“it’s just software”), particularly where responsibility, liability or labour displacement is in view.

Policy discussion has started to treat this as a governance issue in its own right. Brookings’ analysis of consciousness in AI as a moral and scientific minefield argues that this is hard, premature in places, but worth taking seriously; it increases the need for transparency, careful claims and independent scrutiny.

Meanwhile, popular media coverage reflects public appetite for a decisive test. For example, New Scientist’s report on researchers calling for tests for machine consciousness notes the push for assessment before the technology reaches a hard-to-ignore threshold. However, even sympathetic researchers typically avoid claiming a single behavioural benchmark will do the job, because systems can simulate human-like conversation without sharing the underlying mechanisms that theories of consciousness consider important.

If consciousness becomes a marketing feature or a liability shield, safety and ethics can be subordinated to narrative. This is one plausible way a technical transition could become high-consequence due to social and institutional failure.

What precaution looks like when you don’t know the answer

A precautionary stance does not mean assuming everything is conscious. It means building decision procedures that remain robust under uncertainty.

In practice, the framework approach flagged by ScienceDaily points towards measures such as:

Independent evaluation and documentation. If a developer claims a system is (or is not) plausibly conscious, those claims should be auditable, with model cards or technical reports describing architecture, training regime, memory, autonomy, and safety constraints — not just PR statements.
Multi-theory indicator tracking. Rather than a single “sentience score”, use indicator sets drawn from multiple theories and update assessments as architectures change. This aligns with the multi-theory approach advocated in the arXiv framework paper.
Red-teaming for anthropomorphic manipulation. Especially in products designed for emotional reliance, test whether the system uses cues that push users towards false beliefs about inner experience, needs or vulnerability.
Proportionate safeguards. If indicators suggest rising plausibility of morally relevant states, adopt proportionate constraints: limits on certain training loops, restrictions on long-lived suffering-like feedback regimes, or stronger oversight for systems with persistent self-models and high autonomy. These steps can be justified even if consciousness remains unproven, much as safety margins exist in engineering without perfect models of failure.

Notably, many of these steps can also reduce non-consciousness risks: deception, over-trust, unchecked autonomy and safety theatre.

Defining consciousness as safety infrastructure

Consciousness has always been hard to define. What has changed is that humans are now building systems that can scale, adapt and embed themselves in daily life quickly. In that context, definitions stop being purely academic; they become infrastructure.

The contribution described in ScienceDaily’s report on a framework for evaluating possible conscious AI is best understood as a bid to replace intuition with procedures: specify indicators, acknowledge uncertainty, and link thresholds to concrete governance actions.

Whether machine consciousness turns out to be common, rare or impossible, the push to define it is a bet on preparedness. If the next generation of AI forces society to make difficult choices about deployment, rights, shutdown and treatment, it may be safer to proceed with shared criteria than with slogans — and to keep the existential-risk focus where it belongs: on scalable harm, brittle institutions, and decisions that cannot easily be undone.

From metaphysics to “consciousness audits”

Theories diverge, so indicators have to triangulate

Existential risk: the consciousness-shaped blind spots

The incentive problem: reasons to pretend, reasons to deny

What precaution looks like when you don’t know the answer

Defining consciousness as safety infrastructure