# Explaining Explainable AI

It’s easy to forget that language is (muscle) action. Reasoning & explanations are thus actions in time. Acting is discrete: you act or do not act. There is no 80% act. Explanations mirror our mental models of causal action, which provide a best guess of efficacy.

So if I want to perceive a change in my environment, I represent the change as action that causes the change. The speech act then mirrors that action & even operates on tied symbols for the muscle movement for the action. (Also interesting that stock photo collections for “explain” all have someone pointing or gesturing!)

What this means is that language & explanations are discrete & unitary – DAG-like (think directed acyclic graph). The underlying models, though, are probabilistic, because observation in the environment is probabilistic. We have problems when we confuse the model with the actual environment.

Much of science is the teasing out of relationships in reality that are closest to the models – hence it is easy to confuse model with reality (à la Plato) – F=ma etc. Once we pick all this low-hanging fruit we are left with more imperfect models: see quantum physics & medicine.

Now imperfect need not always mean inexact. But it does generally require models that have a better mapping to the underlying probabilistic reality. Drug A fixes cancer B 80% of the time; when it doesn’t work you can try drug C; drug A has side effects X, C has Y.
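A minimal sketch of that drug example, to make the gap between the discrete story and the probabilistic model concrete. Only the 80% for drug A comes from the text above; the 50% for drug C and all the function names are made up for illustration.

```python
import random

# 0.8 for drug A is from the text; 0.5 for drug C is an invented placeholder.
P_SUCCESS = {"A": 0.8, "C": 0.5}


def treat(drug, rng):
    """One discrete outcome: the treatment works or it doesn't."""
    return rng.random() < P_SUCCESS[drug]


def treat_with_fallback(rng):
    """Try drug A; if it fails, try drug C. Returns the drug that worked, or None."""
    for drug in ("A", "C"):
        if treat(drug, rng):
            return drug
    return None


def simulate(n, seed=0):
    """Run n independent patients and return the fraction cured by each route."""
    rng = random.Random(seed)
    counts = {"A": 0, "C": 0, None: 0}
    for _ in range(n):
        counts[treat_with_fallback(rng)] += 1
    return {outcome: count / n for outcome, count in counts.items()}
```

Each individual call is an all-or-nothing act ("A worked"), which is what the spoken explanation captures. The 80% only shows up when you run `simulate` over many patients.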

The irony being that the substrate of the brain is built on probabilistic explanations but we cannot access these directly – we only have language (& language-like discrete causal / static action models).

Hence, following Gödel, any language utterance is ultimately in some way imprecise; no explanation can be perfectly true, as even “truth” itself is built in the model framework, not in the (measurable) reality.

But with neural networks, we have a category error – we expect an "explanation" similar to declarative programming languages but the latter matches the logic of language while the operations of neural networks, despite being deterministic, operate at >>scale & >>dimensionality.

Hence, you can't match the operations of a neural network to language and that is actually the reason for the success of neural networks – they learn to probabilistically model a "reality" as represented by the training data.

To "explain" what a neural network is doing requires the mapping of a higher-level (lower-dim primitive) model, language-like (binary action, consistent), onto the actual high-detail operation. But this explanation will never be completely true – the same as our own explanations.
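A rough sketch of that mapping, under heavy assumptions: a small random tanh network stands in for the "actual high-detail operation" (it is not a trained model), and the "explanation" is restricted to a single yes/no rule on one input. All names here are hypothetical. The fidelity score is just how often the rule agrees with the network.

```python
import math
import random


def make_network(n_inputs=8, n_hidden=16, seed=0):
    """Toy stand-in for a trained network: fixed random weights, tanh hidden layer."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]

    def predict(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return sum(w * h for w, h in zip(w2, hidden)) > 0  # discrete decision

    return predict


def sample_inputs(n, n_inputs, seed=0):
    """Draw n random input vectors to probe the network with."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n)]


def best_single_rule(predict, samples):
    """Find the rule 'x[i] > 0' (or its negation) that best mimics predict.

    Returns (feature index, sign, fidelity), where fidelity is the fraction
    of samples on which the rule and the network agree.
    """
    labels = [predict(x) for x in samples]
    best = None
    for i in range(len(samples[0])):
        for sign in (True, False):
            agree = sum(((x[i] > 0) == sign) == y for x, y in zip(samples, labels))
            fidelity = agree / len(samples)
            if best is None or fidelity > best[2]:
                best = (i, sign, fidelity)
    return best
```

Something like `best_single_rule(make_network(), sample_inputs(2000, 8))` gives you a rule you can say out loud plus a number for how far you can trust it. The rule is the explanation; the gap between its fidelity and 1.0 is the part the explanation leaves out.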

People discussing "AI explanations" tend to make one of two errors:
1) they assume normal human explanations are a "true" representation of reality;
2) they assume "explanations" can perfectly represent lower-level probabilistic high-dim data structures & operations.

For humans, the "truth" of an explanation is normally reflected by the correspondence with experiments in reality. In science, conditions are controlled so you can focus on one causal link that is not disrupted by other causal links. Hence, model ~ reality.

TBF when people go on about "algorithms" being the problem they're actually right. It's the hubris that the complexity of the world can be modelled as a set of simple steps in a declarative language that is the root of many problems with computers.

The irony being that people complaining about "algorithms" in this day & age are often complaining about neural architectures that are not "algorithms" but learnt layered non-linear highD optimisations.🤷

Anyway, back to “explainable AI”. An obvious Q is explainable to whom? Language is a two-way process & needs to take into account shared mental representations (or at least some vague overlap of correlations). Am I explaining to an expert or my kids?

Much of the time folks mean one of: 1) explainable to a user (often an expert); 2) explainable to an average member of the public; or 3) explainable to a person with responsibility/oversight/sanctioning authority.

And as any specific language is defined by use, we can learn context-dependent correlations between entities understood by each audience by processing media designed for each audience.

Explanations then start to become chains of relations between entities (in time +/ space) that are filtered by fit to the underlying statistical measurements.

Within an explanation you can zoom-in/out in both space + time where time can also change space. The general form of an explanation is first as general and short as possible while retaining specificity & relevance for action.
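One way to picture "chains of relations filtered by statistical fit, kept as short as possible" is a graph search. This is only a sketch: the entities, the strengths and the `explain` function are all invented for illustration, and "strength" is a stand-in for whatever measurement you actually have.

```python
from collections import deque

# Hypothetical candidate relations: (cause, effect) -> measured strength in [0, 1].
RELATIONS = {
    ("rain", "wet road"): 0.9,
    ("wet road", "longer braking"): 0.8,
    ("longer braking", "collision"): 0.3,
    ("rain", "collision"): 0.1,
    ("wet road", "collision"): 0.6,
}


def explain(start, goal, relations, min_strength):
    """Shortest chain from start to goal using only relations that fit well enough.

    Relations weaker than min_strength are dropped before a breadth-first
    search, so the first chain found is also the shortest surviving one.
    Returns a list of entities, or None if no chain survives the filter.
    """
    graph = {}
    for (src, dst), strength in relations.items():
        if strength >= min_strength:
            graph.setdefault(src, []).append(dst)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

With a threshold of 0.5, `explain("rain", "collision", RELATIONS, 0.5)` goes via "wet road", because the direct rain-to-collision link is too weak to keep. Drop the threshold to 0.05 and you get the shorter but less reliable two-node story. Raise it to 0.95 and there is no explanation at all.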

Things you can’t change, manipulate or judge are poor nodes in an explanation – explanations (like most reasoning) are actually future-action oriented despite mainly being about a past/static set of things. “You” here being an actor relative to the thing providing the explanation.

So there’s actually a pit-load of philosophy to crunch through before you can even get to engineering a solution. Assumptions crumble like origami under the weight of inspection.

As per Lakoff, if the audience doesn’t have direct knowledge of the entities you want to use as nodes in your explanation, you can substitute using metaphor/analogy & omit detail to maintain relevance/applicability.

Metaphor & analogy work in the first place because the brain is representing, via HPC (the hippocampus), *general* patterns of relations between nodes with flexible mapping of node<>sensory “thing” (all originally based on spatial relations between sensory landmarks for navigation).

A lot of the time an explanation carries an implicit claim: if we undertake the sequence in the explanation, we should more-or-less obtain the same output from a similar starting configuration (again the basis of “scientific” experimentation).

But no two moments are the same (rivers & Heraclitus &c) & neither is it guaranteed that action will play out the same so a *useful* explanation is one that concentrates on the most robust replicable patterns under control of the agent in Q (normally receiver of explanation).

All this appears achievable with a computer/artificial system – it’s just that few mainstream discussions of explainable AI even remotely touch on the points here. They normally want a *relevant & actionable* logic path that exactly matches reality at a level understood by the general public.

While all this implicitness is normally filed under “common sense”, if you read any of the literature on neurodiversity you realise that not even all humans share the underlying assumptions.

(Ah if only my *actual* job was chatting all day with teams of engineers building explainable AI systems where this self-talk into the ether was actually useful in somekinda way. Hey ho.)

Originally tweeted by Ben Hoyle (@bjh_ip) on 27 April 2022.