The act of will and the action of the body are not two different states…; they do not stand in the relation of cause and effect but are one and the same thing…. The action of the body is nothing but the act of will … translated into perception.
– Arthur Schopenhauer, 1819 (reference 1)
Our social world is an ever-changing milieu in which boundless verbal and nonverbal signals are generated by fellow humans. To ensure our survival, we must perceive certain regularities from the complexity that surrounds us. A failure to meet this daily challenge may prove costly for some; social encounters trigger several psychiatric symptoms, while social withdrawal reduces their intensity, at least temporarily.2 For example, disorganization (or formal thought disorder), one of the central features of schizophrenia, presents primarily as a disruption in cooperative communication that occurs in a social context. Though many technical advances now allow us to study “socially” interacting agents in the laboratory (for example, see Kingsbury and colleagues3), psychiatric symptoms are rarely studied in the context of neural mechanisms of social encounters. To tackle this challenge, we need empirical tools to study the dynamic neural framework of social interaction, starting from a 2-person perspective. In this editorial, we first present such a tool: an emerging “active inference” perspective of cooperative communication between 2 individuals. We then introduce the 2-brain problem of formal thought disorder in schizophrenia as an exemplar case of its utility and map resulting theoretical expectations to known signs of this construct. Lastly, we highlight several experimental opportunities that arise from casting of formal thought disorder in the active inference framework.
Brain as a Bayesian organ
One of the important pursuits of clinical neuroscience is to explain how computational failures of the brain result in symptoms and signs experienced by patients. To get there, we need a framework for how the brain computes solutions to the challenges (e.g., recognizing an object or talking to a stranger) it must solve. One biologically plausible unifying hypothesis casts the brain as a Bayesian statistical organ whose function is to generate and refine models of the hidden properties of the world (i.e., causes and consequences) in order to reduce uncertainty4 (Box 1). Accordingly, the brain’s models of the world are predictions (or “priors”) held by neuronal units, while perceptions are inferences that arise from the probability estimations of sensory data against models. When the senses convey data that are not explained by the held models (“prediction errors”), this leads to either further data gathering (e.g., attending to the object or acting on the environment) or a model revision to reduce further prediction errors. In this framework, actions fulfill behavioural predictions; they simply arise from the drive to reduce the error between the predicted future (prior “expectations”) and current motor states.5 This idea, termed “active inference,” provides an algorithm for perceptions and actions,6 which, combined with a plausible implementational sketch of neural circuits (dynamic causal models; not discussed here, but see Petzschner and colleagues7 and Limongi and colleagues8 for examples), opens the door for parsimonious mechanistic explanations for psychiatric symptoms.
Box 1: Definition
The term “Bayesian” refers to the process of assigning probabilities based on prior knowledge when interpreting data.
Precision-driven active inference
To function as a statistical engine, the brain needs to overcome 2 challenges: model selection and significance detection. First, as the sensory world is infinite, it is not plausible to instantly scan the entire model space to select a model to fit the incoming data. This is solved by a hierarchical framework of prediction propagation wherein specific model choice is dictated by the immediately higher level of predictive representations.9 For example, when we perceive what is being said by the person speaking to us in a noisy bar (“Would you like one more of the same?”), our models are progressively constrained by the situation (the bar), the speaker (the bartender who knows you by name), the context (having a recently emptied glass), and the event (running a tab). Second, many a time, there is likely to be a mismatch between our expectations (prior models) and sensations (sensory data); this mismatch, termed “prediction error,” can be a “chance occurrence” due to inaccuracies in the data we gathered. The brain, as an optimal Bayesian machine, must allow only the “reliable” prediction errors to propagate further, and trigger model revision or further data acquisition (as these are expensive).
If the sensory data are noisy and highly variable, then any information in that data that we did not predict with our prior models cannot be trusted. The prediction error triggered from processing these data will be considered to be unreliable or less precise (i.e., precision is the inverse of the variance). We can say that an unreliable prediction error is given less importance (or weighting) over the prior predictions, and the prior model is retained. Similarly, our prior models themselves also have variable degrees of precision; we cannot always rely on our models. Ultimately, for every encounter with some form of incoming data, our inferential bias toward our models (priors) versus data should be dictated by the reliability (or precision) of these 2 sources. In effect, this reliability is the learned probability of success of a model from previous experiences of deploying the model. To put it simply, if the brain is a flawless statistical organ, then any “significant effect” that it detects from the gathered data must be based on both the magnitude of the signal (deviation of the data from prior model) and the reliability (inverse variance) of both the data and the model.
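The weighting described above can be illustrated with the simplest textbook case, a Gaussian prior combined with a single Gaussian observation. The sketch below is only an illustration under that assumption and is not a model of neural computation; the function name `precision_weighted_update` and all numbers are hypothetical and are not taken from the studies cited here.

```python
def precision_weighted_update(prior_mean, prior_precision,
                              observation, sensory_precision):
    """Combine a Gaussian prior with one Gaussian observation.

    Precision is the inverse of the variance. The prediction error is
    weighted by the share of the total precision carried by the data.
    """
    prediction_error = observation - prior_mean
    posterior_precision = prior_precision + sensory_precision
    gain = sensory_precision / posterior_precision  # weight on the error
    posterior_mean = prior_mean + gain * prediction_error
    return posterior_mean, posterior_precision


# Same prior and same surprising observation; only the reliability
# (precision) of the sensory data differs between the two cases.
noisy_mean, _ = precision_weighted_update(0.0, 1.0, 10.0, 0.1)
reliable_mean, _ = precision_weighted_update(0.0, 1.0, 10.0, 10.0)
print(round(noisy_mean, 2), round(reliable_mean, 2))
```

With imprecise data the estimate stays close to the prior (the prior model is retained); with precise data the same prediction error moves the estimate most of the way toward the observation.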
Several lines of evidence indicate that precision is encoded by the brain (arguably, at the synaptic level) and is used to assign weight to sensory data (in the form of prediction errors as they ascend) as well as priors (in the form of predictions that descend).10,11 At times when our predictions are imprecise, for example, if a new bartender with an accent is speaking to us in a noisy environment, we either ask him to repeat what he said, move closer, tilt our head and pay more attention to his voice, or stubbornly stick to our expectations. In other words, in a volatile world where model revisions are often required, we either commit neural resources to improve the precision of sensory data (attentional selection12) or inflate the emphasis toward our predictions (confidence13). A similar process also operates for motor behaviour; when the consequences of a planned act are highly variable, we can still bias toward selecting an action by increasing the precision afforded to a chosen policy or plan (i.e., increased commitment or motivation14). Note that this line of conceptualization brings us close to Schopenhauer’s assertion: action is nothing but will translated into perception. This statement indeed foresees the perception–action loop articulated by active inference whereby action can be seen as the realization of motivated motor prediction.
The precision-driven active inference framework has been invoked in multiple transdiagnostic studies, including some published in JPN,15,16 to explain abnormalities in perception,17 interoception18 and emotional expression,19 to name a few. Here, we theoretically expand its application to disrupted communication in psychosis as a clinical phenomenon of interest.
Active inference and cooperative communication
The basic premise of active inference is that our interactions with the world are attempts to reduce the uncertainty about the models we hold of it. In our social world, the most critical uncertainty is the mental state of those we interact with. Speech is a crucial act by which we interact with our social world. The process of engaging in a discourse with another person involves both perception and action; this can be understood as an active inference process jointly deployed by the speakers to decrease their respective uncertainties about each other’s mental state.20 We achieve this by asking and answering questions directly to infer each other’s “model of the world.”21
For cooperative communication to occur, the 2 brains’ causal models and expectations should have a certain degree of alignment22,23 (see also Kingsbury and colleagues24). Language, by its very nature, provides these shared tools at several levels of the hierarchical priors (phonetics, semantics, syntax).25 Other features, such as culture, shared identity and common context, also provide higher-level alignment for shared message passing.26 This sort of prior alignment is said to be species-specific and adaptive, making cooperative communication less demanding and more likely to occur in our social world.20
Brown and Kuperberg argue that predictive processes operate hierarchically across 3 levels in the generative model of language — semantic selection, syntactic construction and phonological articulation.9 Thus, higher-order priors (e.g., subject of interest to be discussed) hierarchically constrain the choice of downstream action sequences (e.g., ideas, sentences, words) to achieve the desired outcome of a social discourse. The hierarchical generative process is subject to internal (comparing the lower-level selections against the discourse plan and the message, to make changes as we speak) and external monitoring (comparing intended and achieved perceptual consequences of the output) at each level. Once a discourse is initiated, the further choice of messages and the structure and the length of responses depend on this self-monitoring feedback from articulation as well as the feedback from the listener (other monitoring). Both the verbal and nonverbal outputs from the self and the other person are crucial for this feedback process.
But what prompts a change in our planned discourse? Our individual beliefs (probability estimates) about the most socially valuable message choice tune the commitment placed on our discourse plan, and this in turn affects the precision afforded to the syntactic, lexical and phonetic choices down the hierarchy that we employ to convey the message (propagation of precision). Sufficiently precise priors at each level are required for the discourse not to drift on the basis of incoming cues; this also considerably increases the speed of action selection and implementation (i.e., winning combination chosen at an optimal psychomotor speed; see Parr and colleagues27 for limb movements). Through precision modulation, we afford sufficient commitment to the ongoing discourse plan, thus attenuating the relative impact of the sensory feedback. When the other person generates sufficiently precise output (for the speaker, this is feedback) that overcomes this commitment, we revise the ongoing speech production (e.g., changing the topic when we detect a reliable sign of displeasure).
In summary, cooperative human communication fits well with the active inference algorithm. Two speakers engage in a discourse to infer each other’s mental state, thus reducing the uncertainty about each other. Based on their probabilistic estimates (beliefs), they choose messages, discourse structure, and lexical choices with maximum likelihood to serve the goal of “reverse inferring” the other person’s mental state from the expected responses. The top–down conditional dependence of the subprocesses means that if the goal to infer the other person’s mental state is imprecise, this reduces the precision of message selection and the implemented discourse plan, as we discuss below.
Thought and language disorder in schizophrenia
Traditionally, linguistic abnormalities in schizophrenia have been considered as a part of a notable symptom cluster often termed “formal thought disorder” (FTD) or “disorganization,” while later the emphasis shifted to “speech disorder.”28 These debates arise from a “Cartesian” approach that sees language as merely an expressive system that encodes an independently constituted thought process. Recent developments in generative linguistics have emphasized an interactive interface between thought and language; in this Un-Cartesian approach, language and thought are seen as indissociable;29 this approach eschews the thought versus language versus speech disorder problem in understanding schizophrenia.30 When extending to the thought versus speech disorder issue in schizophrenia, the Un-Cartesian approach is also compatible with Schopenhauer (will v. act) and the active inference framework (prediction v. behaviour).
For some, FTD is essentially a 2-brain problem, and schizophrenia is primarily a disorder of communication.31,32 Disrupted speech arises from the speaker using ideas, words and phrases in such a way that their meanings are not socially shared.32 Several psychotic symptoms (delusions, hallucinations) can be traced back to how the patient employs language to communicate.33 Nevertheless, unlike discrete language disorders such as aphasia, the hierarchical representations that form linguistic structure are affected in a reversed gradient.34 In schizophrenia, the pragmatics are affected more than semantic and syntactic levels, while the phonemic choices are mostly unaffected.34,35 Cohesive ties in a discourse are not preserved (positive FTD: distractibility, tangentiality, derailment, peculiar word choices) and verbal productivity diverges from contextual requirements (negative FTD: reduced speech, weakened goal, perseveration). These communication deficits play a critical role in long-term functional deficits.36,37
In an attempt to explain language comprehension and production abnormalities in schizophrenia, Brown and Kuperberg hypothesized “a breakdown of generative circuits linking high-level message representations and lexical representations” and a failure of the time-sensitive predictive processing (whereby priors are accessible only after the bottom–up inputs are fully accessed) as the basis of positive FTD in schizophrenia.9 We extend these ideas to accommodate the social context of normal discourse (i.e., the 2-brain problem). In placing emphasis on 2 brains rather than 1 brain, we underscore that the neural processes pertaining to both individuals partaking in a conversation influence the “organization” or coherence. Thus, the processes that lead to verbal disorganization or FTD are best studied with both “brains” considered together, rather than seeking individual brain–symptom correlations as we practise at present.
An active inference account of discourse in schizophrenia
Several lines of evidence argue for a deficiency in precision modulation in schizophrenia.38,39 Within the hierarchical generative models of cooperative communication, we propose that a gradient of a precision modulation deficit affecting the highest levels of priors exists in patients with schizophrenia (Figure 1). When demands arise, patients cannot afford increased precision to their higher-order priors. When engaging a second person, this imprecision presents as a failure of commitment to a discourse plan40 with low confidence on the message choice (ambivalence). This state of low precision of higher-order priors makes all lower-level models equally likely for selection;41 this increases the likelihood of frequent shifts in conversational goal, messages and speech structure (loosened associations: derailment, incoherence).
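The claim that imprecise higher-order priors make lower-level options nearly equally likely can be illustrated with a softmax whose sharpness is set by a precision (inverse temperature) parameter, a form commonly used for choice among options in computational accounts. The sketch below is a toy illustration under that assumption: the preference scores for 4 candidate messages, the precision values and the function names are hypothetical and are not estimates from patients.

```python
import math


def selection_probabilities(preferences, precision):
    """Softmax over preference scores scaled by a precision parameter."""
    scaled = [precision * value for value in preferences]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probabilities):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


# Hypothetical prior preferences over 4 candidate messages.
preferences = [2.0, 0.5, 0.0, -1.0]

committed = selection_probabilities(preferences, precision=4.0)
ambivalent = selection_probabilities(preferences, precision=0.1)

print([round(p, 3) for p in committed], round(entropy(committed), 3))
print([round(p, 3) for p in ambivalent], round(entropy(ambivalent), 3))
```

With high precision nearly all probability falls on the preferred message; with low precision the distribution approaches uniform, so successive selections are more likely to differ, which is one way to picture frequent shifts in messages and speech structure.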
The presence of imprecise priors at various higher levels of active inference considerably reduces the speed of message selection (reduced spontaneity) and implementation (reduced rate of speech). The relatively low precision of priors compared with sensory data enables external cues to repeatedly redirect the discourse (distractibility). The between-speaker asymmetry in the precision of shared priors and the unpredictable shifts in the discourse increase the interlocutor’s uncertainty about the speaker’s mental state; the interlocutor loses the conversational rapport (and is left with a “Praecox feeling”42). The external cues of this uncertainty are available to the speaker, but owing to the difficulty in precision affordance to the higher-order priors, the speaker presses on with the same imprecise stream of communication (circumstantiality, empty speech with little information). Nevertheless, as the precision afforded to lower-level priors is relatively preserved, this leads to word-level errors (repetitions, perseveration, persisting with peculiar word choices) and vague references. Further, the imbalance in the dyad’s relative precisions (of higher-order priors) also precludes the speakers from taking appropriate turns,23 leading to conversational failures. This precision-weighting imbalance also means that, after a few exchanges, the information flow becomes lopsided. Such asymmetric exchanges arising from nonaligned priors lead to the conclusion that the patient did not infer the mental state of the other speaker.43,44 In due course, the expectation that a social discourse can assist in inferring the other person’s mental state diminishes greatly, as it turns out to be a costly exercise with low returns. This is especially likely if the other person also has imprecise conversational priors.45 One’s estimated probability of causally influencing the immediate social world reduces over time, driving down any further active inference via speech/action (diminished expressivity, negative symptoms46 over the longer time course).
How does this imprecision of priors come about? First, precise priors emerge when large amounts of data have been sampled, with reinforcement of choices (termed “confidence accumulation” in decisional neuroscience47); for language, this is a developmental challenge,48 which when not met, increases the chance of priors to operate with lower precision. Second, Adams and colleagues49 argue a primary role for the glutamatergic system in maintaining synaptic gain and modulating precision; several empirical observations support a role for glutamatergic hypofunction in schizophrenia.50 Third, it is possible that patients perceive the social environment to be more volatile,51 thus estimating the consequences of their action plans to be less reliable than others. This can also arise from developmental adversities (trauma, bullying52) or experiences of significant shifts in social contexts (as in immigration). While the associations we make here are currently speculative, experimental determination of precision weighting of different levels of priors during social conversation is entirely plausible, in line with similar studies in perceptual processing.53
Recent observations indicate that noradrenergic neuromodulation may affect the precision of our priors;54 thus, stressful states where adrenergic arousal is high may reduce reliance on (i.e., precision weighting of) priors. In contrast, dopamine may play a crucial role in precision weighting of prediction errors, rather than coding predictions per se,55 while nondopaminergic modulators such as acetylcholine may affect the weighting of higher-order priors.56,57 Thus, a hyperdopaminergic state (as seen in some patients with psychosis58) may increase the rate at which the speech production priors are revised, further adding to their uncertainty. D2-blockers may reduce error-induced shifts in the discourse plan, correcting the positive FTD features, but cannot improve the (nondopaminergically mediated) precision of the higher-order discourse plan per se, and the eventual gravitation toward reduced speech output (negative FTD), as seen in the long term. More broadly, neuromodulatory aberrations in the higher-level association cortex result in excitation/inhibition imbalance that underwrites precision modulation38 as well as FTD59 in schizophrenia. Specific brain regions, such as anterior insula,60 dorsal anterior cingulate cortex61 and hippocampus,62 appear to play a key part in precision-weighting mechanisms; the course of FTD may relate to the volumetric63 and connectivity64,65 changes of these regions in schizophrenia. It is important to note that irrespective of the location of lesions, the hierarchical precision-weighting framework enables propagation of failures at 1 level to the other levels of speech generation (see Palaniyappan66 for further discussion on the broader systems theory notion in psychosis).
In summary, FTD can be cast as a problem of precision modulation of higher-order priors during dyadic social encounters. This view concurs with that of Brown and Kuperberg,9 who also proposed a role for imprecise priors (message-level predictions) and the resulting over-adjustment of the production plan. Our extension emphasizes a mismatch in the precision of shared priors in the speaker and listener as a crucial component of communication failure, and an inability to afford precision to discourse plans when demands arise (e.g., in a volatile social context) as a critical deficit in schizophrenia. Our model explains both positive and negative FTD and accommodates their trajectory and treatment response.
Putting ideas to test
The active inference account of communication deficits presented here casts many elements of FTD as a 2-brain problem of social inference. As such, scientific scrutiny of this theory mandates ecologically valid experiments to build empirical evidence. Psycholinguistic studies in psychosis generally focus on lexical or sentence-level processing rather than dialogic interactions; novel insights on communication emerge when natural conversation is studied, even at a single-brain level (for example see Castellucci and colleagues67). Second-person paradigms (e.g., hyperscanning to capture interpersonal interactions68 and brain–computer interfaces69) offer tantalizing possibilities in this regard. Indeed, emerging insights on “interbrain synchrony” of neural dynamics from healthy individuals have opened the door for second-person neuroscience in psychiatry.70 Many applications for investigating FTD can emerge from the advances being made in setting up (e.g., leader–follower experiments71), recording (e.g., functional near-infrared spectroscopy) and calculating (e.g., Granger causal models) interbrain dynamics, some involving more than 2 interacting agents in a social context.24
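As a rough illustration of the kind of directional interbrain measure mentioned above, the sketch below computes a lag-1 Granger-style index between 2 simulated signals in a leader–follower arrangement. It is a toy example under strong assumptions (a single lag, linear dynamics, simulated data, no significance testing); the function names and coupling values are hypothetical, and real hyperscanning analyses use dedicated toolboxes and far richer models. A positive index indicates only that one signal's past improves prediction of the other; it does not establish a causal mechanism.

```python
import math
import random


def _resid_var_one(target, pred):
    """Residual variance of the least-squares fit target ~ b * pred."""
    b = sum(t * p for t, p in zip(target, pred)) / sum(p * p for p in pred)
    return sum((t - b * p) ** 2 for t, p in zip(target, pred)) / len(target)


def _resid_var_two(target, p1, p2):
    """Residual variance of the fit target ~ b1 * p1 + b2 * p2."""
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    s1t = sum(a * t for a, t in zip(p1, target))
    s2t = sum(b * t for b, t in zip(p2, target))
    det = s11 * s22 - s12 * s12
    b1 = (s1t * s22 - s2t * s12) / det
    b2 = (s2t * s11 - s1t * s12) / det
    return sum((t - b1 * a - b2 * b) ** 2
               for t, a, b in zip(target, p1, p2)) / len(target)


def granger_index(source, target):
    """Log ratio of residual variances (lag 1). Larger values mean the
    source's past adds more to predicting the target than the target's
    own past does alone."""
    mean_s = sum(source) / len(source)
    mean_t = sum(target) / len(target)
    s = [v - mean_s for v in source]
    t = [v - mean_t for v in target]
    now, own_past, other_past = t[1:], t[:-1], s[:-1]
    restricted = _resid_var_one(now, own_past)
    full = _resid_var_two(now, own_past, other_past)
    return math.log(restricted / full)


def simulate_leader_follower(n=500, coupling=0.6, seed=0):
    """Simulate 2 signals; the follower depends on the leader's past."""
    rng = random.Random(seed)
    leader, follower = [0.0], [0.0]
    for _ in range(n - 1):
        leader.append(0.5 * leader[-1] + rng.gauss(0, 1))
        # leader[-2] is the leader's value at the previous time step
        follower.append(0.3 * follower[-1] + coupling * leader[-2]
                        + rng.gauss(0, 1))
    return leader, follower


leader, follower = simulate_leader_follower()
leader_to_follower = granger_index(leader, follower)
follower_to_leader = granger_index(follower, leader)
print(leader_to_follower > follower_to_leader)
```

In this simulation the index is larger in the leader-to-follower direction, which is the asymmetry a leader–follower design would look for; whether such asymmetries track the precision imbalance proposed here remains an empirical question.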
Parameters of precision modulation can be obtained from readily interpretable behavioural readouts such as eye movements (for example see Mirza and colleagues72); such readouts can be acquired in conjunction with speech measures as a first step to test general propositions linking precision modulation and FTD. While concepts such as commitment to a discourse plan do not have direct measures, the emerging application of Natural Language Processing is already providing several indirect means to estimate parameters of relevance (see Voleti and colleagues73 for a detailed review). This will enable us to acquire speech-derived measures of precision from snapshots of social discourse. Controlled experiments on certain neurophysiological phenomena related to predictive processing (e.g., electroencephalography/magnetoencephalography correlates of corollary discharge, mismatch negativity, certain evoked response potentials such as N400) can provide measures of “synaptic gain” reflecting the physiology of precision modulation.49 Further, using perturb-and-measure neuromodulation techniques like transcranial direct current stimulation and focal transcranial magnetic stimulation, we can test brain-network-level specificity for these hypotheses. Such studies have been providing interesting leads to understand symptoms such as auditory verbal hallucinations in schizophrenia,74 and can also help uncover specific neurostimulation targets for FTD. The conceptual relationship between the precision of action priors and motivation raises the possibility that social incentives may nurture the physiologic markers of “synaptic gain” (representing precision) and ultimately, verbal productivity itself.75 Appealing to precision dynamics also helps to explain why certain therapeutic approaches (e.g., insight-oriented therapies, group therapies) that demand one to infer and attune with another person’s priors repeatedly fail in schizophrenia. 
Following this line of investigation may potentially offer new therapeutic speech and language approaches in psychosis.
Though active inference has provided biologically grounded arguments to blur the boundary between “willing” and “acting” since the time of Schopenhauer, in its current edition, active inference is still a theoretical framework (Box 2). At times, this framework has been criticized as being too committed to the statistical notions of brain function, being somewhat abductive and, to some extent, dismissive of more conventional domain-specific neuropsychological concepts.78–80 While its simplicity and physiologic plausibility have propelled its wide applications,81 it is necessary to consider that empirical evidence is still lacking for many of its premises; for example, “synaptic gain represents precision,” “neurons maintain representational codes for various aspects of the brain’s models of the world,” “precision can be amplified at a subpersonal level,” and “compensatory changes to balance precision weights occur in pathological states” (see Keller and colleagues82 for a review of evidence on neural implementation). Nevertheless, this heuristic has been immensely useful for mapping observed psychopathology to brain functions. In the case of FTD, which continues to elude neurobiological explanation, it offers a principled entry into the “dark matter” of social neuroscience.83
Box 2: The fallacy of brain as a computer
Throughout this editorial, we take the stance of considering the brain as a statistical engine. This is in keeping with the theoretical perspective of active inference, wherein the neuronal precision-weighting processes are described as “subpersonal,”20 to distinguish them from the more conventional, affect-laden descriptions of these terms (e.g., “confidence” or “motivation” is described as precision afforded to higher-order priors such as an action policy). Nevertheless, the functional consequences implied by these terms are comparable. For instance, daily life is replete with examples where affordance of motivation is necessary for actions with highly variable consequences (e.g., going to the gym, buying lottery tickets). One objection to this “subpersonal” stance is that brains do not compute or intend; it is the person with the brain who does such computations and holds intentions. The mereological fallacy here is that we ascribe functions of the whole (person) to its parts (brains or neuronal units). Although detailed counterpoints against this objection can be found elsewhere,76 it is important to ask when, if ever, we can attribute the function of a whole to one of its parts. We can say “a car moves” and “a wheel moves,” but we cannot say “a steering moves.” We can also say “I drive a car,” but we cannot convey much by saying “I drive a wheel.” In other words, for certain qualified functions, the use of intentional predicates for certain parts is agreeable. Furthermore, if we consider human intention as a socially emergent phenomenon,77 it cannot be ascribed to any single part (e.g., a neuronal unit) or even a single person. In our view, these issues do not diminish the utility of the proposal made here; nevertheless, they pose a far from trivial challenge when applied to the philosophy of psychiatric practice.
Footnotes
The views expressed in this editorial are those of the author(s) and do not necessarily reflect the position of the Canadian Medical Association or its subsidiaries, the journal’s editorial board or the Canadian College of Neuropsychopharmacology.
Competing interests: L. Palaniyappan reports personal fees from Janssen Canada, Otsuka Canada, SPMM Course Limited, UK, and the Canadian Psychiatric Association; book royalties from Oxford University Press; investigator-initiated educational grants from Sunovion, Janssen Canada and Otsuka Canada; and travel support from Boehringer Ingelheim and Magstim Limited, outside the submitted work. No other competing interests were declared.
Funding: L. Palaniyappan’s work is supported by the Canadian Institute of Health Research (foundation grant 375104), Bucke Family Fund and the Tanna Schulich Endowment Chair. G. Venkatasubramanian’s work is supported by the Department of Biotechnology (DBT) — Wellcome Trust India Alliance (IA/CRC/19/1/610005) and the Department of Biotechnology, Government of India (BT/HRD-NBA-NWB/38/2019-20(6)).
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY-NC-ND 4.0) licence, which permits use, distribution and reproduction in any medium, provided that the original publication is properly cited, the use is noncommercial (i.e., research or educational use), and no modifications or adaptations are made. See: https://creativecommons.org/licenses/by-nc-nd/4.0/