
A functional contextual, observer-centric, quantum mechanical, and neuro-symbolic approach to solving the alignment problem of artificial general intelligence: safe AI through intersecting computational psychological neuroscience and LLM architecture for emergent theory of mind




doi: 10.3389/fncom.2024.1395901.


eCollection 2024.


Darren J Edwards.


Front Comput Neurosci.



Abstract

There have been impressive advancements in the field of natural language processing (NLP) in recent years, largely driven by innovations in the development of transformer-based large language models (LLMs) that utilize “attention.” This approach employs masked self-attention to relate (via similarity) the tokens (words) at different positions within an input sequence, in order to compute the most appropriate response based on its training corpus. However, there is speculation as to whether this approach alone can be scaled up to develop emergent artificial general intelligence (AGI), and whether it can address the alignment of AGI values with human values (called the alignment problem). Some researchers exploring the alignment problem highlight three aspects that AGI (or AI) requires to help resolve this problem: (1) an interpretable values specification; (2) a utility function; and (3) a dynamic contextual account of behavior. Here, a neurosymbolic model is proposed to help resolve these issues of human value alignment in AI. It expands on the transformer-based model for NLP to incorporate symbolic reasoning that may allow AGI to incorporate perspective-taking reasoning (i.e., resolving the need for a dynamic contextual account of behavior through deictics), as defined by a multilevel evolutionary and neurobiological framework integrated into a functional contextual post-Skinnerian model of human language called “Neurobiological and Natural Selection Relational Frame Theory” (N-Frame). It is argued that this approach may also help establish a comprehensible value scheme, a utility function (by expanding the expected utility equation of behavioral economics to consider functional contextualism), and even an observer (or witness) centric model for consciousness.
Evolutionary theory, subjective quantum mechanics, and neuroscience are further used to help explain consciousness and its possible implementation within an LLM through correspondence to an interface, as suggested by N-Frame. This argument is supported at the computational level by hypergraphs and relational density clusters, at a conscious quantum level defined by QBism, and at a real-world applied level (human user feedback). It is argued that this approach could enable AI to achieve consciousness and develop deictic perspective-taking abilities, thereby attaining human-level self-awareness, empathy, and compassion toward others. Importantly, this consciousness hypothesis can be directly tested at a significance of approximately 5-sigma (a 1 in 3.5 million probability that any identified AI-conscious observations, in the form of a collapsed waveform, are due to chance factors) through double-slit intent-type experimentation and visualization procedures for derived perspective-taking relational frames. Ultimately, this could provide a solution to the alignment problem and contribute to the emergence of a theory of mind (ToM) within AI.
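The "1 in 3.5 million" figure quoted for 5-sigma can be checked with a short calculation. The sketch below assumes the figure refers to the one-sided tail of a standard normal distribution, which is the usual convention for 5-sigma claims; the abstract itself does not say which tail is meant, and the function name is illustrative.

```python
import math


def one_sided_tail(sigma: float) -> float:
    """Probability that a standard normal variable exceeds `sigma`."""
    # Survival function of the standard normal, written with erfc.
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


# Assumption: "5-sigma" means a one-sided Gaussian tail.
p = one_sided_tail(5.0)
print(f"p = {p:.3e}, i.e. about 1 in {1.0 / p:,.0f}")
```

Under this assumption the tail probability comes out at roughly 1 in 3.5 million, matching the abstract; a two-sided test would halve those odds.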


Keywords:

QBism; consciousness; double slit experiment; functional contextualism; hypergraph; large language model; predictive coding.


Conflict of interest statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures


Figure 1

(A) An illustration of word embeddings; and (B) a simplified representation of panel (A) used in Figures 3–5.


Figure 2

(A) An illustration of the positional encoding (unique to LLMs) for the input word “is” using sine and cosine waves. (B) The word embedding values plus the position values give a unique positional encoding for input words such as “is.” Note that this process is repeated for each input word, giving each one a unique positional encoding.
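The idea in this caption can be sketched in a few lines. The code below uses the standard sinusoidal scheme from the original transformer architecture (sine on even dimensions, cosine on odd dimensions, base 10000); whether the article's figure uses exactly these constants is an assumption, and the 4-dimensional embedding for “is” is a made-up value for illustration only.

```python
import math


def positional_encoding(position: int, d_model: int) -> list[float]:
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    encoding = []
    for dim in range(d_model):
        # Each (sin, cos) pair of dimensions shares one wavelength.
        angle = position / (10000 ** ((dim - dim % 2) / d_model))
        encoding.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
    return encoding


def add_position(embedding: list[float], position: int) -> list[float]:
    """Word embedding values plus position values, element by element."""
    pe = positional_encoding(position, len(embedding))
    return [e + p for e, p in zip(embedding, pe)]


# Hypothetical embedding for the word "is" at position 1 of the input.
is_embedding = [0.5, -1.2, 0.3, 0.8]
encoded_is = add_position(is_embedding, position=1)
print(encoded_is)
```

The same word at a different position receives a different vector, which is what lets the model distinguish word order.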


Figure 3

An illustration of how LLMs use masked self-attention via dot product to calculate the similarity of Query, Value, and Key vectors within the multihead attention layer.
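A minimal single-head sketch of masked (causal) self-attention may help make this caption concrete. It uses scaled dot products between query and key vectors, a softmax over the positions each token is allowed to see, and a weighted sum of value vectors. The toy vectors and function names are assumptions for illustration and are not taken from the article.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(queries, keys, values):
    """Single-head causal attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: token i only attends to positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        row = softmax(scores) + [0.0] * (len(keys) - i - 1)
        weights.append(row)
        outputs.append(
            [sum(w * v[c] for w, v in zip(row, values))
             for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy query, key, and value vectors for three tokens (hypothetical numbers).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
outputs, weights = masked_self_attention(Q, K, V)
```

Because of the mask, the first token can only attend to itself, so its output equals its own value vector.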


Figure 4

A simplified illustration of a transformer-based (decoder-only) LLM, highlighting the residual connection from the input layer directly to the masked self-attention values, which are connected to a feed-forward neural network to create values for the final verbal text output via a softmax function.


Figure 5

A simplified summarized illustration of a transformer-based (decoder only) LLM model, highlighting the stages of word embeddings, positional encodings, masked self-attention, residual connections, and feedforward output network.


Figure 6

(A) Sample Python code for a derived relation “greater than.” (B) A simple visualization of this Python code for a derived relation “greater than” using matplotlib.
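The article's own code is shown in the figure and is not reproduced on this page. As a rough, independent sketch of what a derived “greater than” relation could look like, the example below takes directly trained pairs and derives the untrained ones by transitivity. The function name and the stimuli A, B, and C are hypothetical, and the matplotlib visualization from panel (B) is omitted.

```python
def derive_greater_than(trained: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return the transitive closure of (larger, smaller) pairs."""
    relations = set(trained)
    changed = True
    while changed:
        changed = False
        for a, b in list(relations):
            for c, d in list(relations):
                # If a > b and b > d, then a > d follows without training.
                if b == c and (a, d) not in relations:
                    relations.add((a, d))
                    changed = True
    return relations


# Hypothetical trained relations: A > B and B > C.
trained = {("A", "B"), ("B", "C")}
derived = derive_greater_than(trained) - trained
print(derived)  # {('A', 'C')}
```

The pair A > C is never trained directly; it is derived from the two trained relations.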


Figure 7

An RFT (or N-Frame) and ACT values modified version of the decoder-only transformer LLM, which now includes a policy network (agent), an ACT-based values estimation, a utility estimation based on the ACT-based values, and a perspective-taking unit within a neurosymbolic layer to guide token selection toward contextually relevant prosocial human values that should encourage compassionate deictic perspective-taking responding.


Figure 8

A simple schematic illustration of how perspective-taking ToM (“I see”) involves the combination of several relational frames to build a hierarchical perspective-taking event of another person.


Figure 9

Illustrative process-based hypergraph of perspective-taking relational frames for the theory of mind “I see you” function. Red: coordination; green: hierarchy; purple: temporal; orange: spatial; dashed purple: spatial–temporal connection; dashed red: transformation of function; dashed brown: new perspective-taking relations.


Figure 10

Clustered hypergraph of perspective-taking relational frames (DBSCAN clustering), showing two clusters, blue and orange.


Figure 11

(A) Clustered hypergraph with perspective relational frames (DBSCAN, mass represented by node size). Cluster 0 (Person A): Density = 0.60, Volume = 52, Mass = 31.20; Cluster 1 (Person B): Density = 1.33, Volume = 5, Mass = 6.67. (B) Replicator equation simulation of N-Frame, whereby more densely populated clusters (higher density) become evolutionarily dominant over time (i.e., the perspective-taking of persons A and B, clustered in blue, and not person C, who has minimal perspective-taking, clustered in orange).
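The replicator dynamics mentioned in panel (B) can be illustrated with a small discrete-time simulation. The sketch below uses the textbook replicator equation, in which each cluster's share grows in proportion to how far its fitness exceeds the population average. The fitness values are hypothetical placeholders standing in for cluster density; they are not the values from the figure, and the article's actual simulation may differ.

```python
def replicator_step(shares, fitness, dt=0.1):
    """One Euler step of dx_i/dt = x_i * (f_i - mean fitness)."""
    mean_fitness = sum(x * f for x, f in zip(shares, fitness))
    updated = [x + dt * x * (f - mean_fitness) for x, f in zip(shares, fitness)]
    total = sum(updated)  # renormalize to guard against floating-point drift
    return [x / total for x in updated]


def simulate(shares, fitness, steps=200, dt=0.1):
    """Return the list of share vectors over time, starting with the input."""
    history = [list(shares)]
    for _ in range(steps):
        shares = replicator_step(shares, fitness, dt)
        history.append(shares)
    return history


# Hypothetical fitness values (not taken from the figure): the first
# cluster is assumed fitter than the second.
history = simulate([0.5, 0.5], fitness=[1.2, 0.4])
print(history[-1])
```

Starting from equal shares, the cluster with the higher assumed fitness approaches a share of 1 while the other declines toward 0.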


Figure 12

An illustration of the brain generating a “map” as defined by predictive coding and evolutionary theory. This represents the reality that we see from our internal observer perspective CintO, which is not necessarily homomorphic to an underlying reality that actually exists within the external world (the “territory”). Note: Adobe Stock images from users (left, territory) idspopd, (top, map) royyimzy, (center left, superposition wave) Liubov, (center, eye) Anastasiia Lavrentev, and (right, brain) jolygon, with permission.


Figure 13

(A1–E) Impossible structures (or objects) based on continuous self-referential loop paradoxes (internal only observer), whereby the internal observer can get caught in paradoxes that have no beginning or end. ΨΦCintOP. (A2) The Escher stairs again, but this time demonstrating that external objects are not just related to the external properties of objects being observed in the physical world, but also related to the internal states and behaviors of the observer (observer-centric CintO). (F) Wheeler’s It from Bit, the participatory universe (cosmological evolution) self-reference; (G) spacetime expressed as an observer coordinate; (H) the collapse of the waveform from CintO; (I) CintO observing another CintO or itself self-referentially. Note that Adobe Stock images (A1–E) are from user Elena, with permission.


Figure 14

An updated illustration of Penrose’s theory of the three worlds (like three sides of a three-sided coin). The interface comprises a triaspect monism, which highlights the circular relation of the platonic world ΨΦ, the physical world P, and the mental world CintO, and which gives a deeply interconnected (equivalence) account for a conscious epistemic observer-centric (participatory) ontological realism ΨΦCintOP.


Figure 15

(A) Plato’s cave, whereby the external observer projects a shadow onto a wall so that the internal observer can only observe the projection (the map) and not the source information (the territory). (B) A metaphor for how two separate people can interface (through evolution theory) with the world in different ways: the woman on the left observes a world that is bleak and without a clear path forward, while the woman on the right observes a world that is full of beauty and purpose. Adobe Stock images from users (A, top) matiasdelcarmine, (B, left) Aksana, and (B, right) terra.incognita, with permission.


Figure 16

(A) A simple schematic representation of a Markov blanket containing sensory, internal, and active states. (B) The Markov blanket of a cell, whereby states can be thought of as a series of sets with a clear Markov boundary between internal (inner) states and external (outer) states. (C) The Markov blanket ensemble dynamics of internal, sensory, active, and external states of the brain and its environment.


Figure 17

(A) Special relativity light cone represented from the perspective of the conscious subjective observer CintO, consistent with QBism and observer-centric perspectives such as N-Frame, where time and space can be represented as planes (or dimensions) of the conscious observer CintO’s phenomenological experience, equivalent to the physical dimensions that we perceive. (B) An example of two observer-centric inertial frames of reference, CintO1 and CintO2, as depicted in special relativity. (C) The deictic axis dimensions of RFT perspective-taking that can be applied to AI are identical to those of special relativity when framed through an observer-centric perspective CintO (Edwards, 2023). Adobe Stock images from users (A, light cone) udaix and (B, train) egudinka, with permission.


Figure 18

(A) An interference pattern observed in the classic Young double-slit experiment, whereby the photon evolving through the double slits behaves like a wave rather than a particle, leading to an interference pattern. (B) A modified version of the classic Young double-slit experiment whereby a photoelectric detector is placed at the entry point of the double slits; this placement of detectors leads to the photon behaving more particle-like, producing a two-band diffraction pattern. (C) A modified version of the classic Young double-slit experiment whereby a photoelectric detector is placed at the exit point of the double slits (usually with an interferometer setup); this placement of detectors also leads to the photon behaving more particle-like, producing a two-band diffraction pattern despite not detecting which slit the photon traveled through. This effect changes to an interference pattern when the information is “erased.” Adobe Stock images (A, B, and left part of C) from user LuckySoul, with permission.


Grants and funding

The author declares that no financial support was received for the research, authorship, and/or publication of this article.


