Author(s): Alberto Paderno

Originally published on Towards AI.

Psychopathology of Large Language Models: Foundation Models in a Neurobiological Perspective

Optimal Brain Damage, Synaptic Pruning, and the Problem of “Hallucinations”

Artificial psychosurgery, modifying the architecture to improve the function — Image generated by Dall-E 3

The performance of large language models (LLMs) is growing at a breakneck pace. Models provide increasingly coherent and consistent answers, with a progressive and significant reduction in hallucinations. This improvement is driven mainly by the overall optimization of model architecture and training data, as well as the steady increase in parameter counts. Nonetheless, hallucinations still occur, sometimes in unexpected ways, and tracing the source of these anomalies remains a significant challenge. Our understanding of the inner workings of LLMs is less detailed than their widespread adoption might suggest, and this is a considerable limitation in domains where random and unexpected errors could lead to severe consequences (e.g., healthcare and finance).

Neurodevelopmental Correlates in Artificial Intelligence

Understanding the neural development process in humans can be a useful guide for designing and optimizing LLMs. The human brain, especially during its development, undergoes various processes that enhance its efficiency and functionality, ensuring that neural circuits adapt to environmental interactions. During fetal development, the brain grows rapidly, with an overproduction of synaptic connections. Subsequently, as the individual matures, synaptic pruning refines this neural network by removing redundant connections, thereby enhancing the groundedness and efficiency of neural processes. Neuronal activity plays a key role in this process: synapses that are frequently used and activated are strengthened and preserved, while those that are seldom used are pruned away.

Evolution has selected for this approach in network formation: construction through overabundance and subsequent pruning.

Neural networks formed in this way are much more robust and efficient than networks constructed through other means (1).

Neuropathology of Biological Neural Networks

From a pathological viewpoint, alterations in synaptic pruning are one of the proposed etiological mechanisms behind neurological and psychiatric disorders (2). On one hand, over-pruning can contribute to the excessive loss of functional synapses, as in Alzheimer’s disease. On the other, unbalanced pruning is one of the potential causes of disorders such as autism and schizophrenia, where inefficient “fine-tuning” of synaptic connections might lead to characteristic disease presentations. A dysregulated pruning process may be the source of symptoms such as hallucinations, delusions, and disorganized thinking in schizophrenia, as well as the sensory processing challenges and behavioral profiles typical of autism.

Image generated by Dall-E 3

Neuropathology of LLMs

In LLMs, ‘hallucinations’ typically refer to the generation of nonsensical or nonfactual content that is not grounded in the source material. However, Smith et al. (3) proposed the term ‘confabulation’ as a more fitting descriptor. In contrast to hallucination, confabulation involves generating incorrect narrative details, influenced by the model’s existing knowledge and contextual understanding, rather than implying any form of sensory perception. This redefinition aligns more closely with the operation of LLMs, which synthesize outputs based on patterns learned from vast datasets rather than experiencing sensory inputs.

In general, the extensive training of LLMs mirrors the initial stages of brain development, where a vast array of neural connections are formed. However, like the human brain, LLMs may require an adjunctive refinement process. This refinement, analogous to synaptic pruning in human development, would involve cleaning and optimizing the model’s architecture. Without this process, an LLM may risk being overwhelmed by ‘white noise’ — excess information or connections that obscure or distort the intended output.

The continual refinement of LLMs through methods such as pruning may therefore be required to ensure that their outputs are relevant, accurate, and grounded in the source material. As discussed above, these characteristics are especially important in fields where the reliability of the information provided by the LLM is critical for safety reasons. Indeed, Elaraby et al. (4) showed that, to date, LLM-generated summaries in the legal or health space sometimes contain inaccurate information with the potential for a real-life negative impact.

A Technical Perspective on LLM Pruning — “Optimal Brain Damage”

As already described in technical papers, LLM pruning reduces model size by eliminating weights that contribute minimally to performance, yielding a sparser model. This process produces more efficient models that require fewer computational resources while maintaining, or in some cases even enhancing, performance.
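To make this concrete, the snippet below is a minimal sketch of unstructured magnitude pruning, one of the simplest criteria, using PyTorch’s built-in pruning utilities. The toy linear layer and the 30% sparsity level are illustrative assumptions rather than settings from the works cited here; practical LLM pruning typically relies on more elaborate, data-aware importance scores, but the principle of zeroing low-importance weights is the same.

```python
# A minimal sketch of unstructured magnitude pruning with PyTorch.
# The toy layer and the 30% sparsity level are illustrative choices,
# not parameters taken from the papers cited in this article.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(in_features=512, out_features=512)

# Zero out the 30% of weights with the smallest absolute value (L1 magnitude).
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Make the pruning permanent: fold the mask into the weight tensor.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Fraction of pruned weights: {sparsity:.2f}")  # ~0.30
```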

Pruning in LLMs is strikingly similar to synaptic pruning in neurodevelopment. Just as synaptic pruning optimizes neural pathways by removing redundant connections, model pruning in LLMs aims to maintain or enhance performance by removing redundant weights.

A fascinating description of the potential impact of model pruning was provided as early as 1989 by LeCun et al. (5) in a paper titled “Optimal Brain Damage.” As the authors state, removing unimportant weights from a network can be expected to yield several improvements: better generalization, fewer required training examples, and faster learning and classification. However, rather than brain damage, this can be viewed as a physiological step in neural structure optimization, a tailored “psychosurgical” approach aimed at supporting the proper “maturation” of the architecture.
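In “Optimal Brain Damage,” the choice of which weights to remove is driven by a saliency score: roughly half the product of each weight’s squared value and the corresponding diagonal entry of the Hessian of the loss, an estimate of how much the loss would increase if that weight were deleted. The sketch below shows only that ranking-and-removal step in plain NumPy; it assumes a diagonal Hessian estimate is already available (the original paper computes it with a backpropagation-like procedure, and squared-gradient approximations are a common substitute), and the array shapes and pruning fraction are arbitrary.

```python
# A minimal sketch of the Optimal Brain Damage saliency ranking (LeCun et al.).
# Assumes a precomputed diagonal Hessian estimate `hessian_diag`; how that
# estimate is obtained (exact diagonal, squared-gradient approximation, ...)
# is left out here.
import numpy as np

def obd_prune(weights: np.ndarray, hessian_diag: np.ndarray, fraction: float) -> np.ndarray:
    """Zero out the `fraction` of weights with the lowest OBD saliency."""
    # OBD saliency: s_k = 0.5 * h_kk * w_k^2, a second-order estimate of the
    # increase in loss caused by removing weight k.
    saliency = 0.5 * hessian_diag * weights ** 2

    n_prune = int(fraction * weights.size)
    prune_idx = np.argsort(saliency, axis=None)[:n_prune]  # least salient first

    pruned = weights.copy()
    pruned.flat[prune_idx] = 0.0
    return pruned

# Toy usage with random values standing in for real statistics.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
h = rng.uniform(0.0, 1.0, size=(64, 64))  # placeholder diagonal Hessian
w_pruned = obd_prune(w, h, fraction=0.3)
print((w_pruned == 0).mean())  # ~0.30
```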

Indeed, it is particularly interesting to note that, as recently demonstrated by Chrysostomou et al. (6), pruned LLMs tend to hallucinate less than their full-sized counterparts, potentially because they rely more on the source input and less on parametric knowledge acquired during pre-training.

The Bigger, the Better?

The absence of adequate pruning may be one of the factors underlying hallucinations in LLMs, and progress in this area may lead to better models without a significant increase in size, challenging the “bigger is better” stereotype. However, like synaptic pruning, AI model pruning is a balancing act: removing excess while preserving essential functionality. The convergence of these biological and computational processes reveals a parallel pursuit of efficiency and functionality in complex systems.

References

1. Navlakha S, Barth AL, Bar-Joseph Z. Decreasing-Rate Pruning Optimizes the Construction of Efficient and Robust Distributed Networks. Graham LJ, ed. PLoS Comput Biol. 2015;11(7):e1004347. doi:10.1371/journal.pcbi.1004347

2. Xie C, Xiang S, Shen C, et al. A shared neural basis underlying psychiatric comorbidity. Nat Med. 2023;29(5):1232-1242. doi:10.1371/journal.pdig.0000388

3. Smith AL, Greaves F, Panch T. Hallucination or Confabulation? Neuroanatomy as Metaphor in Large Language Models. Berkovsky S, ed. PLOS Digit Health. 2023;2(11):e0000388. doi:10.1371/journal.pdig.0000388

4. Elaraby M, Zhong Y, Litman D. Towards Argument-Aware Abstractive Summarization of Long Legal Opinions with Summary Reranking.

5. LeCun Y, Denker JS, Solla SA. Optimal Brain Damage. In: Advances in Neural Information Processing Systems 2 (NIPS 1989).

6. Chrysostomou G, Zhao Z, Williams M, Aletras N. Lighter, yet More Faithful: Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization. Published online November 15, 2023. Accessed November 27, 2023. http://arxiv.org/abs/2311.09335

Published via Towards AI


