
To Understand Large Language Models We Need to Go Back to the Basics


Arthur C. Clarke famously stated that “any sufficiently advanced technology is indistinguishable from magic.” Most of us have experienced this law with respect to the latest iterations of large language models (LLMs) such as GPT-4. Treating LLMs as magic may lead to their incorrect usage, resulting in undesirable and dangerous effects such as privacy violations, proliferation of misinformation, and blind trust in their outputs that ignores bias amplification and hallucinations. The solution to this problem is understanding the technology that underlies these language models. Our book takes a step in this direction by introducing neural networks from scratch, assuming a reader with minimal technical background.

The immediate, obvious question is then: “Why do we need another deep learning and natural language processing book?” Several excellent ones have been published, covering both theoretical and practical aspects of deep learning and its application to language processing. However, from our experience teaching courses on natural language processing, we argue that, despite their excellent quality, most of these books do not target their most likely readers. The intended reader of this book is one who is skilled in a domain other than machine learning and natural language processing and whose work relies, at least partially, on the automated analysis of large amounts of data, especially textual data. Such experts may include social scientists, political scientists, biomedical scientists, and even computer scientists and computational linguists with limited exposure to machine learning.

Existing deep learning and natural language processing books generally fall into two camps. The first camp focuses on the theoretical foundations of deep learning. This is certainly useful to the aforementioned readers, as one should understand the theoretical aspects of a tool before using it. However, these books tend to assume the typical background of a machine learning researcher and, as a consequence, we have often seen students who do not have this background rapidly get lost in such material. To mitigate this issue, the second camp focuses on the machine learning practitioner; that is, on how to use deep learning software, with minimal attention paid to the theoretical aspects. We argue that focusing on practical aspects is similarly necessary but not sufficient. Considering that deep learning frameworks and libraries have become fairly complex, the chance of misusing them due to theoretical misunderstandings is high. We have commonly seen this problem in our courses, too.

This book, therefore, aims to bridge the theoretical and practical aspects of deep learning for natural language processing. We cover the necessary theoretical background and assume only a minimal machine learning background on the reader's part. Our aim is that anyone who took introductory linear algebra and calculus courses will be able to follow the theoretical material. To address practical aspects, this book includes pseudocode for the simpler algorithms discussed and actual Python code for the more complicated architectures. The code should be understandable by anyone who has taken a Python programming course. After reading this book, we expect that the reader will have the necessary foundation to immediately begin building real-world, practical natural language processing systems, and to expand their knowledge by reading research publications on these topics.

Deep Learning for Natural Language Processing by Mihai Surdeanu and Marco Antonio Valenzuela-Escárcega




