This book is devoted to the hot topics of large language models (LLMs) and generative artificial intelligence (AI). It can be considered a guidebook that introduces readers to the realm of natural language processing (NLP) using both open-source and closed-source (partly proprietary) systems. The program code found in the book can be downloaded from GitHub.
Chapter 1 gives an overview of the basic concepts related to LLMs. The chapter describes popular and available LLMs, and discusses the concept of domain-specific LLMs and potential application areas. Chapter 2 looks at semantic search and the basic notions that occur in this domain. The author analyzes the concept of a token in the context of NLP, the concept of an embedding, and distance metrics between words/concepts. The chapter ends with an investigation of cost factors, especially in the case of closed models in cloud computing.
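The distance between embedded words/concepts that chapter 2 discusses is commonly measured with cosine similarity. A minimal sketch (the toy three-dimensional vectors are illustrative; real embedding models use hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings chosen by hand for illustration.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
apple = [0.1, 0.2, 0.9]

print(cosine_similarity(king, queen))  # close to 1.0: related concepts
print(cosine_similarity(king, apple))  # much lower: unrelated concepts
```

Semantic search ranks documents by exactly this kind of score between the query embedding and the document embeddings.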
Chapter 3 looks at the fashionable topic of prompt engineering with GPT-3. The technique is presented through a case study: how to build a question-answering bot with ChatGPT. Chapter 4 focuses on optimization procedures in LLMs. It analyzes transfer learning and fine-tuning, and shows how to create a sample application.
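At its core, the kind of question-answering bot described in chapter 3 assembles retrieved context and the user's question into a single prompt before calling the model. A minimal sketch, assuming a template of our own invention (the function name and wording are illustrative, not the book's):

```python
def build_qa_prompt(context: str, question: str) -> str:
    """Assemble a grounded Q&A prompt; the model is instructed to
    answer only from the supplied context to reduce hallucination."""
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_qa_prompt(
    "Paris is the capital of France.",
    "What is the capital of France?",
)
print(prompt)
```

The resulting string would then be sent to the chat completion endpoint of whichever LLM is used.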
Chapter 5 addresses potential adversarial attacks on the developed model. A prompt injection attack against the model can produce very unwelcome results, for example, theft of proprietary information, misinformation, or biased text. The chapter also considers few-shot prompting, including how to use it and for what purpose.
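Few-shot prompting, as mentioned above, steers the model by prepending a handful of worked examples to the query. A minimal sketch, with a sentiment-classification task and template invented for illustration:

```python
def few_shot_prompt(examples, query):
    """Build a few-shot prompt from (text, label) pairs followed by the
    unlabeled query, so the model continues the established pattern."""
    lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # left open for the model to complete
    return "\n".join(lines)

demo = few_shot_prompt(
    [("Great plot and acting.", "Positive"),
     ("A dull, forgettable film.", "Negative")],
    "I loved every minute of it.",
)
print(demo)
```

A prompt injection attack works against exactly this structure: malicious text placed inside a "Review:" slot can try to override the instruction line at the top.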
Chapter 6 deals with tailoring the embedding procedure and the LLM architecture to the application being developed, and chapter 7 goes beyond the basics and demonstrates a complex system that combines image and text processing. The system is presented in the form of a case study, with an application of reinforcement learning that extends the original model.
Chapter 8 concentrates on the exploitation of open-source models to yield results that compete with closed-source systems. A case study on classifying anime genres showcases the interrelated techniques of model and data preparation and fine-tuning. Another use case covers LaTeX code generation using a small-scale GPT-2 model. The chapter demonstrates that open-source models with relatively few parameters can give superb results.
Chapter 9 reviews the possible implementation and deployment of the developed model in cloud computing. The technical structure and relevant costs are considered; however, the person-months of development are ignored. This part is very useful for consultancy work, that is, the "test before invest" approach.
The book is a useful introduction to the application opportunities of LLMs, the required technologies and architectures, and the available options to access either open-source or closed-source models. Graduate students can use it in software development labs to create a prototype for studying this exciting, novel technology in various application areas.