Recent advances in Artificial Intelligence (AI) and Natural Language Processing (NLP) have led to the introduction of Large Language Models (LLMs). The rapidly growing popularity of LLMs suggests that machines may eventually mirror human-like capabilities. In recent research, a team from Kuaishou Inc. and Harbin Institute of Technology has introduced KwaiAgents, an information-seeking agent system based on LLMs.
KwaiAgents is composed of three primary parts: an autonomous agent loop called KAgentSys, an open-source LLM suite called KAgentLMs, and a benchmark called KAgentBench that evaluates how well LLMs respond to different agent-system prompts. With its planning-concluding procedure, KAgentSys integrates a hybrid search-browse toolkit to manage data from many sources efficiently.
KAgentLMs comprises a set of large language models equipped with agent capabilities such as tool use, planning, and reflection. KAgentBench contains more than 3,000 human-edited, automatically evaluated instances designed to assess agent skills. The evaluation dimensions cover planning, tool use, reflection, concluding, and profiling.
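The exact format of KAgentBench instances is defined in the authors' release; purely as a hypothetical illustration of how a single instance could touch all five dimensions, an entry might look roughly like the following sketch (field names and values are assumptions, not the real schema):

```python
# Hypothetical sketch only -- NOT the actual KAgentBench schema.
example_instance = {
    "query": "What is the weather in Beijing tomorrow?",
    "available_tools": ["web_search", "browse_page"],
    "reference": {
        "planning": "Search for a Beijing weather forecast, then open the top result.",
        "tool_use": {"tool": "web_search", "args": {"q": "Beijing weather tomorrow"}},
        "reflection": "If the retrieved forecast is outdated, search again with an explicit date.",
        "concluding": "Report tomorrow's forecast for Beijing, citing the source.",
        "profiling": "Answer as a concise, factual assistant.",
    },
}

print(example_instance["reference"]["planning"])
```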
Within this architecture, KwaiAgents uses LLMs as its cognitive core. The system can understand user queries, follow behavioral rules, reference external documents, update and retrieve information from internal memory, plan and carry out actions with the help of a time-aware search-browse toolkit, and finally deliver thorough answers.
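The paper presents KAgentSys as an agent loop rather than a specific code API, so the following Python sketch is only a rough illustration of that planning-acting-concluding flow under assumed names: `Memory`, `SearchBrowseToolkit`, `llm`, and `agent_loop` are hypothetical placeholders, not the actual KwaiAgents implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Memory:
    """Toy internal memory: stores snippets and retrieves them by keyword overlap."""
    entries: list = field(default_factory=list)

    def update(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: str) -> list:
        terms = set(query.lower().split())
        return [e for e in self.entries if terms & set(e.lower().split())]


class SearchBrowseToolkit:
    """Stand-in for the hybrid, time-aware search-browse tools."""

    def call(self, tool: str, query: str) -> str:
        # A real system would query a search engine or browser here; the current
        # date is attached so that answers can stay time-sensitive.
        return f"[{tool} result for '{query}' as of {datetime.now():%Y-%m-%d}]"


def llm(prompt: str) -> str:
    """Placeholder for the LLM that drives planning and concluding."""
    return f"(model output for: {prompt[:60]}...)"


def agent_loop(user_query: str, max_steps: int = 3) -> str:
    memory = Memory()
    tools = SearchBrowseToolkit()

    for _ in range(max_steps):
        # Planning: the LLM decides what to look up next, conditioned on the
        # user query and whatever evidence is already in memory.
        context = "\n".join(memory.retrieve(user_query))
        plan = llm(f"Plan the next search for '{user_query}' given:\n{context}")

        # Acting: execute the planned search/browse call and store the observation.
        observation = tools.call("web_search", plan)
        memory.update(observation)

    # Concluding: the LLM composes a final answer from the accumulated evidence.
    evidence = "\n".join(memory.retrieve(user_query))
    return llm(f"Answer '{user_query}' using:\n{evidence}")


if __name__ == "__main__":
    print(agent_loop("Who won the most recent Ballon d'Or?"))
```

In the real system, the placeholder `llm` call would be one of the KAgentLMs (or another capable LLM), and the toolkit would perform genuine, time-aware web search and browsing.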
The team notes that the study also looks into how well the system operates with LLMs that are less capable than GPT-4. To close this gap, the Meta-Agent Tuning (MAT) framework has been presented, which enables open-source models of 7B or 13B parameters to perform well across a variety of agent systems.
The team has carefully validated these capabilities using both human assessments and benchmark evaluations. To assess LLM performance, about 200 factual or time-aware queries were gathered and annotated by humans. The experiments show that KwaiAgents performs better than a number of open-source agent systems after MAT fine-tuning. Even smaller models, such as 7B or 13B, have demonstrated generalized agent capabilities for information-seeking tasks across many agent systems.
The team has summarized their primary contributions as follows.
- KAgentSys has been introduced, which combines a hybrid, time-aware search-browse toolset with a planning-concluding approach.
- The proposed system has shown improved performance compared to current open-source agent systems.
- With the introduction of KAgentLMs, the possibility of obtaining generalized agent capabilities for information-seeking tasks through smaller, open-sourced LLMs has been explored.
- The Meta-Agent Tuning (MAT) framework has been introduced to enable effective performance even with less sophisticated LLMs.
- KAgentBench, a freely available benchmark that supports both automated and human evaluation of different agent capabilities, has also been developed.
- A thorough assessment of the performance of agent systems using both automated and human-centered methods has been conducted.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.