Google outlines new methods for training robots with video and large language models

2024: DeepMind’s Research on Robotics and Generative AI

In a blog post, Google’s DeepMind Robotics researchers highlight the potential of combining generative AI and large foundation models with robotics in 2024. The team is exploring what this intersection makes possible, alongside other applications of the technology that are drawing excitement, such as learning and product design.

Traditionally, robots have been focused on performing a single task repeatedly throughout their lifespan. While single-purpose robots excel at their designated tasks, they face challenges when unexpected changes or errors occur.

To address this, DeepMind has introduced AutoRT, a system that puts large foundation models to several uses. It incorporates a Visual Language Model (VLM) to improve situational awareness: a fleet of camera-equipped robots can map its environment and identify the objects within it.

A large language model (LLM) then suggests tasks that the hardware, including its end effector, can accomplish. Many see LLMs as the key to robots that understand natural language commands, which reduces the need to hard-code individual skills.
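The division of labor described above can be illustrated with a minimal Python sketch. This is not DeepMind's code or the AutoRT API: the names `Robot`, `describe_scene`, `propose_tasks`, and `plan` are hypothetical, the "VLM" is a stub that receives ready-made object labels, and the "LLM" is a stub that fills in fixed task templates per end effector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Robot:
    name: str
    end_effector: str  # hypothetical label, e.g. "gripper" or "suction"


def describe_scene(camera_frame):
    """Stand-in for the VLM: return the distinct objects seen in a frame.

    Assumption: the 'frame' is already a list of object labels; a real
    system would query a vision-language model on camera images.
    """
    return sorted(set(camera_frame))


def propose_tasks(objects, robot):
    """Stand-in for the LLM: suggest tasks this robot's end effector allows."""
    templates = {
        "gripper": ["pick up the {}", "move the {} to the table edge"],
        "suction": ["lift the {}"],
    }
    return [
        template.format(obj)
        for obj in objects
        for template in templates.get(robot.end_effector, [])
    ]


def plan(fleet, frames):
    """Map each robot name to the tasks proposed for what its camera sees."""
    return {
        robot.name: propose_tasks(describe_scene(frames[robot.name]), robot)
        for robot in fleet
    }


fleet = [Robot("r1", "gripper"), Robot("r2", "suction")]
frames = {"r1": ["sponge", "cup", "cup"], "r2": ["box"]}
tasks = plan(fleet, frames)
```

The point of the sketch is only the ordering: perception produces a description of the scene, and task proposal is constrained by what each robot's hardware can do.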

AutoRT has been tested extensively over the past several months. It has successfully orchestrated up to 20 robots simultaneously and a total of 52 different devices. In all, DeepMind has run more than 77,000 trials covering more than 6,000 tasks with AutoRT.

Another innovation from DeepMind is RT-Trajectory, which uses video input for robotic learning. Many teams are exploring YouTube videos as a way to train robots at scale; RT-Trajectory goes a step further by overlaying a two-dimensional sketch of the arm’s movement on top of the video. These sketches give the model practical visual cues as it learns its robot-control policies.
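To make the overlay idea concrete, here is a minimal Python sketch that draws a two-dimensional path onto a single frame. It is an illustration under stated assumptions, not RT-Trajectory's implementation: the frame is a non-empty grid of integer pixel values, the arm's path is given as hypothetical `(x, y)` pixel waypoints, and the helper names `line_pixels` and `overlay_trajectory` are invented for this example. The segments are rasterized with Bresenham's line algorithm.

```python
def line_pixels(x0, y0, x1, y1):
    """Bresenham's line algorithm: integer pixels from (x0, y0) to (x1, y1)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # step along y
            err += dx
            y0 += sy
    return pixels


def overlay_trajectory(frame, waypoints, value=255):
    """Return a copy of `frame` with the waypoint polyline drawn onto it.

    Assumptions: `frame` is a non-empty list of equal-length rows, and
    `waypoints` are (x, y) pixel coordinates. Off-frame pixels are skipped.
    """
    out = [row[:] for row in frame]
    height, width = len(out), len(out[0])
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        for x, y in line_pixels(x0, y0, x1, y1):
            if 0 <= x < width and 0 <= y < height:
                out[y][x] = value
    return out


frame = [[0] * 4 for _ in range(4)]
sketched = overlay_trajectory(frame, [(0, 0), (3, 0), (3, 3)])
```

Applying the same overlay to every frame of a clip would yield video in which the intended motion is visible alongside the scene, which is the kind of visual cue the article describes.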

In tests across 41 tasks, RT-Trajectory achieved a 63% success rate, more than double the 29% achieved with RT-2 training. The team emphasizes that RT-Trajectory not only improves the efficiency and accuracy of robot movements in novel situations but also unlocks valuable knowledge from existing datasets.

By combining large foundation models with robotics, the DeepMind team aims to create robots that better understand human intentions and can adapt to changing circumstances. Potential applications range from improved automation in various industries to more intuitive human-robot interactions.
