
Using video and large language models, Google is developing new techniques for training robots



The year 2024 will be a monumental one for the intersection of generative AI/large foundation models and robotics. There is a lot of buzz about the possibilities for numerous applications, from learning to product design. Google's DeepMind Robotics researchers are one of several groups exploring the space's potential. In a blog post today, the team highlights ongoing research aimed at giving robots a better understanding of what we humans expect from them.

Traditionally, robots have focused on performing a single task repeatedly throughout their working lives. Single-purpose robots are extremely good at what they do, but even they run into trouble when changes or errors are unintentionally introduced into the process.

The recently announced AutoRT is designed to harness large foundation models for a variety of purposes. In a standard example, the system begins by employing a Visual Language Model (VLM) for improved situational awareness. AutoRT can manage a fleet of robots working in concert, each equipped with cameras to map its environment and the objects within it.

Meanwhile, a large language model (LLM) suggests tasks that the hardware, including its end effector, can carry out. LLMs are widely regarded as the key to unlocking robots that can understand more natural-language commands, reducing the need for hard-coding skills.
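The describe-then-propose loop can be illustrated with a minimal sketch. DeepMind has not published AutoRT as a public API, so the function names and the stand-in VLM/LLM outputs below are purely illustrative: the scene description and task proposals would in practice come from real model calls.

```python
# Illustrative AutoRT-style loop: a VLM describes the scene, an LLM proposes
# tasks, and proposals are filtered against what the hardware can actually do.

def describe_scene(camera_image):
    """Stand-in for a Visual Language Model call: lists objects in view."""
    # A real system would run the VLM on the camera frame.
    return ["sponge", "cup", "table"]

def propose_tasks(scene_objects, capabilities):
    """Stand-in for an LLM call: suggests tasks, keeping only feasible ones."""
    candidates = [f"pick up the {obj}" for obj in scene_objects]
    # Filter out anything the end effector cannot perform.
    return [task for task in candidates if "pick up" in capabilities]

tasks = propose_tasks(describe_scene(camera_image=None),
                      capabilities={"pick up"})
```

The filtering step is the important design choice: the language model proposes freely, but only tasks matching the robot's declared capabilities survive, which keeps open-ended suggestions grounded in what the hardware can do.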

The system has been thoroughly tested over the past seven months. AutoRT can orchestrate up to 20 robots at once, and a total of 52 different devices. In all, DeepMind has collected over 77,000 trials spanning more than 6,000 tasks.

Also new from the team is RT-Trajectory, which uses video input for robotic learning. Many companies are exploring the use of YouTube videos to train robots at scale, but RT-Trajectory adds an interesting layer by superimposing a two-dimensional sketch of the arm in action over the video.

According to the research team, “these trajectories, in the form of RGB images, provide low-level, practical visual hints to the model as it learns its robot-control policies.”
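The core idea of turning a trajectory into an RGB visual hint can be sketched in a few lines. The actual RT-Trajectory preprocessing is not public, so this is only a conceptual illustration, assuming the trajectory is given as 2-D pixel waypoints to be rasterized onto the frame:

```python
import numpy as np

def overlay_trajectory(frame, waypoints, color=(255, 0, 0)):
    """Draw a 2-D gripper trajectory onto an RGB frame as a visual hint.

    frame: HxWx3 uint8 image; waypoints: list of (row, col) pixel coords.
    Returns a copy of the frame with the trajectory rasterized in `color`.
    """
    hint = frame.copy()
    # Connect consecutive waypoints with linearly interpolated pixels.
    for (r0, c0), (r1, c1) in zip(waypoints, waypoints[1:]):
        steps = max(abs(r1 - r0), abs(c1 - c0)) + 1
        for t in np.linspace(0.0, 1.0, steps):
            r = round(r0 + t * (r1 - r0))
            c = round(c0 + t * (c1 - c0))
            hint[r, c] = color
    return hint

frame = np.zeros((64, 64, 3), dtype=np.uint8)          # blank camera frame
hint = overlay_trajectory(frame, [(10, 10), (10, 50), (40, 50)])
```

Because the hint is just another RGB image, it can be fed to the policy through the same visual pathway as the camera frame itself, which is what makes the trick cheap to bolt onto existing image-conditioned models.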

DeepMind claims the training achieved double the success rate of its RT-2 training (63% versus 29%) across 41 tested tasks.

“RT-Trajectory makes use of the rich robotic-motion information that is present in all robot datasets, but currently under-utilized,” the researchers write in their paper. “RT-Trajectory not only represents another step along the road to building robots able to move efficiently in novel situations but also unlocking knowledge from existing datasets.”

