The Global Project To Make a General Robotic Brain

Monday, January 15, 2024, 02:27 AM, from Slashdot
Generative AI 'doesn't easily carry over into robotics,' write two researchers in IEEE Spectrum, 'because the Internet is not full of robotic-interaction data in the same way that it's full of text and images.'

That's why they're working on a single deep neural network capable of piloting many different types of robots...

Robots need robot data to learn from, and this data is typically created slowly and tediously by researchers in laboratory environments for very specific tasks... The most impressive results typically only work in a single laboratory, on a single robot, and often involve only a handful of behaviors... [W]hat if we were to pool together the experiences of many robots, so a new robot could learn from all of them at once? We decided to give it a try. In 2023, our labs at Google and the University of California, Berkeley came together with 32 other robotics laboratories in North America, Europe, and Asia to undertake the RT-X project, with the goal of assembling data, resources, and code to make general-purpose robots a reality...
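For readers curious what "pooling together the experiences of many robots" can look like mechanically, here is a purely illustrative Python sketch (the field names and shapes are assumptions, not the RT-X schema): each lab's episodes are converted into one shared step format of camera image, language instruction, and action, so a single model can be trained on all of them at once.

    # Illustrative only -- field names and shapes are assumptions, not the RT-X schema.
    from dataclasses import dataclass
    from typing import List

    import numpy as np


    @dataclass
    class Step:
        image: np.ndarray      # RGB camera frame, e.g. (224, 224, 3)
        instruction: str       # natural-language task description
        action: np.ndarray     # normalized robot command, e.g. shape (7,)


    @dataclass
    class Episode:
        robot_type: str        # "ur10", "widowx", ... kept only as metadata
        steps: List[Step]


    def pool_episodes(per_lab_episodes: List[List[Episode]]) -> List[Episode]:
        """Concatenate converted episodes from every lab into one training corpus."""
        return [ep for lab in per_lab_episodes for ep in lab]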

The question is whether a deep neural network trained on data from a sufficiently large number of different robots can learn to 'drive' all of them — even robots with very different appearances, physical properties, and capabilities. If so, this approach could potentially unlock the power of large datasets for robotic learning. The scale of this project is very large because it has to be. The RT-X dataset currently contains nearly a million robotic trials for 22 types of robots, including many of the most commonly used robotic arms on the market...

Surprisingly, we found that our multirobot data could be used with relatively simple machine-learning methods, provided that we follow the recipe of using large neural-network models with large datasets. Leveraging the same kinds of models used in current LLMs like ChatGPT, we were able to train robot-control algorithms that do not require any special features for cross-embodiment. Much like a person can drive a car or ride a bicycle using the same brain, a model trained on the RT-X dataset can simply recognize what kind of robot it's controlling from what it sees in the robot's own camera observations. If the robot's camera sees a UR10 industrial arm, the model sends commands appropriate to a UR10. If the model instead sees a low-cost WidowX hobbyist arm, the model moves it accordingly.
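As a rough illustration of what "no special features for cross-embodiment" means, the hedged PyTorch sketch below (class and parameter names are hypothetical, and the network is far smaller and simpler than anything used in RT-X) conditions a single policy only on the robot's camera image and a task instruction. There is no input telling the network which robot it is driving; any embodiment-specific behavior has to be inferred from the pixels.

    # Illustrative sketch only -- not the actual RT-X / RT-1 / RT-2 code.
    # One policy network receives the robot's own camera image plus a text
    # instruction and emits a generic end-effector action, with no explicit
    # "which robot am I" input. All names here are hypothetical.

    import torch
    import torch.nn as nn


    class CrossEmbodimentPolicy(nn.Module):
        """Single policy shared across robot types; embodiment is inferred from pixels."""

        def __init__(self, action_dim: int = 7, instr_vocab: int = 1000, d_model: int = 256):
            super().__init__()
            # Small image encoder standing in for the large vision backbone.
            self.vision = nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, d_model),
            )
            # Toy instruction encoder (real systems use a pretrained language model).
            self.instr = nn.EmbeddingBag(instr_vocab, d_model)
            # Policy head outputs a generic action: 6-DoF end-effector delta + gripper.
            self.head = nn.Sequential(
                nn.Linear(2 * d_model, d_model), nn.ReLU(),
                nn.Linear(d_model, action_dim),
            )

        def forward(self, image: torch.Tensor, instruction_tokens: torch.Tensor) -> torch.Tensor:
            z = torch.cat([self.vision(image), self.instr(instruction_tokens)], dim=-1)
            return self.head(z)


    # Behavior cloning on pooled multi-robot data: every robot's trials are mapped
    # to the same (image, instruction) -> action interface, so one loss fits all.
    policy = CrossEmbodimentPolicy()
    image = torch.randn(8, 3, 224, 224)             # camera frames from any robot
    instruction = torch.randint(0, 1000, (8, 12))   # tokenized task description
    expert_action = torch.randn(8, 7)               # logged action from the dataset
    loss = nn.functional.mse_loss(policy(image, instruction), expert_action)
    loss.backward()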

'To test the capabilities of our model, five of the laboratories involved in the RT-X collaboration each tested it in a head-to-head comparison against the best control system they had developed independently for their own robot... Remarkably, the single unified model provided improved performance over each laboratory's own best method, succeeding at the tasks about 50 percent more often on average.' And they then used a pre-existing vision-language model to successfully add the ability to output robot actions in response to image-based prompts.
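One common way to let a vision-language model "output robot actions" is to discretize each continuous action dimension into a fixed number of bins, so that an action becomes a short sequence of integer tokens the model can emit like text. The sketch below shows that general idea; the bin count and action range are arbitrary assumptions, not the values used by the RT-X team.

    # Minimal sketch of letting a language-style model "speak" robot actions.
    # Bin count and action range are illustrative assumptions.

    import numpy as np

    NUM_BINS = 256
    ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # assumed normalized action range


    def action_to_tokens(action: np.ndarray) -> np.ndarray:
        """Map a continuous action vector to discrete token ids in [0, NUM_BINS)."""
        clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
        scaled = (clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)
        return np.minimum((scaled * NUM_BINS).astype(np.int64), NUM_BINS - 1)


    def tokens_to_action(tokens: np.ndarray) -> np.ndarray:
        """Invert the discretization (up to bin resolution) when decoding model output."""
        centers = (tokens.astype(np.float64) + 0.5) / NUM_BINS
        return centers * (ACTION_HIGH - ACTION_LOW) + ACTION_LOW


    # A 7-D action (6-DoF end-effector delta + gripper) becomes 7 tokens the model
    # can emit in response to an image-based prompt, then be decoded back to motion.
    action = np.array([0.1, -0.3, 0.0, 0.25, -0.9, 0.5, 1.0])
    tokens = action_to_tokens(action)
    recovered = tokens_to_action(tokens)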

'The RT-X project shows what is possible when the robot-learning community acts together... and we hope that RT-X will grow into a collaborative effort to develop data standards, reusable models, and new techniques and algorithms.'

Thanks to long-time Slashdot reader Futurepower(R) for sharing the article.

Read more of this story at Slashdot.
https://hardware.slashdot.org/story/24/01/15/0124236/the-global-project-to-make-a-general-robotic-br...

