Google Researchers Unveil ChatGPT-Style AI Model To Guide a Robot Without Special Training
Wednesday, March 8, 2023, 04:30 AM, from Slashdot
PaLM-E does this by analyzing data from the robot's camera without needing a pre-processed scene representation, which eliminates the need for a human to pre-process or annotate the data and allows for more autonomous robotic control. It's also resilient, reacting to changes in its environment. For example, PaLM-E can guide a robot to fetch a bag of chips from a kitchen, and with PaLM-E integrated into the control loop, the robot becomes resistant to interruptions that might occur during the task. In a video example, a researcher grabs the chips from the robot and moves them, but the robot locates the chips and grabs them again. In another example, the same PaLM-E model autonomously controls a robot through tasks with complex sequences that previously required human guidance. Google's research paper explains (PDF) how PaLM-E translates instructions into actions.
PaLM-E is a next-token predictor; the name reflects that it's based on Google's existing large language model (LLM) called PaLM (similar to the technology behind ChatGPT), which Google has made "embodied" by adding sensory information and robotic control. Because it's built on a language model, PaLM-E takes continuous observations, such as images or sensor data, and encodes them into sequences of vectors with the same dimensionality as language token embeddings. This lets the model "understand" sensory information in the same way it processes language. In addition to the RT-1 robotics transformer, PaLM-E draws on Google's previous work on ViT-22B, a vision transformer model revealed in February that was trained on visual tasks such as image classification, object detection, semantic segmentation, and image captioning.
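The core idea described above, projecting continuous sensor features into the same embedding space as language tokens and interleaving them in one sequence, can be sketched roughly as follows. This is an illustrative toy, not Google's code: the dimensions, the linear projection, and all variable names (`W_proj`, `embed_text`, `embed_image`) are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 512    # hypothetical LLM embedding width
N_VOCAB = 1000   # hypothetical vocabulary size

# Stand-ins for frozen pretrained pieces: a token embedding table,
# and vision-encoder output (e.g., per-patch features from a ViT).
token_embeddings = rng.normal(size=(N_VOCAB, D_MODEL))
vit_features = rng.normal(size=(16, 768))  # 16 image patches, 768-d each

# A learned linear projection mapping vision features into the same
# space as language token embeddings (an "input adapter").
W_proj = rng.normal(size=(768, D_MODEL)) * 0.02

def embed_text(token_ids):
    """Look up embeddings for discrete language tokens."""
    return token_embeddings[token_ids]      # shape (T, D_MODEL)

def embed_image(features):
    """Project continuous vision features into token-embedding space."""
    return features @ W_proj                # shape (16, D_MODEL)

# Build one interleaved sequence: image "tokens" followed by the
# embedded text instruction, e.g. "bring me the chips".
prompt_ids = [5, 17, 42, 7]                 # hypothetical token ids
sequence = np.concatenate(
    [embed_image(vit_features), embed_text(prompt_ids)], axis=0
)
print(sequence.shape)  # (20, 512): 16 image vectors + 4 text vectors
```

From the transformer's point of view the 16 projected image vectors are indistinguishable from ordinary token embeddings, which is what lets a text-trained next-token predictor condition on camera input.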