Smart Glasses Aid Robot Learning

General-purpose robots are difficult to train. The dream is a robot like Rosie from The Jetsons that could carry out a range of household tasks, such as cleaning or folding laundry. But for that to happen, the robot must learn from large amounts of data that reflect real-world conditions, and such data can be difficult to collect. Currently, most training data comes from several static cameras, which must be carefully configured to capture useful information. But what if robots could learn from the everyday interactions we already have with the physical world?

This is the question that the General-purpose Robotics and AI Lab at New York University, led by Associate Professor Lerrel Pinto, hopes to answer with EgoZero, a smart-glasses system that aids robot training by collecting data with an enhanced version of Meta’s smart glasses.

In a recent preprint, the researchers describe a proof-of-concept experiment for the new approach: they trained a robot to perform seven manipulation tasks, such as picking up a piece of bread and placing it on a nearby plate. For each task, they collected 20 minutes of data from people performing the task while recording their actions with Meta’s Project Aria glasses. (These sensor-equipped glasses are used exclusively for research.) When the robot was then deployed to perform the tasks autonomously, the system achieved a 70 percent success rate.

The advantages of egocentric data

The “ego” part of EgoZero refers to the egocentric nature of the data, meaning it is collected from the perspective of the person performing the task. “The camera kind of moves with you,” just as our eyes move with us, says Raunak Bhirangi, a postdoctoral researcher in the NYU lab.

This has two main advantages. First, the setup is more portable than external cameras. Second, glasses are more likely to capture relevant information, because wearers naturally make sure that they, and therefore the camera, can see whatever is needed to complete the task. “For example, say something is stuck under my desk and I want to unhook it. I would bend down, look at that hook, and then unhook it, unlike a third-person camera, which is passive,” says Bhirangi. “From this egocentric perspective, you get that information embedded in your data for free.”

The “zero” in EgoZero refers to the fact that the system learns without any robot data, which can be expensive and difficult to collect; human demonstrations alone are enough for a robot to master a new task. This was made possible by a system developed by Pinto’s lab that tracks points in space rather than full images. When training robots from images, “the gap between what human hands look like and what robotic arms look like is too big,” says Bhirangi. Instead, the system tracks points on the hand that are mapped to corresponding points on the robot.

The EgoZero system collects data from people wearing smart glasses and turns it into useful 3D point data that allows robots to perform common manipulation tasks. Vincent Liu, Ademi Adeniji, Haotian Zhang, et al.

Reducing images to points in 3D space means that the model can represent motion in the same way regardless of the specific robotic appendage. “As long as the robot’s points move relative to the object in the same way that a person’s points move, everything is fine,” says Bhirangi.
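
To make the idea concrete, here is a minimal sketch of point-based retargeting, assuming per-frame 3D positions of two fingertips and one object keypoint are already available. The arrays and function names below are hypothetical illustrations of the general technique, not the authors’ actual EgoZero pipeline:

```python
# Illustration only: point-space actions, not the authors' EgoZero code.
import numpy as np

def fingertips_to_action(thumb_xyz, index_xyz):
    """Map two 3D fingertip points to a gripper-style action:
    a target position (midpoint) and a grasp width (fingertip distance)."""
    position = (thumb_xyz + index_xyz) / 2.0
    width = np.linalg.norm(thumb_xyz - index_xyz)
    return position, width

# Hypothetical demonstration: fingertips closing on an object 30 cm away,
# with all points expressed in a fixed world frame (e.g., recovered from
# the glasses' onboard localization).
T = 5
thumb = np.linspace([0.0, -0.04, 0.10], [0.30, -0.01, 0.10], T)
index = np.linspace([0.0, 0.04, 0.10], [0.30, 0.01, 0.10], T)
obj = np.array([0.30, 0.0, 0.10])  # a tracked keypoint on the object

for t in range(T):
    pos, width = fingertips_to_action(thumb[t], index[t])
    # Expressing the target relative to the object's keypoint is what lets
    # a differently shaped robot arm reproduce the same point motion.
    rel = pos - obj
    print(f"t={t}: gripper target (object frame) = {rel.round(3)}, width = {width:.3f} m")
```

Because the action is expressed as points relative to the object rather than as raw pixels, the same trajectory can in principle be replayed by any gripper whose points can trace the same relative motion.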

All of this results in a generalizable model that would otherwise require a large amount of diverse robot data to train. If a robot has been trained on data for picking up one kind of bread (say, a bun), it can generalize that skill to pick up a piece of ciabatta in a new environment.

Scalable solution

In addition to EgoZero, the research team is working on several projects to help make general-purpose robots a reality, including open-source robot designs, flexible touch sensors, and additional methods for collecting real-world training data.

For example, as an alternative to EgoZero, the researchers also developed a setup built around a 3D-printed handheld gripper that more closely resembles the “hands” of most robots. A smartphone attached to the gripper captures video, which is processed with the same point-space method used in EgoZero. By letting people collect data without bringing a robot into their homes, the two approaches, the team suggests, could make gathering training data far more scalable.

That scalability is ultimately the researchers’ goal. Large language models can train on the entire Internet, but no Internet equivalent exists for the physical world. Using smart glasses in everyday life could help fill that gap.
