CHENNAI, India — Now that artificial intelligence has mastered almost everything we do online, it needs help learning how we physically move in the real world.
A growing global army of trainers is helping it get away from our computers and into our living rooms, offices and factories, teaching it how we move.
In an industrial city in southern India, 28-year-old Naveen Kumar stands at his desk and begins his day's work: folding hand towels hundreds of times, as precisely as possible.
He doesn't work at a hotel; he works for a startup that creates physical data used to train AI.
A robot trains for the 100-meter dash ahead of the opening ceremony of the World Humanoid Robot Games in Beijing in August.
(An Guan/Associated Press)
He straps a GoPro camera to his forehead and follows a regimented list of hand movements to accurately capture how a person folds.
That day, he had to take each towel from the basket on the right side of the table using only his right hand, shake the towel with both hands, and then fold it carefully three times. He then had to place each folded towel in the left corner of the table.
If it takes more than a minute or he misses any steps, he will have to start over.
His data labeling firm, Objectways, sent 200 towel-folding videos to a client in the U.S. The company has more than 2,000 employees; about half of them label sensor data from autonomous cars and robotics, while the rest work on generative artificial intelligence.
Most of them are engineers and only a few know how to fold towels, so they take turns doing the physical work.
“Sometimes we have to remove around 150 or 200 videos because of stupid mistakes in the way we fold or place things,” says Kumar, an engineering graduate who has worked at Objectways for six years.
Carefully choreographed movements are designed to convey all the nuances of what people do—hands reaching, fingers curling, fabric sliding—as they fold clothes.
The captured videos are then annotated by Kumar and his team. They draw boxes around different parts of the video, mark towels and note whether the hand moved left or right, and classify each gesture.
Kumar and his colleagues in Karur, about 300 miles south of Bangalore, are unlikely mentors for the next generation of AI-powered robots.
“Companies are building foundation models that are suitable for the physical world,” said Ulrik Stig Hansen, co-founder of Encord, a San Francisco-based data management platform that contracts with Objectways to collect human demonstration data. “There is a huge resurgence in robotics.”
Encord partners with robotics companies like Jeff Bezos-backed Physical Intelligence and Dyna Robotics.
Tesla, Boston Dynamics and Nvidia are among the U.S. leaders in the race to develop the next generation of robots. Tesla already shows off its Optimus robots, which often appear to be remotely controlled, at various company events. Google has its own artificial intelligence models for robotics. OpenAI is ramping up its robotics ambitions.
Nvidia projects that the humanoid robot market could reach $38 billion over the next decade.
There are also many lesser-known companies trying to provide the hardware, software and data to make a mass-produced multi-tasking humanoid robot a reality.
The robots are on display at Nvidia's booth during the China International Supply Chain Expo in Beijing in July.
(Mahesh Kumar A./Associated Press)
Large language models that power chatbots, such as ChatGPT, have mastered the use of language, images, music, coding, and other skills by collecting everything online. They use the entire Internet to figure out how everything is connected and to imitate the way we do things, such as answering questions and creating photorealistic videos.
Data about how the physical world works (like how much force it takes to fold a napkin) is harder to obtain and translate into something that AI can use.
As robotics improves and combines with artificial intelligence that knows how to move around in the physical world, it could lead to more robots in workplaces and homes. While many fear that this could lead to job losses and unemployment, optimists believe that improved robots will free people from tedious work, reduce labor costs and ultimately give people more time to relax or focus on more interesting and important work.
Many companies have entered the fray as shovel sellers in the AI gold rush, seeing an opportunity to collect data for what is called physical AI.
One group of companies is teaching AI how to operate in the real world by having people control robots remotely.
Ali Ansari, founder of San Francisco-based Micro1, said data collection in new robotics is increasingly focused on teleoperation. People with controllers make the robot do something, such as pick up a cup or make tea. The AI is fed videos of both successful and unsuccessful attempts at a task, and it learns how to do it.
Remote control training can take place in the same room as the robots, or with a controller in another country. Encord's Hansen said there are plans to build warehouses in Eastern Europe where large groups of operators will sit with joysticks to control robots around the world.
More of these so-called “arm farms” are appearing as demand grows, according to Mohammed Musa, founder of Deepen AI, a data annotation company headquartered in California.
“Today there is a mixture of real and synthetic data collected from human demonstrations, teleoperation sessions and staged environments,” he said. “Much of this work is still done outside the West, but automation and modeling are reducing this dependence over time.”
Some have criticized teleoperated humanoids for being more sizzle than substance. They can be impressive when someone else is controlling them, but they are still far from fully autonomous.
Ansari's Micro1 also does what is called human data collection. People are paid to wear smart glasses that record their daily activities. This is happening in Brazil, Argentina, India and the U.S.
San Jose-based Figure AI partnered with real estate giant Brookfield to capture footage from inside 100,000 homes. It will collect data about human movement to teach humanoid robots to move in human spaces. The company has said it will spend most of the $1 billion it has raised on collecting first-person data about people.
Meta-backed Scale AI has amassed 100,000 hours of similar robotics training material at its prototype lab in San Francisco.
However, training bots is not always easy.
Twenty-year-old Dev Mandal started a company in Bangalore, hoping to capitalize on the need for physical data to train AI. He offered low-cost Indian labor for capturing human movement. After advertising his services, he received requests to help train a robotic arm to prepare food, as well as a robot to plug in and unplug cables in data centers.
But he had to give up the business because potential clients required physical movement data collected in a very specific way, making it difficult for him to make money, even using inexpensive labor in India. Clients wanted, for example, a specific robotic arm on a specific type of table under purple lighting.
“They had to specify everything down to the color of the table,” he said. “And they said it had to be that color.”
Still, Karur's towel folders have plenty of work ahead of them.
Their boss, Objectways founder Ravi Shankar, says that in recent months his firm has been filming and annotating footage of robotic arms stacking cardboard boxes and T-shirts and selecting specific colored items on a table.
The company recently began annotating videos of more advanced humanoid robots, helping them learn how to sort and fold a mix of towels and clothes and place them in different corners of the table. His team had to annotate 15,000 videos of robots doing their jobs.
“Sometimes the robot's arms throw out clothes and don't fold properly. Sometimes it throws a stack of things around,” but robots learn quickly, says Kavin, 27, an Objectways employee who goes by one name. “In five to 10 years, they will be able to do all the work and we will have nothing left.”






