A year ago, Sam Altman, the CEO of OpenAI, made a bold prediction: “We believe that, in 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies.” A couple of weeks later, at the World Economic Forum, in Davos, the company’s chief product officer, Kevin Weil, said, “I think 2025 is the year we go from ChatGPT being a super smart thing . . . to ChatGPT doing things for you in the real world.” He cited examples such as filling out online forms and making restaurant reservations, and promised, “We can definitely do it.” (OpenAI has a corporate partnership with Condé Nast, The New Yorker’s owner.)
This was no small boast. Chatbots respond directly to text prompts, for example, by answering a question or drafting an e-mail. An agent, in theory, would be able to navigate the digital world on its own, carrying out tasks that require multiple steps and the use of other software, such as web browsers. Consider everything involved in booking a hotel: choosing the right nights, filtering by preferences, reading reviews, scanning various websites to compare prices and amenities. An agent could automate all of it. The consequences of such a technology would be enormous. Chatbots are handy tools for employees; effective AI agents could replace employees altogether. Salesforce CEO Marc Benioff, who has said that AI now performs half of his company’s work, predicted that agents would help usher in a trillion-dollar “digital labor revolution.”
2025 was declared the Year of the AI Agent in part because, by the end of 2024, these tools had become undeniably adept at computer programming. A demo of Codex, OpenAI’s coding agent, released in May, showed a user asking the tool to modify his personal website. “Add another tab next to Investments/Tools called ‘Foods I Like.’ The doc should say ‘tacos,’” the user wrote. The agent then performed a sequence of interrelated actions: it listed the files in the site’s directory, examined the contents of the relevant file, and used a search command to find the right place to insert a new line of code. Having learned how the site was structured, the agent used that information to successfully add a new page featuring tacos. As a computer scientist, I had to admit that Codex handled the task more or less as well as I would have. Silicon Valley became convinced that other difficult problems would soon be solved.
As 2025 comes to a close, however, the era of general-purpose AI agents has yet to arrive. This fall, Andrej Karpathy, an OpenAI co-founder who left the company to start an AI-education project, described agents as “cognitively lacking,” adding, “It just doesn’t work.” Gary Marcus, a longtime critic of tech-industry hype, recently wrote on his Substack that “AI agents have so far been largely useless.” This gap between prediction and reality matters. Fluent chatbots and reality-bending video generators are impressive, but they alone cannot usher in a world in which machines take over much of our work. If the major AI companies fail to create broadly useful agents, they may not be able to deliver on their promises of an AI-powered future.
The term “AI agents” evokes sleek new technology out of The Matrix or Mission: Impossible: The Final Reckoning. In reality, agents are not some specially tuned digital brain; they are built on the same large language models that power chatbots. When you ask an agent to do some work, a control program (a simple application that coordinates the agent’s actions) turns your request into a query for the LLM: Here’s what I want to achieve, and here are the tools available; what should I do first? The control program then attempts to carry out whatever action the language model suggests, reports the result back, and asks: What should I do next? This cycle continues until the LLM deems the task complete.
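The loop described above can be sketched in a few lines of Python. This is a toy illustration, not any company’s actual implementation: fake_llm is a scripted stand-in for a real language model, and the tool names and file contents are invented for the example.

```python
# A minimal sketch of an agent's control loop. "fake_llm" is a hypothetical,
# scripted stand-in for a real language model.

def fake_llm(goal, tools, history):
    """Stand-in for the LLM: propose the next action, or None when done."""
    if not history:                      # first turn: look around
        return ("list_files", None)
    if history[-1][0] == "list_files":   # saw the files; read the relevant one
        return ("read_file", "index.html")
    return None                          # the "model" deems the task complete

def run_agent(goal, tools):
    """The control program: execute suggested actions, report results, repeat."""
    history = []
    while True:
        action = fake_llm(goal, tools, history)
        if action is None:
            return history
        name, arg = action
        result = tools[name](arg)        # carry out the suggested action...
        history.append((name, result))   # ...and feed the result back next turn

# Invented tools for the sketch: each maps a name to a callable.
tools = {
    "list_files": lambda _: ["index.html", "style.css"],
    "read_file": lambda path: f"<contents of {path}>",
}

trace = run_agent("add a 'Foods I Like' tab", tools)
```

The essential point is that the intelligence lives entirely in the model’s suggestions; the control program is a dumb dispatcher that runs one action at a time and loops.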
This setup is well suited to automating software development. Most of the steps required to create or modify a computer program can be accomplished by entering commands from a limited set into a text terminal. These commands tell the computer to navigate the file system, add or update text in source files, and, when needed, compile human-readable code into machine-readable bits. It’s an ideal environment for LLMs. “The terminal interface is text-based, and that’s the domain that language models are built on,” Alex Shaw, a co-creator of Terminal-Bench, a popular benchmark used to evaluate coding agents, told me.
More versatile assistants of the kind Altman predicted will require agents to leave the comfortable confines of the terminal. Since most of us operate computers by pointing and clicking, an AI that can “join the workforce” would likely need to be able to use a mouse, which turns out to be a surprisingly difficult goal. Time recently reported on a spate of new startups that are building “dark sites,” copies of popular web pages, such as those of United Airlines and Gmail, on which AIs can study how humans wield a cursor. In July, OpenAI released ChatGPT Agent, an early version of a bot that can use a web browser to complete tasks, but one review noted that “even simple actions such as clicking, selecting items, and searching can take the agent several seconds or even minutes.” At one point, the tool stalled for nearly fifteen minutes while trying to select a price from a drop-down menu on a real-estate website.






