Agents among us

OpenAI launches Operator, an AI agent that can operate your computer

"Computer-Using Agent" AI model can jump in and help users with on-screen tasks.

Benj Edwards | Jan 23, 2025 5:24 pm

On Thursday, OpenAI released a research preview of "Operator," a web automation tool that uses a new AI model called Computer-Using Agent (CUA) to control computers through a visual interface. The system performs tasks by viewing and interacting with on-screen elements like buttons and text fields, much as a human would.

Operator is available today to subscribers of the $200-per-month ChatGPT Pro plan at operator.chatgpt.com. The company plans to expand access to Plus, Team, and Enterprise users later. OpenAI intends to integrate these capabilities directly into ChatGPT and eventually release CUA through its API for developers.

Operator watches on-screen content while you use your computer and executes tasks through simulated keyboard and mouse inputs. The Computer-Using Agent processes screenshots to understand the computer's state and then makes decisions about clicking, typing, and scrolling based on its observations.

OpenAI's release follows other tech companies' recent pushes into what are often called "agentic" AI systems, which can take actions on a user's behalf. Google announced Project Mariner, which performs automated tasks through the Chrome browser, in December 2024, and two months earlier, in October 2024, Anthropic launched a developer-focused web automation tool called "Computer Use" that can control a user's mouse cursor and take actions on a computer.

"The Operator interface looks very similar to Anthropic's Claude Computer Use demo from October," wrote AI researcher Simon Willison on his blog, "even down to the interface with a chat panel on the left and a visible interface being interacted with on the right."

An Operator demo video created by OpenAI.

Watch and take action

To use your PC the way you would, the Computer-Using Agent works in multiple steps. First, it captures screenshots to monitor your screen, then it analyzes those images (using GPT-4o's vision capabilities with additional reinforcement learning) to process raw pixel data. Next, it determines what actions to take and performs virtual inputs to control the computer. This iterative loop design reportedly lets the system recover from errors and handle complex tasks across different applications.

While it's working, Operator shows a miniature browser window of its actions.
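OpenAI hasn't published Operator's internals beyond that high-level description, but the loop it outlines follows a familiar observe-decide-act pattern. The minimal Python sketch below illustrates that general pattern only; the Action structure, the action names, and the capture_screenshot, query_vision_model, and execute_action helpers are hypothetical stand-ins, not OpenAI's actual API.

"""Minimal sketch of a screenshot-driven agent loop (hypothetical, not OpenAI's code)."""

from dataclasses import dataclass


@dataclass
class Action:
    kind: str      # e.g. "click", "type", "scroll", or "done" (illustrative names)
    x: int = 0
    y: int = 0
    text: str = ""


def capture_screenshot() -> bytes:
    """Hypothetical helper: grab the current screen as raw image bytes."""
    raise NotImplementedError


def query_vision_model(screenshot: bytes, goal: str, history: list[Action]) -> Action:
    """Hypothetical helper: send the pixels, goal, and prior actions to a
    vision-language model and get back the next action to take."""
    raise NotImplementedError


def execute_action(action: Action) -> None:
    """Hypothetical helper: perform the simulated mouse or keyboard input."""
    raise NotImplementedError


def run_agent(goal: str, max_steps: int = 50) -> None:
    """Iterate: observe the screen, decide on an action, act, then repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()                        # 1. observe
        action = query_vision_model(screenshot, goal, history)   # 2. decide
        if action.kind == "done":
            break
        execute_action(action)                                   # 3. act
        history.append(action)

The relevant design point is that the screen is re-captured on every iteration, so each decision is based on the current state rather than a stale plan, which is what lets this style of agent notice and recover from its own mistakes.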
However, the technology behind Operator is still relatively new and far from perfect. The model reportedly performs best at repetitive web tasks like creating shopping lists or playlists. It struggles more with unfamiliar interfaces like tables and calendars, and it does poorly with complex text editing (a 40 percent success rate), according to OpenAI's internal testing data.

OpenAI reported that the system achieved an 87 percent success rate on the WebVoyager benchmark, which tests live sites like Amazon and Google Maps. On WebArena, which uses offline test sites for training autonomous agents, Operator's success rate dropped to 58.1 percent. For computer operating system tasks, CUA set an apparent record of 38.1 percent success on the OSWorld benchmark, surpassing previous models but still falling short of human performance at 72.4 percent.

With this imperfect research preview, OpenAI hopes to gather user feedback and refine the system's capabilities. The company acknowledges that CUA won't perform reliably in all scenarios but plans to improve its reliability across a wider range of tasks through user testing.

Safety and privacy concerns

For any AI model that can see how you operate your computer and even control some aspects of it, privacy and safety are paramount. OpenAI says it built multiple safety controls into Operator, requiring user confirmation before completing sensitive actions like sending emails or making purchases. Operator also has OpenAI-imposed limits on what it can browse; it cannot access certain website categories, including gambling and adult content.

Traditionally, AI models built on large language model-style Transformer technology, like Operator, have been relatively easy to fool with jailbreaks and prompt injections. To catch attempts at subverting Operator, which might hypothetically be embedded in websites that the AI model browses, OpenAI says it has implemented real-time moderation and detection systems. OpenAI reports that the system recognized all but one case of prompt injection attempts during an early internal red-teaming session.

However, Willison, who frequently covers AI security issues, isn't convinced Operator can stay secure, especially as new threats emerge. "Color me skeptical," he wrote in his blog post. "I imagine we'll see all kinds of novel successful prompt injection style attacks against this model once the rest of the world starts to explore it."

As Willison points out, OpenAI acknowledges these risks in its System Card documentation: "Despite proactive testing and mitigation efforts, certain challenges and risks remain due to the difficulty of modeling the complexity of real-world scenarios and the dynamic nature of adversarial threats."

And what about privacy? Since everything Operator sees on your screen is sent over the Internet to OpenAI's cloud servers as periodic screenshots, you're putting a lot of trust in OpenAI.

OpenAI says it has implemented several privacy controls: Users can opt out of having their data used for model training through ChatGPT settings, delete all browsing data with one click in Operator settings, and log out of all sites simultaneously. When users need to input sensitive information like passwords or payment details, a "takeover mode" activates, during which Operator stops collecting screenshots.

Even with these precautions, Willison offered his own Operator privacy advice on his blog: "Start a fresh session for each task you outsource to Operator to ensure it doesn't have access to your credentials for any sites that you have used via the tool in the past. If you're having it spend money on your behalf, let it get to the checkout, then provide it with your payment details and wipe the session straight afterwards."

Benj Edwards, Senior AI Reporter

Benj Edwards is Ars Technica's Senior AI Reporter and founded the site's dedicated AI beat in 2022. He's also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.