OpenAI debuts AI agent Operator to transform web task automation
Friday, January 24, 2025, 10:30 AM, from ComputerWorld
OpenAI has unveiled “Operator,” a new AI agent designed to perform web-based tasks, offering potential productivity enhancements for enterprises.
The tool enables interaction with on-screen elements, positioning it as a solution for automating routine processes in business workflows amid growing competition in the generative AI space.

“Powering Operator is Computer-Using Agent (CUA), a model that combines GPT-4o’s vision capabilities with advanced reasoning through reinforcement learning,” OpenAI said in a blog post. “CUA is trained to interact with graphical user interfaces (GUIs) – the buttons, menus, and text fields people see on a screen – just as humans do. This gives it the flexibility to perform digital tasks without using OS- or web-specific APIs.”

CUA leverages years of research in multimodal understanding and reasoning, combining advanced GUI perception with structured problem-solving, the company added. It can break tasks into multi-step plans and self-correct when encountering challenges, representing a significant step in AI development by enabling models to use tools commonly relied on by humans and unlocking new application possibilities.

The increasing role of AI agents

AI agents, designed to handle tasks such as scheduling and online transactions, are gaining interest in corporate AI initiatives. Rumors about OpenAI’s agent have been circulating for months.

Perplexity introduced an Android-based assistant this week, offering features like booking reservations, ride-hailing, and reminders. Apple expanded Siri’s capabilities last year by integrating Apple Intelligence and adding ChatGPT support through a partnership with OpenAI. Last year Anthropic also introduced a feature for its AI models called “Computer Use,” enabling Claude 3.5 Sonnet to perform tasks on computers autonomously.

However, analysts suggest that Operator may have some advantages over the competition. “Operator is quite advanced than Perplexity and other AI agents out there and more customizable and configurable,” said Neil Shah, partner and co-founder at Counterpoint Research. “The capability of the agent to let user ‘takeover’ when needed or confirm ‘actions’ with users or filtering out sensitive info or keeping ‘watch’ is unique and gives users more control while being autonomous.”

Another differentiator is that most current agents are designed to take direct action based on user prompts and requests but not to maneuver through websites. “Open AI’s Operator is designed to be a web agent that can autonomously go through websites and conduct multi-step tasks,” said Hyoun Park, CEO and chief analyst at Amalgam Insights. “This is designed to be helpful both for accessing data within each website as well as conducting complex and time-consuming tasks that currently require repetitive clicking and typing.”

Enterprise applications and accessibility potential

AI agents open doors for various industries, particularly those seeking to enhance efficiency and streamline workflows. Their ability to automate tasks such as data collection and interaction with web-based platforms offers significant value for businesses.

“AI agents like Operator, still in their nascent stages, have the long-term potential to revolutionize industries such as customer service, healthcare, retail, and logistics by automating repetitive tasks, personalizing interactions, and enhancing workflow efficiency,” said Prabhu Ram, VP of the industry research group at Cybermedia Research.

Additionally, the tool redefines the concept of “accessibility,” making it easier for individuals who struggle to navigate or interact with the web to access online resources.

“This agent could also be useful in helping employees to quickly gather information or access all of the accessible data and content on a website that would be appropriate for that user to be able to see,” Park said. “Web agents will likely be an important tool for collecting long tail information on websites that may be hidden behind multi-step workflows that are time-consuming or difficult for humans to negotiate or maneuver.”

With features like custom API integration and configurability, the Operator tool could also benefit enterprises by enabling them to deploy these agents for internal purposes, such as extracting and organizing data from their own websites or intranets, Shah added.

Safety concerns to overcome

AI agents introduce a new wave of safety challenges, with potential risks including misuse for bypassing system safeguards. These risks can range from automating form submissions on public sector websites to launching traffic attacks that disrupt website performance or evading CAPTCHA protections, among other violations.

OpenAI, in its blog post, said that a layered safety approach with safeguards for the model, system, and post-deployment processes is essential.

“OpenAI also needs to ensure how this will protect the privacy of sensitive information when used to fill forms, whether it remains on the device, and that data used for reinforcement is not eventually misused for ads or sponsored listings as a lucrative business model,” Shah said.

The tool’s capabilities could also pose challenges for Google and other search engines, which rely on collecting user data and processing cookies to target ads. By giving users and OpenAI more control over data, the technology may disrupt traditional advertising models.
https://www.computerworld.com/article/3809415/openai-debuts-ai-agent-operator-to-transform-web-task-...