"Prompt injection" has become the main threat to AI browsers - ForkLog: cryptocurrencies, AI, singularity, future

2025-12-23 09:42:01

# “Prompt injection” has become the main danger for AI browsers.

The company OpenAI reported on the vulnerability of AI browsers and the measures to strengthen the security of its own solution — Atlas.

The company acknowledged that “prompt injection” attacks, which manipulate agents into executing malicious instructions, are a risk. And it will not disappear anytime soon.

“Such vulnerabilities, like fraud and social engineering on the internet, are unlikely to ever be completely eliminated,” wrote OpenAI.

She noted that the “agent mode” in Atlas “increases the threat area.”

In addition to Sam Altman's startup, other experts have also drawn attention to the problem. In early December, the UK's National Cyber Security Centre warned that attacks involving the integration of malicious prompts “will never go away.” The government advised cybersecurity specialists not to try to stop the problem but to reduce the risks and consequences.

“We consider this a long-term artificial intelligence security issue and will continuously strengthen our protections,” noted OpenAI.

Measures of combating

Prompt injection is a way of manipulating AI by intentionally adding text to its input data that causes it to ignore the original instructions.

OpenAI reported on the use of a proactive rapid response cycle, which shows promising results in identifying new attack strategies before they emerge “in real-world conditions”.

Anthropic and Google express similar thoughts. The competitors suggest implementing multi-layered protection and continuously conducting stress tests.

OpenAI uses an “automated LLM-based attacker” — an AI bot that is trained to play the role of a hacker looking for ways to infiltrate an agent with malicious prompts.

An artificial scammer is capable of testing the exploitation of a vulnerability in a simulator that will show the actions of the attacked neural network. Then the bot will study the reaction, adjust its actions, and make a second attempt, then a third, and so on.

Third parties do not have access to information about the internal thinking of the target AI. In theory, a “virtual hacker” should find vulnerabilities faster than a real attacker.

“Our AI assistant can prompt the agent to carry out complex, long-term malicious processes that are triggered over dozens or even hundreds of steps. We have observed new attack strategies that did not manifest in our campaign involving red team members or in external reports,” said the OpenAI blog.

Demonstration of the test. Source: OpenAI blog. In the given example, an automated attacker sent an email to the user's address. Then the AI agent scanned the email service and executed hidden instructions, sending a termination message instead of composing a reply about absence from work.

After the security update, the “agent mode” was able to detect an attempt at sudden prompt injection and mark it for the user.

OpenAI emphasized that while it is difficult to reliably defend against such types of attacks, it relies on large-scale testing and rapid correction cycles.

Recommendations for Users

The Chief Security Researcher at Wiz, Rami McCarthy, emphasized that reinforcement learning is one of the key ways to continuously adapt to the behavior of malicious actors, but it is only part of the picture.

“A useful way to think about risks in AI systems is autonomy multiplied by access. Agent browsers are in a complicated part of this space: moderate autonomy combined with very high access. Many current recommendations reflect this trade-off. Limiting access after login primarily reduces vulnerability, while requiring confirmation request verification limits autonomy,” said the expert.

These two recommendations were provided by OpenAI to users to mitigate risk. The startup also suggested giving agents specific instructions rather than providing access to email and asking to “take any necessary actions.”

McCarthy noted that as of today, browsers with built-in AI agents do not provide enough benefit to justify the risk profile.

“This balance will evolve, but today compromises are still very real,” he concluded.

Let us remind you that in November, Microsoft experts presented an environment for testing AI agents and identified vulnerabilities inherent in modern digital assistants.

View Original

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.