The Hacker News
ChatGPhish Vulnerability Turns ChatGPT Web Summaries Into a Phishing Surface
Cybersecurity researchers have disclosed a vulnerability in OpenAI's ChatGPT that abuses the AI assistant's inherent trust in Markdown links and images. The flaw enables prompt injection and opens the door to phishing attacks.
Permiso Security has dubbed the vulnerability ChatGPhish. "The chatgpt.com response renderer trusts Markdown links and Markdown image URLs that originated from a third-party page the assistant has just summarized. It auto-fetches those images and surfaces those links as live, clickable elements inside the trusted assistant UI," security researcher Andi Ahmeti said in a report provided to The Hacker News.
Attack Scenario
In a potential attack scenario, a malicious actor appends a short payload to any webpage that the target later asks ChatGPT to summarize. When the response is rendered, the embedded images hosted by the attacker are fetched automatically, leaking the user's IP address, User-Agent, and Referer header to the attacker's server.
The same technique can also be used to:
- Turn malicious Markdown links into live, clickable elements within the assistant's output.
- Present counterfeit system-style security alerts.
- Render QR codes hosted in the attacker's S3 bucket, misleading victims into scanning them with their mobile devices.
- Bypass desktop URL filters and enterprise security measures.
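One way to picture a mitigation is a filter that sits between the model output and the renderer. The following is a minimal sketch, not Permiso's or OpenAI's fix: the `TRUSTED_HOSTS` allowlist and the function names are hypothetical, and the regular expressions only cover inline Markdown links and images (not reference-style links or raw HTML).

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the renderer may fetch from or link to.
TRUSTED_HOSTS = {"example.com"}

# Inline Markdown image ![alt](url) and inline link [text](url).
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def _is_trusted(url: str) -> bool:
    """Accept only https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in TRUSTED_HOSTS


def neutralize_markdown(text: str) -> str:
    """Drop untrusted images and turn untrusted links into inert text."""

    def image_sub(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        # An untrusted image is never fetched; only its alt text survives.
        return match.group(0) if _is_trusted(url) else f"[image removed: {alt}]"

    def link_sub(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        if _is_trusted(url):
            return match.group(0)
        host = urlparse(url).hostname or "unknown host"
        # Show where the link pointed without leaving anything clickable.
        return f"{label} [link removed: {host}]"

    text = IMAGE_RE.sub(image_sub, text)
    return LINK_RE.sub(link_sub, text)
```

Because untrusted images are replaced before rendering, the automatic fetch that leaks the IP address, User-Agent, and Referer never happens, and a QR code hosted on an attacker's bucket is never displayed.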
Broader Context
The vulnerability shows how summarization itself can serve as an avenue for adversarial attacks. In a related finding disclosed in March, Permiso demonstrated how an attacker-controlled email, when summarized by Microsoft Copilot, could manipulate the assistant's output through a cross-prompt injection attack (XPIA), also known as indirect prompt injection.
What distinguishes ChatGPhish is not the prompt injection itself, but the way instructions embedded in a web page are executed and displayed to the user as part of the summary. Simply summarizing a webpage with ChatGPT can surface phishing links, counterfeit account alerts, remote images, and QR codes inside a trusted AI interface. As businesses increasingly use ChatGPT for research and summarization, any malicious webpage an employee asks the chatbot to process could carry a payload that turns ChatGPT into a phishing surface.
"The transition from email to the browser significantly broadens the potential attack surface. A user no longer needs to open a malicious attachment or engage with a suspicious message," remarked Permiso. "Merely summarizing a page during routine browsing activities can inject attacker-controlled instructions into the model context and, ultimately, into the rendered output."
Additional AI Security Threats
This disclosure coincides with findings from Adversa AI, which documented two additional attack techniques named SymJack and TrustFall. These tactics target AI coding agents and agentic coding command-line interfaces (CLIs), enabling attackers to achieve code execution and full machine compromise.
SymJack
SymJack is a "single attack pattern [that] enables a malicious repository to achieve remote code execution via AI coding assistants," explained security researcher Rony Utevsky. "The agent is deceived into performing a seemingly harmless file copy that secretly overwrites its configuration, resulting in the execution of attacker code with full user privileges upon the next restart."
Specifically, the exploit involves a malicious repository that tricks the agent into copying an innocuous-looking file to a path that is actually a symlink to the agent's own configuration, overwriting it with the attacker's payload. On the next restart, a malicious Model Context Protocol (MCP) server starts and executes arbitrary code with full user privileges.
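The symlink trick can be illustrated with a defensive copy routine. This is a minimal sketch under stated assumptions, not how any particular coding agent behaves: `safe_copy` and its `workspace` parameter are hypothetical names, and the check requires Python 3.9 or later for `Path.is_relative_to`.

```python
import shutil
from pathlib import Path


def safe_copy(src: str, dst: str, workspace: str) -> None:
    """Copy src to dst only if dst truly resolves inside the workspace."""
    root = Path(workspace).resolve()
    target = Path(dst)

    # A destination that is itself a symlink may point anywhere,
    # including the agent's own configuration file.
    if target.is_symlink():
        raise PermissionError(f"refusing to write through symlink: {dst}")

    # resolve() also follows symlinked parent directories, so a link
    # higher up the path cannot redirect the write outside the workspace.
    resolved = target.resolve()
    if not resolved.is_relative_to(root):
        raise PermissionError(f"destination escapes workspace: {resolved}")

    shutil.copyfile(src, resolved)
```

With a check like this, a copy whose destination is a link to a file outside the repository fails loudly instead of silently overwriting that file.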
TrustFall
TrustFall is another one-click remote code execution attack, in which a malicious repository ships a configuration that automatically approves and activates an MCP server without explicit consent from the user or a tool call from the agent.
In other words, all a threat actor has to do is set up a repository containing a malicious MCP server along with configuration settings that allow it to run without approval. Once a developer clones or opens the repository in the AI coding tool and accepts the folder trust prompt, the tool launches the attacker's code with the developer's full system privileges.
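A developer could inspect a freshly cloned repository for MCP server definitions before accepting any trust prompt. The sketch below is illustrative only: it assumes the servers are declared in JSON files under a top-level `mcpServers` key, a convention used by several MCP clients, but the exact file names and schema vary by tool, and `find_mcp_servers` is a hypothetical helper.

```python
import json
from pathlib import Path


def find_mcp_servers(repo: str, key: str = "mcpServers") -> list:
    """List (file, server name, command) for MCP servers declared in a repo."""
    findings = []
    # rglob also visits dotfiles and hidden directories.
    for path in sorted(Path(repo).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON
        servers = data.get(key) if isinstance(data, dict) else None
        if not isinstance(servers, dict):
            continue
        for name, spec in servers.items():
            # "command" is the assumed field holding the program to launch.
            command = spec.get("command") if isinstance(spec, dict) else None
            findings.append((path.relative_to(repo).as_posix(), name, command))
    return findings
```

Any hit is a reason to read the referenced command and its arguments before opening the folder in an AI coding tool, since the payload described above runs at server startup.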
"The moment a victim clones the repo, runs the AI, and clicks the generic 'Yes, I trust this folder' dialog, the MCP server begins as a native OS process with full user privileges," Adversa AI elaborated. "The payload executes upon server startup, prior to any tool calls and without further prompts."
Recent AI Attack Methods
Recent months have showcased a variety of attack strategies aimed at AI models:
- A novel jailbreak tactic called Involuntary In-Context Learning (IICL), which "leverages the conflict between in-context learning (ICL) and safety alignment" to circumvent GPT-5.4 safety measures.
- Safety measures in large language models (LLMs) can be evaded when attackers draw the model into multi-turn conversations. "Multi-turn evaluation is significant for a simple reason: it is precisely where attackers operate," noted Cisco. "Real adversaries iterate. They reframe refusals, break down tasks across multiple turns, adopt different personas, and escalate gradually. A one-turn benchmark cannot detect any of that."
- A vulnerability in Anthropic Claude Code that allows a rogue npm package to alter the user-level configuration in "~/.claude.json" and reconfigure MCP endpoints, enabling an attacker to intercept tokens used for accessing downstream SaaS services.
- A remote update mechanism that allows an OpenClaw skill to appear harmless at installation but later lets an attacker influence the agent via workspace files.
- The use of hidden text drawn from a legitimate newsletter or a romance novel in phishing emails, which can mislead an AI-driven email security system into classifying the message as benign.
- A vulnerability in Claude's Chrome extension, referred to as ClaudeBleed, that permits any other extension, even one lacking special permissions, to hijack it and compel the AI assistant to execute actions on its behalf.
- Research by Cisco showing that adversarial text rendered in images, known as typographic prompt injection, can bypass safety filters in vision language models (VLMs).
- A pair of vulnerabilities in Microsoft Semantic Kernel (CVE-2026-25592 and CVE-2026-26030) that can turn a prompt injection into host-level remote code execution.
- The combination of the Neural Exec prompt injection attack and the Unicode right-to-left override character to circumvent Apple's input and output filters, as well as the safety measures of Apple Intelligence's local model.
- An indirect prompt injection vulnerability codenamed WebPromptTrap that impacts BrowserOS, an open-source agentic browser, and tricks users into approving an authorization step via an AI summary.
- An audit of the agent skills ecosystem covering ClawHub and skills.sh, which found that 534 of 3,984 skills (13.4%) contain at least one critical security flaw, including malware distribution, prompt injection attacks, and exposed secrets.
- Two attacks targeting NemoClaw, NVIDIA's open-source reference stack designed to secure OpenClaw AI agents, that exfiltrate OpenClaw data under the sandbox's default settings.
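Several of these techniques rely on text that looks different to a filter than to a reader. As one narrow illustration, the sketch below flags Unicode bidirectional control characters, including the right-to-left override (U+202E), before text reaches a model or a filter. It is a generic input-hygiene example with hypothetical function names, and it makes no claim about how the Apple Intelligence bypass works in detail.

```python
# Unicode bidirectional formatting characters and their short names.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # left-to-right embedding
    "\u202b": "RLE",  # right-to-left embedding
    "\u202c": "PDF",  # pop directional formatting
    "\u202d": "LRO",  # left-to-right override
    "\u202e": "RLO",  # right-to-left override
    "\u2066": "LRI",  # left-to-right isolate
    "\u2067": "RLI",  # right-to-left isolate
    "\u2068": "FSI",  # first strong isolate
    "\u2069": "PDI",  # pop directional isolate
}


def find_bidi_controls(text: str) -> list:
    """Return (index, name) for every bidi control character in text."""
    return [(i, BIDI_CONTROLS[ch]) for i, ch in enumerate(text) if ch in BIDI_CONTROLS]


def strip_bidi_controls(text: str) -> str:
    """Remove bidi control characters so stored order matches display order."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)
```

Stripping these characters changes how legitimate right-to-left text is displayed, so flagging for review may be preferable to silent removal in multilingual settings.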
Future Threats
As frontier AI models grow more capable, threat actors are increasingly experimenting with the technology to build malware that dynamically adjusts its behavior to evade detection and that delegates decision-making to LLMs, for example to judge whether a compromised environment is valuable or safe enough for subsequent payload drops.
"In the near term, the proliferation of the capabilities of frontier AI models presents risks by empowering adversaries to exploit zero-day and n-day vulnerabilities at an unprecedented scale," mentioned Palo Alto Networks Unit 42. "It is also probable that attackers will be able to operate with greater scale, sophistication, and speed than ever before."
Recently, the cybersecurity company also released details about a proof-of-concept (PoC) agent known as Zealot, which leverages the capabilities of LLMs to conduct comprehensive cloud attacks with minimal human intervention by exploiting known misconfigurations and vulnerabilities.
This is possible because cloud environments are "AI-Attack-Ready" by default: every action has an API equivalent, discovery mechanisms such as metadata and enumeration services are readily available, misconfigurations are rife, and access is credential-based.
"Current LLMs can chain reconnaissance, exploitation, privilege escalation, and data exfiltration with minimal human guidance," emphasized Unit 42 researchers Yahav Festinger and Chen Doytshman. "Although the attacks may not be novel, automation allows operations that once required specialized expertise to now be orchestrated by an AI agent following established patterns."