Critical AI Vulnerabilities: Major Tech Firms’ Inference Frameworks Exposed to Remote Attacks

Cybersecurity researchers have uncovered critical remote code execution vulnerabilities affecting AI inference engines from Meta, Nvidia, Microsoft, and open-source PyTorch projects. These serious flaws could allow attackers to execute arbitrary code, escalate privileges, and steal AI models.

Widespread vulnerability traced to unsafe code patterns

The vulnerabilities stem from a common pattern, dubbed "ShadowMQ," in which insecure deserialization logic has propagated across multiple projects through code reuse. Oligo Security researcher Avi Lumelsky identified the root cause as unsafe use of ZeroMQ (ZMQ) combined with Python's pickle deserialization.

"These vulnerabilities all traced back to the same root cause: the overlooked unsafe use of ZeroMQ (ZMQ) and Python's pickle deserialization," Lumelsky explained in a report published Thursday.

The original flaw was discovered in Meta's Llama framework (CVE-2024-50050) and patched in October 2024. However, the same dangerous pattern has since been found in several other frameworks:

  • NVIDIA TensorRT-LLM (CVE-2025-23254, CVSS score: 8.8)
  • vLLM (CVE-2025-30165, CVSS score: 8.0)
  • Microsoft Sarathi-Serve
  • Modular Max Server (CVE-2025-60455)
  • SGLang

The problem specifically involves using ZeroMQ's recv_pyobj() method, which deserializes incoming data with Python's pickle module, on a socket exposed over the network. Any attacker who can reach that socket can send a malicious payload and trigger arbitrary code execution during deserialization.
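
As a minimal, hypothetical sketch (not code taken from any of the affected projects), the risky pattern and a safer alternative look roughly like this:

```python
import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)
socket.bind("tcp://0.0.0.0:5555")  # endpoint reachable from the network

# recv_pyobj() runs pickle.loads() on whatever bytes arrive, so a crafted
# payload can execute arbitrary code during deserialization.
message = socket.recv_pyobj()  # unsafe on an untrusted network

# Safer for simple structured data: recv_json() parses the payload as JSON
# and never executes code from it.
# message = socket.recv_json()
```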

How code reuse spread the vulnerability

Researchers found that in several cases, the vulnerability was perpetuated through direct code copying between projects. For example, SGLang's vulnerable file acknowledges being adapted from vLLM, while Modular Max Server borrowed logic from both vLLM and SGLang.

"Different maintainers and projects maintained by different companies – all made the same mistake," noted Lumelsky.

This form of vulnerability propagation highlights a growing concern in fast-moving AI development, where architectural components are frequently borrowed between projects without sufficient security review.

"Projects are moving at incredible speed, and it's common to borrow architectural components from peers," Lumelsky said. "But when code reuse includes unsafe patterns, the consequences ripple outward fast."

These concerns reflect the broader security risks of deploying artificial intelligence in business environments, particularly when security review takes a back seat to rapid deployment and innovation.

Significant security implications for AI infrastructure

The vulnerabilities pose serious risks since inference engines are critical components within AI infrastructures. A successful attack could allow:

  • Execution of arbitrary code on AI clusters
  • Privilege escalation within systems
  • AI model theft
  • Deployment of malicious payloads like cryptocurrency miners

Most of the affected frameworks have released patches or implemented fixes:

  • NVIDIA fixed the issue in TensorRT-LLM version 0.18.2
  • vLLM addressed it by switching to the V1 engine by default
  • Modular Max Server has fixed the vulnerability
  • Microsoft's Sarathi-Serve remains unpatched
  • SGLang has implemented incomplete fixes

Organizations should take a comprehensive approach to AI security to protect their intellectual property and infrastructure.

Technical mitigation strategies

For technical teams managing AI infrastructure, consider these additional defensive measures; a ZeroMQ hardening sketch follows the list:

  • Implement network-level protections such as firewalls and intrusion detection systems specifically configured to monitor AI service traffic
  • Apply the principle of least privilege to all AI service accounts
  • Consider containerization to isolate AI inference services from other critical systems
  • Implement regular vulnerability scanning specifically targeting AI components
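
Where a ZeroMQ endpoint genuinely must stay reachable, a hedged hardening sketch (assuming pyzmq; the port number is illustrative) could look like the following. It does not remove the pickle risk, but it narrows who can reach the socket:

```python
import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)

# Prefer loopback or a private interface over 0.0.0.0.
socket.bind("tcp://127.0.0.1:5555")

# If remote access is unavoidable, enable CURVE so traffic is encrypted and
# clients must know the server's public key; pair this with a ZAP
# authenticator (zmq.auth) to restrict which client keys are accepted.
server_public, server_secret = zmq.curve_keypair()
socket.curve_secretkey = server_secret
socket.curve_publickey = server_public
socket.curve_server = True
```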

Additional AI security concerns emerge

The disclosure coincides with another AI security report from Knostic, which found vulnerabilities in Cursor's new built-in browser. Researchers discovered JavaScript injection techniques that could compromise the AI-powered code editor.

Two specific attack vectors were identified:

  1. A rogue local Model Context Protocol (MCP) server that bypasses Cursor's controls, allowing attackers to replace login pages with credential-harvesting fake pages

  2. Malicious extensions that can inject JavaScript into the IDE, enabling arbitrary actions including marking legitimate extensions as "malicious"

"JavaScript running inside the Node.js interpreter, whether introduced by an extension, an MCP server, or a poisoned prompt or rule, immediately inherits the IDE's privileges," the researchers warned. This includes full file system access and the ability to modify IDE functions.

How to protect your AI systems

For organizations utilizing these AI frameworks, immediate patching is recommended. Security experts also advise:

  1. Update to the latest versions of all AI frameworks and libraries
  2. Implement network segmentation for AI inference services
  3. Monitor for unusual activity on AI infrastructure
  4. Audit code for similar deserialization patterns (a simple scanning sketch follows this list)
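
As a starting point for the audit step above, a deliberately simple scan for the deserialization calls named in the report might look like this (paths and patterns are illustrative):

```python
import pathlib

# Call patterns highlighted in the ShadowMQ research.
RISKY_CALLS = ("recv_pyobj(", "pickle.loads(", "pickle.load(")

def audit(repo_root: str) -> None:
    """Print every line in the tree that contains a risky deserialization call."""
    for path in pathlib.Path(repo_root).rglob("*.py"):
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if any(call in line for call in RISKY_CALLS):
                print(f"{path}:{lineno}: {line.strip()}")

if __name__ == "__main__":
    audit(".")  # point this at the framework checkout you want to review
```

A hit is not automatically a vulnerability; the root cause described in the report is pickle deserialization of data arriving over a network-exposed socket, so each finding still needs manual review.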

For users of AI coding tools like Cursor:

  • Disable Auto-Run features in IDEs
  • Only install extensions from trusted sources
  • Use API keys with minimal required permissions
  • Audit third-party integrations and MCP server code

As AI components become integral to enterprise technology stacks, comprehensive cybersecurity protection for these systems has never been more essential.

The broader context of AI security

These vulnerabilities highlight how AI's rapid development and deployment are creating new security challenges. Similar to the early days of cloud computing, security considerations sometimes lag behind innovation.

"We're seeing history repeat itself," says cybersecurity expert Bruce Schneier. "The rush to deploy new AI capabilities often means security is an afterthought rather than being built in from the start."

The discovery of these vulnerabilities serves as a reminder that as AI becomes increasingly integrated into critical infrastructure, securing these systems becomes paramount. Organizations developing or implementing AI technologies should prioritize security reviews and implement defense-in-depth strategies.

These incidents also demonstrate how the AI community needs better security practices around code sharing and reuse – what might be called "secure by collaboration" – to prevent vulnerable patterns from propagating across projects and companies.

Industry response and best practices

In response to these vulnerabilities, the AI security community has begun developing more robust guidelines for secure AI system development. Resources such as the OWASP Top 10 for LLM Applications provide a framework for identifying and mitigating common security risks in AI applications.

Security experts recommend that organizations:

  • Establish formal code review processes specifically for AI components
  • Create a security-focused change management process for AI system updates
  • Develop incident response plans that address AI-specific threats
  • Conduct regular penetration testing of AI systems
  • Implement robust logging and monitoring for AI infrastructure

By adopting these measures, organizations can better protect their AI investments while continuing to leverage these powerful technologies for business advantage.
