AI security: how companies protect their models and data

Automation
May 8, 2026

Security teams that spent the last decade learning to protect databases, APIs, and network perimeters are now confronting a threat surface that those frameworks were not designed for. AI systems introduce attack vectors that have no direct equivalent in traditional application security: models that can be manipulated through their inputs, training pipelines that can be poisoned before deployment, and autonomous agents that can be redirected by instructions embedded in the content they process. The organizations that treat AI security as a variant of conventional application security are systematically underprotected against the threats that AI-specific architectures create.

Table of Contents

The threat surface that AI creates

Conventional application security focuses on protecting systems from unauthorized access and from inputs that exploit known vulnerabilities in code. AI systems are vulnerable to a different class of attack because their behavior is determined by learned patterns rather than explicit code, which means that an attacker who can influence what the model learns or how it processes inputs can influence its behavior without exploiting any code vulnerability.

Adversarial inputs are the most studied AI attack class: carefully crafted inputs designed to cause AI models to produce incorrect outputs. In computer vision, adversarial examples are images modified in ways imperceptible to humans but that cause a model to misclassify them confidently. In language models, adversarial prompts are inputs that bypass safety measures or cause the model to produce outputs its developers did not intend. The practical deployment implications vary by application: an adversarial attack on a fraud detection model that causes it to approve fraudulent transactions is a different risk profile than an adversarial attack on a content moderation model, but both represent the same architectural vulnerability exploited in different operational contexts.

Prompt injection attacks, examined in the context of autonomous agent deployments in our analysis of the hidden risks companies ignore in enterprise AI governance, represent the specific adversarial input class most relevant to production LLM deployments. An AI agent that processes external content, emails, web pages, uploaded documents, can be redirected by instructions embedded in that content by an attacker who understands the agent’s instruction-following behavior. The attack surface for prompt injection scales directly with the scope of external content the agent processes and the breadth of tools it can access.

Model theft and intellectual property protection

Organizations that have invested significant resources in training or fine-tuning AI models on proprietary data face an intellectual property risk that conventional security frameworks do not address: model extraction attacks, where an adversary uses the model’s API to reconstruct a functional equivalent of the model without access to its weights or training data.

Model extraction works by querying the target model systematically, collecting input-output pairs, and training a surrogate model on that data that approximates the target model’s behavior. The attack does not require access to the model’s internal parameters. It requires only the ability to query the model, which in a production API deployment is available to any authorized user. Organizations that have invested in proprietary fine-tuning of foundation models for competitive advantage are exposed to this attack through their own production APIs.

Mitigations include rate limiting on API queries, watermarking model outputs in ways that allow attribution of stolen model outputs to their source, and monitoring query patterns for the systematic behavior characteristic of extraction attacks. None of these mitigations is complete. The fundamental tension between providing useful API access and preventing systematic misuse of that access does not have a clean resolution, and organizations must calibrate their API access controls against both the legitimate user experience they need to provide and the extraction risk they are willing to accept.

Training data poisoning: attacking models before deployment

Training data poisoning attacks target AI models before they are deployed, by introducing carefully designed examples into training data that cause the model to learn specific malicious behaviors or backdoors. A poisoned model may perform normally on standard inputs but behave unexpectedly when it encounters inputs that match specific trigger patterns the attacker embedded in the training data.

The practical threat depends on the training data provenance. Organizations training models on web-scraped data or on data sourced from multiple external providers have limited visibility into whether their training data has been poisoned by adversaries who anticipated that specific data sources would be used. This is not a theoretical concern: research groups have demonstrated that web-scale poisoning attacks, where adversaries modify small numbers of accessible web documents to influence downstream model behavior, are feasible with resources available to sophisticated adversaries.

The defense requires training data governance at a level of rigor that most organizations do not currently apply: provenance tracking for all training data, anomaly detection in training data to identify unusual patterns, and evaluation methodologies that specifically test for backdoor behavior in trained models. This is the AI-specific expression of the data governance principles examined in our coverage of why AI data is becoming an enterprise governance crisis.

Infrastructure security for AI deployments

The AI infrastructure security requirements overlap with conventional infrastructure security but introduce AI-specific considerations in model storage, serving infrastructure, and the container environments that most production AI systems run in.

Model weights represent a high-value target that has no direct equivalent in conventional application security. A trained model, particularly one fine-tuned on proprietary data for a specific enterprise application, represents concentrated intellectual property and operational capability. Protecting model weights requires access controls, encryption at rest and in transit, and audit logging that tracks every access to model artifacts, treating them with the same security rigor applied to the most sensitive data the organization holds.

The container security implications of AI deployments are examined in depth in our companion analysis of container security and protecting AI infrastructure. The specific challenges of securing the containerized inference environments that most production AI systems run in include the large attack surface of AI-specific container images, the elevated privilege requirements that GPU access creates, and the dynamic networking patterns of multi-container AI pipelines.

The regulatory dimension: AI security as compliance

The EU AI Act’s security requirements for high-risk AI systems, examined in our coverage of what EU AI Act implementation requires from enterprises, create specific compliance obligations around cybersecurity that translate into concrete AI security investments. High-risk AI systems must be resilient against attempts to alter their outputs or exploit vulnerabilities in their architecture. Demonstrating this resilience requires documented security testing, adversarial evaluation, and ongoing monitoring that most organizations’ current AI security programs do not systematically provide.

The compliance framing creates a governance pathway for AI security investment that the threat-based framing alone may not. Organizations that have struggled to prioritize AI security investment against other security spending can point to specific EU AI Act requirements as the regulatory basis for specific security controls, connecting AI security to the compliance governance process that enterprise security budgets typically flow through.

Building an AI security program

The organizations that have built effective AI security programs have not done so by adding AI to their existing application security framework. They have built AI-specific security disciplines that address the threat classes unique to AI systems alongside the conventional security controls that AI infrastructure shares with any production system.

The practical architecture of an AI security program includes: adversarial testing as a standard step in model evaluation before deployment; prompt injection testing for any LLM deployment that processes external content; training data provenance tracking and anomaly detection; model extraction monitoring on production APIs; model artifact protection with access controls and audit logging; and AI-specific incident response procedures that address the unique recovery requirements when an AI system has been compromised.

None of this replaces conventional infrastructure security. It supplements it with AI-specific controls that the conventional framework does not address. The organizations that understand this additive relationship are building the most comprehensive AI security posture. Those that treat AI security as covered by their existing security program are carrying specific, unaddressed vulnerabilities that adversaries are actively learning to exploit.

AI security is not a future concern for organizations that have moved AI from experimentation to production. It is a present operational requirement whose specific threat classes differ enough from conventional application security to require dedicated investment in AI-specific controls. The threat surface that AI systems create, adversarial inputs, prompt injection, model theft, training data poisoning, and the high-value target of model weights, requires security thinking that most organizations’ current security teams have not yet developed.

For the infrastructure protection context, see container security: protecting AI infrastructure and AI servers: the infrastructure behind large AI models. For the governance framework that AI security fits within, read AI governance news: the hidden risks companies ignore.

The question every CISO with AI systems in production must answer: Has your security team conducted adversarial testing against your production AI models using the attack classes specific to AI systems, or has your AI security review consisted of applying your existing application security framework to AI systems that it was not designed to cover?

Ryan Daws

Blog author

Ryan Daws has been covering the intersection of artificial intelligence, cybersecurity and corporate governance for over a decade. A former information systems security analyst who switched to technology journalism, he has written for several leading B2B publications and closely follows developments in cloud, edge and agent-based architectures.

Recents posts

May 8, 2026

Solana AI: how blockchain and AI are converging

May 8, 2026

WAN 2.1 VACE: what this new AI model can do

May 8, 2026

Trainium3: Amazon’s new AI chip explained