What the New Wave of AI Security Standards Actually Asks of You
In February 2024, a tribunal in British Columbia ordered Air Canada to honor a bereavement fare that one of its chatbots had invented. The airline argued, more or less, that the chatbot was responsible for its own statements. The tribunal didn't buy it, and ordered the airline to pay. It's a minor case in dollar terms, but it captured something I'd watched take shape from the standards side for a couple of years: the question of who answers for an AI system's behavior had stopped being theoretical. Around the same time, the documents that try to answer it began landing in earnest.
| Standard | Focus | Status |
|---|---|---|
| ISO/IEC 42001 | Certifiable AI management system | Published 2023 |
| ISO/IEC 23894 | AI risk-management guidance | Published 2023 |
| ISO/IEC 27090 | Security threats to AI systems | In development |
| NIST AI RMF | Voluntary risk framework + GenAI Profile | 2023 · 2024 |
| OWASP LLM Top 10 | Top LLM application security risks | 2025 revision |
| EU AI Act | Risk-based regulation (legal) | In force |
What "AI security standards" actually refers to
People say "AI security standards" as if the phrase names one thing. It names at least five, written by different bodies, pitched at different levels of abstraction, for different readers. Sorting out which is which is the first practical step, because teams routinely try to satisfy a governance standard with a code review, or a technical problem with a policy document, and wonder why they're stuck.
ISO/IEC 42001, published in December 2023, is the one that gets the headlines. It's the first certifiable management-system standard for artificial intelligence, the AI counterpart to ISO/IEC 27001, and it sits on the same Annex SL structure. If you've been through a 27001 certification, its shape is familiar: organizational context, leadership, planning, a set of Annex A controls, and the obligation to prove you actually operate them. What it is not is a technical specification. It tells you to govern AI. It says nothing about how to stop a prompt injection.
Beneath it sits the risk and vocabulary layer. ISO/IEC 23894 adapts the familiar ISO 31000 risk approach to AI, and ISO/IEC 22989 pins down the terminology so people stop talking past each other. The standard aimed squarely at security, ISO/IEC 27090, on addressing security threats to AI systems, is still in development as I write this. That's worth stating plainly, because it gets cited in vendor decks as though it were already final.
In the United States, NIST's AI Risk Management Framework, released in January 2023, is voluntary and deliberately non-prescriptive, organized around four functions: govern, map, measure, and manage. The part most teams actually reach for is its Generative AI Profile, published in July 2024, which sets out a dozen risks that are specific to or amplified by generative systems, from confabulation and data poisoning to information-security failures and harmful bias, and attaches concrete suggested actions to each one.
The document engineers find most usable comes from neither ISO nor NIST. It's the OWASP Top 10 for Large Language Model Applications, first published in 2023 and revised for 2025. Where the ISO and NIST material stays abstract, OWASP names the actual threat vectors. The 2025 framework splits the active threat landscape into ten distinct vectors:
- LLM01 Prompt Injection
- LLM02 Sensitive Information Disclosure
- LLM03 Supply Chain
- LLM04 Data and Model Poisoning
- LLM05 Improper Output Handling
- LLM06 Excessive Agency
- LLM07 System-Prompt Leakage New in 2025
- LLM08 Vector and Embedding Weaknesses New in 2025
- LLM09 Misinformation
- LLM10 Unbounded Consumption
The wider OWASP GenAI Security Project around it, and the OWASP AI Exchange in particular, has been contributing its material directly into the ISO/IEC and European standards processes, and it's the work I spend my own committee time on.
The mechanism is more mundane than it sounds. A consensus standard takes years to draft and ratify, which is far slower than attackers invent techniques or than the field agrees on what to call them. So rather than define a vulnerability like excessive agency or prompt injection from first principles, the ISO/IEC 27090 and CEN-CENELEC working drafts lean on the taxonomy OWASP has already built and pressure-tested in the open.
The last one carries legal weight. The EU AI Act, Regulation (EU) 2024/1689, entered into force in August 2024 and applies in phases: the prohibitions on unacceptable uses since February 2025, the obligations on general-purpose AI models since August 2025, and the bulk of the high-risk requirements arriving in August 2026. The Act itself avoids prescribing technical controls. It defers to harmonized European standards, which CEN-CENELEC's JTC 21 is still drafting, and grants a "presumption of conformity" to anyone who meets them. That clause is why the ISO work suddenly has commercial weight: harmonized standards are becoming the cheapest route into the EU market.
Why your existing controls don't quite cover this
A fair amount of AI security is just security, and the standards say so. Access control, encryption, logging, a secure development lifecycle: all of it still applies; a model doesn't earn an exemption. But three properties make these systems genuinely different, and they are the ones the frameworks keep circling back to.
-
One channel for instructions and data
A language model reads its system prompt, the user's message, and any retrieved context as one undifferentiated sequence of tokens; instruction and data share the same in-band stream. That is why prompt injection, LLM01 on the OWASP list, has no clean fix. There is no parser to harden and no escaping scheme that reliably separates "trusted instruction" from "attacker-supplied content," because to the model the distinction doesn't exist. This is not a parsing bug waiting for a patch; it is a property of the architecture. Conventional software security rests on a boundary we can actually enforce: an interpreter separates code from the data it operates on, a parameterized query keeps user input out of the SQL grammar, the processor distinguishes its instruction stream from the operands it acts on. Decades of application security is, at bottom, the work of keeping attacker-controlled data out of the instruction path. A transformer erases that boundary. Prompt and payload arrive as the same kind of object, tokens, and the model attends over all of them with the same weights; there is no privileged region of the context that carries the meaning "this part is the trusted instruction." Trust becomes semantic rather than syntactic, and you cannot escape or parameterize a meaning. MITRE's ATLAS knowledge base, which catalogs real adversary techniques against machine-learning systems the way ATT&CK does for networks, reads in large part as a list of consequences that follow from this single fact.
-
Provenance you don't control
The model came from somewhere, trained on data you almost certainly did not inspect, and is served by a stack of dependencies you did not write. That is ordinary supply-chain risk, applied to an artifact most teams have not started treating as one.
-
A system that keeps moving
A model gets updated, a prompt gets tuned, a provider changes something upstream, and the behavior you signed off on last quarter is quietly gone. Point-in-time assurance, which most security programs and nearly every audit still assume, decays faster here than anywhere else I've worked. The reason is non-determinism. A fixed codebase produces predictable output: the function you tested is the function that runs. A model is a moving, probabilistic target. A vendor's minor version bump, a change in sampling temperature, or a single new document arriving in your retrieval store can reopen a jailbreak that your last assessment proved closed, without one line of your own code changing. "We tested that in Q1" is not a claim that survives contact with a system whose behavior drifts on its own.
What's still being argued
It would be tidy to say the standards have settled the matter. They haven't, and pretending otherwise does clients a disservice.
The harmonized European standards the AI Act leans on are not finished. Organizations are being asked to prepare for requirements whose technical detail is still being negotiated in committee, which means some of the "AI Act compliance" on offer today is an educated guess wearing a confident face. I'd rather tell a client which parts are firm and which are still moving than sell them certainty that doesn't exist yet.
There is also a stubborn confusion between two very different claims. An ISO 42001 certificate, or an EU conformity assessment, attests that you have a system for managing AI risk. It does not attest that any particular model is safe. I've watched buyers read a management-system certificate as a technical guarantee, and watched vendors decline to correct them. Those are not the same statement, and only one of them is established by actually testing the system. Put bluntly: a company can hold a clean, audited ISO 42001 certificate and, at the same moment, be running a model that quietly hands one customer's data to another. There is no contradiction in that. The certificate attests that a risk-management process exists and is being followed, not that the process caught everything or that the technical controls actually hold under attack. Anyone selling you assurance should be able to say which of the two they are actually offering.
A certified process and a leaking model coexist comfortably.
Even the vocabulary is in motion. Definitions of basic terms still differ between documents, and the line between "AI safety" and "AI security" gets drawn differently depending on who is holding the pen. None of this is a reason to wait. It is a reason to read the standards as a direction of travel rather than a finished rulebook.
What it asks of you, in practice
Strip the five documents down to what they share and a short, unglamorous list is left. If you're putting anything with a model behind it in front of users, this is the work:
- Know what you have. Keep an inventory of your AI systems, the models and datasets behind them, and where they touch users and sensitive data. Extend your bill of materials to cover models, not just packages. Almost no one can produce this on request, and every framework assumes you can.
- Threat-model the actual system. Avoid generic checklists. Map the real attack surface against the OWASP LLM Top 10 and the techniques in MITRE ATLAS, for the architecture you built rather than the reference one in a diagram. In practice that means scrutinizing the parts the frameworks gloss over: the orchestration layer (LangChain, LlamaIndex, Semantic Kernel, or the glue you wrote yourself), where tool calls get wired to real capabilities, and the data pipeline feeding your retrieval store, where untrusted content quietly becomes trusted context. That is where excessive agency and retrieval poisoning actually live, not in the model weights everyone fixates on.
- Keep evidence. Model cards, data lineage, evaluation and red-team results, and written decisions about the risk you've chosen to accept. The governance standards are, in practical terms, a demand that you can show your work.
- Test on a schedule, not once. Because the system changes, assurance has to be continuous. Tie it to model and dependency updates the way you'd tie regression testing to a release.
- Plan for AI-specific failure. An incident process that anticipates a jailbreak in production, a poisoned dependency, or a model leaking data, not only the network breach you already have a runbook for.
The quiet advantage is how much of this overlaps. Do the threat-modeling and evidence work once, against the AI-specific risks, and you will have answered most of what ISO, NIST, and the EU AI Act each ask for in their own dialects. The frameworks differ in wording far more than they differ in intent.
The certificate is not the goal
The temptation with any new compliance regime is to mistake the certificate for the goal. It's an expensive mistake here. The standards are genuinely useful. They give a fragmented field a shared language, and they make your security legible to customers and regulators who need some way to compare one vendor against another. But a 42001 audit tells someone you have a process. It does not tell them an attacker can't talk your agent into calling an internal tool it should never touch. The first claim is established on paper. The second is established only by trying.
The organizations that come through the next two years in good shape will be the ones that used the standards as scaffolding for real engineering, and never lost track of which was the scaffolding and which was the building.