CWE-20 Class Stable High likelihood

Improper Input Validation

This vulnerability occurs when an application accepts data from an external source but fails to properly verify that the data is safe and correctly formatted before using it. This missing or flawed…

Definition

What is CWE-20?

This vulnerability occurs when an application accepts data from an external source but fails to properly verify that the data is safe and correctly formatted before using it. This missing or flawed validation check allows malicious or malformed inputs to disrupt the application's logic or security.
Input validation acts as a security checkpoint for your application. It involves checking all incoming data—whether raw content like strings and file uploads, or metadata like headers—against a strict set of rules for safety and correctness. You need to verify properties like size, type, structure, and business logic compliance before the data is processed or passed to other system components. Effective validation must check both obvious and derived properties. This includes ensuring data consistency, verifying calculated values like actual file size, and confirming conformance to domain-specific rules. The goal is to establish trust in the data your code handles, preventing attackers from exploiting unexpected inputs to trigger crashes, bypass security, or corrupt your application's behavior.
Vulnerability Diagram CWE-20
Improper Input Validation Attacker malformed input Validation Layer type check? ✗ range check? ✗ format? ✗ Backend Logic processes bad data → crash Missing or weak checks let untrusted input reach business logic.
Real-world impact

Real-world CVEs caused by CWE-20

  • Large language model (LLM) management tool does not validate the format of a digest value (CWE-1287) from a private, untrusted model registry, enabling relative path traversal (CWE-23), a.k.a. Probllama

  • Chain: a learning management tool debugger uses external input to locate previous session logs (CWE-73) and does not properly validate the given path (CWE-20), allowing for filesystem path traversal using "../" sequences (CWE-24)

  • Chain: improper input validation (CWE-20) leads to integer overflow (CWE-190) in mobile OS, as exploited in the wild per CISA KEV.

  • Chain: improper input validation (CWE-20) leads to integer overflow (CWE-190) in mobile OS, as exploited in the wild per CISA KEV.

  • Chain: backslash followed by a newline can bypass a validation step (CWE-20), leading to eval injection (CWE-95), as exploited in the wild per CISA KEV.

  • Chain: insufficient input validation (CWE-20) in browser allows heap corruption (CWE-787), as exploited in the wild per CISA KEV.

  • Chain: improper input validation (CWE-20) in username parameter, leading to OS command injection (CWE-78), as exploited in the wild per CISA KEV.

  • Chain: security product has improper input validation (CWE-20) leading to directory traversal (CWE-22), as exploited in the wild per CISA KEV.

How attackers exploit it

Step-by-step attacker path

  1. 1

    This example demonstrates a shopping interaction in which the user is free to specify the quantity of items to be purchased and a total is calculated.

  2. 2

    The user has no control over the price variable, however the code does not prevent a negative value from being specified for quantity. If an attacker were to provide a negative value, then the user would have their account credited instead of debited.

  3. 3

    This example asks the user for a height and width of an m X n game board with a maximum dimension of 100 squares.

  4. 4

    While this code checks to make sure the user cannot specify large, positive integers and consume too much memory, it does not check for negative values supplied by the user. As a result, an attacker can perform a resource consumption (CWE-400) attack against this program by specifying two, large negative values that will not overflow, resulting in a very large memory allocation (CWE-789) and possibly a system crash. Alternatively, an attacker can provide very large negative values which will cause an integer overflow (CWE-190) and unexpected behavior will follow depending on how the values are treated in the remainder of the program.

  5. 5

    The following example shows a PHP application in which the programmer attempts to display a user's birthday and homepage.

Vulnerable code example

Vulnerable Java

This example demonstrates a shopping interaction in which the user is free to specify the quantity of items to be purchased and a total is calculated.

Vulnerable Java
...
  public static final double price = 20.00;
  int quantity = currentUser.getAttribute("quantity");
  double total = price * quantity;
  chargeUser(total);
  ...
Attacker payload

The programmer intended for $birthday to be in a date format and $homepage to be a valid URL. However, since the values are derived from an HTTP request, if an attacker can trick a victim into clicking a crafted URL with tags providing the values for birthday and / or homepage, then the script will run on the client's browser when the web server echoes the content. Notice that even if the programmer were to defend the $birthday variable by restricting input to integers and dashes, it would still be possible for an attacker to provide a string of the form:

Attacker payload
2009-01-09--
Secure code example

Secure pseudo

Secure pseudo
// Validate, sanitize, or use a safe API before reaching the sink.
function handleRequest(input) {
  const safe = validateAndEscape(input);
  return executeWithGuards(safe);
}
What changed: the unsafe sink is replaced (or the input is validated/escaped) so the same payload no longer triggers the weakness.
Prevention checklist

How to prevent CWE-20

  • Architecture and Design Consider using language-theoretic security (LangSec) techniques that characterize inputs using a formal language and build "recognizers" for that language. This effectively requires parsing to be a distinct layer that effectively enforces a boundary between raw input and internal data representations, instead of allowing parser code to be scattered throughout the program, where it could be subject to errors or inconsistencies that create weaknesses. [REF-1109] [REF-1110] [REF-1111]
  • Architecture and Design Use an input validation framework such as Struts or the OWASP ESAPI Validation API. Note that using a framework does not automatically address all input validation problems; be mindful of weaknesses that could arise from misusing the framework itself (CWE-1173).
  • Architecture and Design / Implementation Understand all the potential areas where untrusted inputs can enter the product, including but not limited to: parameters or arguments, cookies, anything read from the network, environment variables, reverse DNS lookups, query results, request headers, URL components, e-mail, files, filenames, databases, and any external systems that provide data to the application. Remember that such inputs may be obtained indirectly through API calls.
  • Implementation Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
  • Architecture and Design For any security checks that are performed on the client side, ensure that these checks are duplicated on the server side, in order to avoid CWE-602. Attackers can bypass the client-side checks by modifying values after the checks have been performed, or by changing the client to remove the client-side checks entirely. Then, these modified values would be submitted to the server. Even though client-side checks provide minimal benefits with respect to server-side security, they are still useful. First, they can support intrusion detection. If the server receives input that should have been rejected by the client, then it may be an indication of an attack. Second, client-side error-checking can provide helpful feedback to the user about the expectations for valid input. Third, there may be a reduction in server-side processing time for accidental input errors, although this is typically a small savings.
  • Implementation When your application combines data from multiple sources, perform the validation after the sources have been combined. The individual data elements may pass the validation step but violate the intended restrictions after they have been combined.
  • Implementation Be especially careful to validate all input when invoking code that crosses language boundaries, such as from an interpreted language to native code. This could create an unexpected interaction between the language boundaries. Ensure that you are not violating any of the expectations of the language with which you are interfacing. For example, even though Java may not be susceptible to buffer overflows, providing a large argument in a call to native code might trigger an overflow.
  • Implementation Directly convert your input type into the expected data type, such as using a conversion function that translates a string into a number. After converting to the expected data type, ensure that the input's values fall within the expected range of allowable values and that multi-field consistencies are maintained.
Detection signals

How to detect CWE-20

Automated Static Analysis

Some instances of improper input validation can be detected using automated static analysis. A static analysis tool might allow the user to specify which application-specific methods or functions perform input validation; the tool might also have built-in knowledge of validation frameworks such as Struts. The tool may then suppress or de-prioritize any associated warnings. This allows the analyst to focus on areas of the software in which input validation does not appear to be present. Except in the cases described in the previous paragraph, automated static analysis might not be able to recognize when proper input validation is being performed, leading to false positives - i.e., warnings that do not have any security consequences or require any code changes.

Manual Static Analysis

When custom input validation is required, such as when enforcing business rules, manual analysis is necessary to ensure that the validation is properly implemented.

Fuzzing

Fuzzing techniques can be useful for detecting input validation errors. When unexpected inputs are provided to the software, the software should not crash or otherwise become unstable, and it should generate application-controlled error messages. If exceptions or interpreter-generated error messages occur, this indicates that the input was not detected and handled within the application logic itself.

Automated Static Analysis - Binary or Bytecode SOAR Partial

According to SOAR [REF-1479], the following detection techniques may be useful: ``` Cost effective for partial coverage: ``` Bytecode Weakness Analysis - including disassembler + source code weakness analysis Binary Weakness Analysis - including disassembler + source code weakness analysis

Manual Static Analysis - Binary or Bytecode SOAR Partial

According to SOAR [REF-1479], the following detection techniques may be useful: ``` Cost effective for partial coverage: ``` Binary / Bytecode disassembler - then use manual analysis for vulnerabilities & anomalies

Dynamic Analysis with Automated Results Interpretation High

According to SOAR [REF-1479], the following detection techniques may be useful: ``` Highly cost effective: ``` Web Application Scanner Web Services Scanner Database Scanners

Plexicus auto-fix

Plexicus auto-detects CWE-20 and opens a fix PR in under 60 seconds.

Codex Remedium scans every commit, identifies this exact weakness, and ships a reviewer-ready pull request with the patch. No tickets. No hand-offs.

Frequently asked questions

Frequently asked questions

What is CWE-20?

This vulnerability occurs when an application accepts data from an external source but fails to properly verify that the data is safe and correctly formatted before using it. This missing or flawed validation check allows malicious or malformed inputs to disrupt the application's logic or security.

How serious is CWE-20?

MITRE rates the likelihood of exploit as High — this weakness is actively exploited in the wild and should be prioritized for remediation.

What languages or platforms are affected by CWE-20?

MITRE has not specified affected platforms for this CWE — it can apply across most application stacks.

How can I prevent CWE-20?

Consider using language-theoretic security (LangSec) techniques that characterize inputs using a formal language and build "recognizers" for that language. This effectively requires parsing to be a distinct layer that effectively enforces a boundary between raw input and internal data representations, instead of allowing parser code to be scattered throughout the program, where it could be subject to errors or inconsistencies that create weaknesses. [REF-1109] [REF-1110] [REF-1111] Use an…

How does Plexicus detect and fix CWE-20?

Plexicus's SAST engine matches the data-flow signature for CWE-20 on every commit. When a match is found, our Codex Remedium agent opens a fix PR with the corrected code, tests, and a one-line summary for the reviewer.

Where can I learn more about CWE-20?

MITRE publishes the canonical definition at https://cwe.mitre.org/data/definitions/20.html. You can also reference OWASP and NIST documentation for adjacent guidance.

Related weaknesses

Weaknesses related to CWE-20

CWE-707 Parent

Improper Neutralization

This vulnerability occurs when an application fails to properly validate or sanitize structured data before it's received from an external…

CWE-116 Sibling

Improper Encoding or Escaping of Output

This vulnerability occurs when an application builds a structured message—like a query, command, or request—for another component but…

CWE-138 Sibling

Improper Neutralization of Special Elements

This vulnerability occurs when an application accepts external input but fails to properly sanitize special characters or syntax that have…

CWE-1426 Sibling

Improper Validation of Generative AI Output

This vulnerability occurs when an application uses a generative AI model (like an LLM) but fails to properly check the AI's output before…

CWE-170 Sibling

Improper Null Termination

This weakness occurs when software fails to properly end a string or array with the required null character or equivalent terminator.

CWE-172 Sibling

Encoding Error

This vulnerability occurs when software incorrectly transforms data between different formats, leading to corrupted or misinterpreted…

CWE-182 Sibling

Collapse of Data into Unsafe Value

This vulnerability occurs when an application's data filtering or transformation process incorrectly merges or simplifies information,…

CWE-228 Sibling

Improper Handling of Syntactically Invalid Structure

This vulnerability occurs when software fails to properly reject or process input that doesn't follow the expected format or structure,…

CWE-240 Sibling

Improper Handling of Inconsistent Structural Elements

This vulnerability occurs when a system fails to properly manage situations where related data structures or elements should match but are…

Ready when you are

Don't Let Security
Weigh You Down.

Stop choosing between AI velocity and security debt. Plexicus is the only platform that runs Vibe Coding Security and ASPM in parallel — one workflow, every codebase.