Insecure Unicode Normaliser

Unicode normalisers are used to transform Unicode strings into a consistent form so that equivalent characters can be reliably compared. Without normalisation, string matching may fail because the same character can be represented in multiple ways (e.g., a precomposed character vs. a base character + diacritic). Adversaries exploit this ambiguity to bypass security validation rules, such as input filters, authentication checks, or access controls.

Unicode defines four normalisation forms, which fall into two groups:

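  • **Canonical** (NFC, NFD): unify canonically equivalent representations, such as a precomposed character and its base-plus-diacritic sequence, without changing what the text means.
  • **Compatibility** (NFKC, NFKD): additionally fold compatibility variants into their plain counterparts (for example, the ligature ﬁ becomes fi and the fullwidth ＜ becomes <), which can turn visually or semantically distinct characters into the same string.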

Improper use of compatibility normalisers can unintentionally change the meaning of data and enable spoofing attacks.
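The difference between the two groups can be sketched with Python's standard `unicodedata` module. This is a minimal illustration, not a complete test of either form; the sample strings and helper names are chosen here for demonstration only.

```python
import unicodedata

def nfc(text: str) -> str:
    """Canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)

def nfkc(text: str) -> str:
    """Compatibility composition (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# Two encodings of the same visible character "é".
composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Without normalisation the comparison fails; with NFC it succeeds.
raw_equal = composed == decomposed
nfc_equal = nfc(composed) == nfc(decomposed)

# FULLWIDTH LESS-THAN SIGN: NFC leaves it alone, NFKC folds it to ASCII "<".
fullwidth_lt = "\uff1c"
nfc_result = nfc(fullwidth_lt)
nfkc_result = nfkc(fullwidth_lt)

# LATIN SMALL LIGATURE FI: NFKC expands it to the two letters "fi".
ligature_result = nfkc("\ufb01")
```

Here `raw_equal` is false and `nfc_equal` is true, while `nfkc_result` is a plain `<` that a filter applied before normalisation would never have seen.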

Remediation

  • Follow W3C recommendations: use NFC normalisation, which unifies canonically equivalent sequences without folding distinct characters that merely look alike into one another.
  • Normalise all inputs before applying validation or comparison logic, ensuring consistency across the entire application stack.
  • Avoid using compatibility normalisers (NFKC/NFKD) in security-sensitive contexts, as they may collapse distinct characters into unsafe equivalents.
  • Apply allow-lists or restricted character sets for identifiers (usernames, domains, resource names) to reduce the attack surface.
  • Learn more about Unicode normalisation issues in our https://learn.secdim.com/course/spotify-bigbird[Unsafe Normaliser] short course.
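The first four points can be combined into a single input-handling step: normalise with NFC, then validate the normalised value against an allow-list. The sketch below assumes a hypothetical username policy (lowercase ASCII letters, digits and underscore, 3 to 32 characters); the function name and the pattern are illustrative and would need adapting to a real application.

```python
import re
import unicodedata

# Hypothetical policy, for illustration only:
# lowercase ASCII letters, digits and underscore, 3 to 32 characters.
USERNAME_PATTERN = re.compile(r"[a-z0-9_]{3,32}")

def canonical_username(raw: str) -> str:
    """Normalise with NFC first, then validate the normalised value."""
    normalised = unicodedata.normalize("NFC", raw)
    if not USERNAME_PATTERN.fullmatch(normalised):
        raise ValueError("username contains characters outside the allow-list")
    return normalised

# Superscript letters that resemble "BIGBIRD".
lookalike = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"

# A compatibility normaliser would silently fold the lookalike to plain ASCII.
folded = unicodedata.normalize("NFKC", lookalike)

# NFC plus the allow-list rejects it instead.
try:
    canonical_username(lookalike)
    rejected = False
except ValueError:
    rejected = True
```

With this ordering `folded` is `"BIGBIRD"`, showing what NFKC would have produced, while `rejected` is true because the NFC form still contains non-ASCII characters. Rejecting such input, rather than folding it, keeps two visually similar identifiers from resolving to the same account.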

Metadata

  • Severity: medium
  • Slug: insecure-unicode-normaliser

CWEs

  • 179: Incorrect Behavior Order: Early Validation
  • 94: Improper Control of Generation of Code ('Code Injection')
  • 176: Improper Handling of Unicode Encoding
  • 180: Incorrect Behavior Order: Validate Before Canonicalize
  • 178: Improper Handling of Case Sensitivity
  • 1007: Insufficient Visual Distinction of Homoglyphs Presented to User

OWASP

  • A05:2021: Security Misconfiguration
  • A07:2021: Identification and Authentication Failures