Insecure Unicode Normaliser

Unicode normalisers transform Unicode strings into a consistent form so that equivalent character sequences compare equal. Without normalisation, string matching can fail because the same character can be represented in more than one way (e.g., a precomposed character vs. a base character followed by a combining diacritic). Adversaries exploit this ambiguity to bypass security validation rules, such as input filters, authentication checks, or access controls.
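As a minimal illustration of this ambiguity, the Python sketch below uses the standard-library `unicodedata` module. The variable names are chosen here for illustration only; it shows that two encodings of "é" compare unequal until both are normalised to the same form.

```python
import unicodedata

# Two representations of the same user-perceived character "é".
precomposed = "\u00e9"   # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# A raw comparison sees two different code point sequences.
raw_equal = precomposed == decomposed

# After NFC normalisation both sides are the precomposed form.
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal)  # False
print(nfc_equal)  # True
```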

Unicode provides four normalisation forms, grouped into two families:

**Canonical** (NFC, NFD): preserve character semantics but unify equivalent representations.
**Compatibility** (NFKC, NFKD): apply broader transformations that fold compatibility variants (e.g., ligatures and full-width forms) into their plain counterparts, potentially converting characters into visually similar but semantically different ones.

Improper use of compatibility normalisers can unintentionally change the meaning of data and enable spoofing attacks.
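The following Python sketch shows one way this can go wrong. It is a deliberately flawed, hypothetical filter (the function name and the rejected characters are assumptions made for illustration, not taken from any particular application) that validates input first and applies NFKC afterwards, so full-width angle brackets pass the check and are then folded into ASCII ones.

```python
import unicodedata


def naive_filter_then_normalise(value: str) -> str:
    """Flawed on purpose: validates BEFORE normalising, then uses NFKC."""
    if "<" in value or ">" in value:
        raise ValueError("markup rejected")
    # NFKC folds compatibility characters into their plain counterparts.
    return unicodedata.normalize("NFKC", value)


# U+FF1C and U+FF1E are FULLWIDTH LESS-THAN / GREATER-THAN SIGN.
payload = "\uff1cscript\uff1e"

result = naive_filter_then_normalise(payload)
print(result)  # <script>

# NFC, by contrast, leaves the full-width characters untouched.
print(unicodedata.normalize("NFC", payload) == payload)  # True
```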

Remediation

Follow W3C recommendations: use NFC normalisation to enforce strong equivalence without converting characters into visually similar but different characters.
Normalise all inputs before applying validation or comparison logic, ensuring consistency across the entire application stack.
Avoid using compatibility normalisers (NFKC/NFKD) in security-sensitive contexts, as they may collapse distinct characters into unsafe equivalents.
Apply allow-lists or restricted character sets for identifiers (usernames, domains, resource names) to reduce the attack surface.
Learn more about Unicode normalisation issues in our Unsafe Normaliser short course: https://learn.secdim.com/course/spotify-bigbird
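The remediation steps above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not a definitive implementation: the function name `canonical_username` and the allow-list pattern (lowercase ASCII letters, digits and underscore, 3 to 32 characters) are hypothetical and should be adapted to the identifiers an application actually permits.

```python
import re
import unicodedata

# Hypothetical allow-list for usernames: ASCII lowercase, digits, underscore.
ALLOWED_USERNAME = re.compile(r"[a-z0-9_]{3,32}")


def canonical_username(raw: str) -> str:
    """Normalise with NFC first, then validate against the allow-list."""
    normalised = unicodedata.normalize("NFC", raw)
    if not ALLOWED_USERNAME.fullmatch(normalised):
        raise ValueError("username contains characters outside the allow-list")
    # Return the normalised value so every later comparison uses one form.
    return normalised
```

For example, `canonical_username("alice_01")` returns `"alice_01"`, while an input such as `"\uff41dmin"` (a full-width "a" followed by "dmin") is rejected instead of being silently folded into `"admin"`. Storing and comparing only the returned value keeps the normalised form consistent across the application stack.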

Metadata

Severity: medium
Slug: insecure-unicode-normaliser

CWEs

179: Incorrect Behavior Order: Early Validation
94: Improper Control of Generation of Code ('Code Injection')
176: Improper Handling of Unicode Encoding
180: Incorrect Behavior Order: Validate Before Canonicalize
178: Improper Handling of Case Sensitivity
1007: Insufficient Visual Distinction of Homoglyphs Presented to User

OWASP

A05:2021: Security Misconfiguration
A07:2021: Identification and Authentication Failures

Available Labs

C#: 1 lab
Go: 3 labs
Java: 2 labs
JavaScript: 2 labs
PHP: 2 labs
Python: 2 labs
Ruby: 2 labs
TypeScript: 2 labs

