Glyph Lab — Unicode Forensics

⟐ Character Inspector

Paste any text to inspect every character at the Unicode level. See code points, names, categories, and detect invisible or control characters instantly.
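As a rough illustration of this kind of per-character inspection, here is a minimal Python sketch built on the standard `unicodedata` module. The function name `inspect` and the `<unnamed>` placeholder are assumptions made for this example, not part of Glyph Lab itself.

```python
import unicodedata

def inspect(text):
    """Return one (code point, name, general category) row per character."""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            # Control characters such as "\n" have no name, so fall back.
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
        ))
    return rows

# "a", an invisible ZERO WIDTH SPACE, then "b"
for row in inspect("a\u200bb"):
    print(*row, sep="  ")
```

The category column is what separates ordinary letters (`Ll`, `Lu`) from format characters (`Cf`) and control characters (`Cc`).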

🔍 Invisible Character Scanner

Detect hidden Unicode characters that are invisible to the naked eye. These include zero-width spaces, joiners, BiDi controls, and other formatting characters that attackers use to hide data or manipulate text rendering.

📖 What Gets Detected

U+200B — Zero Width Space

U+200C — Zero Width Non-Joiner

U+200D — Zero Width Joiner

U+FEFF — Zero Width No-Break Space (BOM)

U+202A–202E — BiDi Embedding/Override

U+2066–2069 — BiDi Isolate Controls

U+00AD — Soft Hyphen

U+2060 — Word Joiner
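A scanner for the code points listed above can be sketched in a few lines of Python. The names `INVISIBLE` and `scan` are hypothetical, and the set covers only the characters in this list, not every invisible character in Unicode.

```python
import unicodedata

# Code points from the list above; the two BiDi entries are ranges.
INVISIBLE = (
    {0x200B, 0x200C, 0x200D, 0xFEFF, 0x00AD, 0x2060}
    | set(range(0x202A, 0x202F))   # U+202A..U+202E
    | set(range(0x2066, 0x206A))   # U+2066..U+2069
)

def scan(text):
    """Return (index, 'U+XXXX', name) for every listed invisible character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) in INVISIBLE
    ]

for hit in scan("pass\u200bword\u202e"):
    print(hit)
```

Reporting the index alongside the name lets a reader locate the character in the original string even though it has no visible width.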

🕶️ Zero-Width Steganography

Encode secret messages into invisible Unicode characters hidden within normal-looking text. The carrier text appears unchanged, but contains a hidden payload.

Secret message to hide:

Carrier text (the cover):

ℹ️ How It Works

Each character of your secret message is converted to binary (7 bits per character, which covers ASCII only). Each bit is encoded as either U+200C (Zero Width Non-Joiner = 0) or U+200D (Zero Width Joiner = 1). These invisible characters are inserted after the visible characters of the carrier text. The result looks identical to the original, but carries a hidden payload.

Decode reverses this: it extracts the zero-width characters, reconstructs the binary, and converts back to text.
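The scheme described above can be sketched as follows. How the bits are spread across the carrier is an assumption here: one mark goes after each carrier character, and any remaining marks follow the last one. The function names are likewise hypothetical.

```python
ZERO, ONE = "\u200c", "\u200d"  # ZWNJ encodes 0, ZWJ encodes 1

def encode(secret, carrier):
    """Hide an ASCII secret in carrier using 7 bits per character."""
    if not carrier:
        raise ValueError("carrier must not be empty")
    if any(ord(c) > 0x7F for c in secret):
        raise ValueError("7 bits per character covers ASCII only")
    bits = "".join(f"{ord(c):07b}" for c in secret)
    marks = [ZERO if b == "0" else ONE for b in bits]
    out = []
    last = len(carrier) - 1
    for i, ch in enumerate(carrier):
        out.append(ch)
        # One mark after each character; the last character takes the rest.
        out.extend(marks[i:i + 1] if i < last else marks[i:])
    return "".join(out)

def decode(stego):
    """Collect the zero-width marks in order and rebuild the 7-bit characters."""
    bits = "".join("0" if ch == ZERO else "1" for ch in stego if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 7  # ignore a trailing partial group
    return "".join(chr(int(bits[i:i + 7], 2)) for i in range(0, usable, 7))

stego = encode("Hi", "hello world")
print(len("hello world"), len(stego), decode(stego))
```

This sketch assumes the carrier contains no U+200C or U+200D of its own. Text that already uses joiners, such as some emoji sequences, would corrupt the decoded payload.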

👥 Homoglyph Explorer

Find characters from different scripts that look identical to Latin letters. Attackers use these in phishing URLs, fake filenames, and supply chain attacks.

Enter text to find lookalikes:

⚠️ Why This Matters

A URL like аррӏе.com looks identical to apple.com, but uses Cyrillic а (U+0430), р (U+0440), and ӏ (U+04CF). In many fonts the two render identically, yet they are different domain names that can resolve to completely different servers. Browsers that detect such a spoof typically show the name in its Punycode (xn--) form instead.

This technique is the IDN homograph attack, a long-standing phishing method against systems that accept internationalized names.
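One way to catch this kind of spoof is to map known lookalikes back to Latin and to check which scripts a string mixes. The sketch below uses a tiny hand-picked table that is an assumption for illustration only; real tools rely on Unicode's published confusables data. The script guess simply reads the first word of each character's Unicode name.

```python
import unicodedata

# Tiny illustrative table (an assumption, not a complete confusables list).
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u04cf": "l",  # CYRILLIC SMALL LETTER PALOCHKA
}

def skeleton(text):
    """Replace each known lookalike with the Latin letter it imitates."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)

def scripts(text):
    """Rough script guess: first word of the Unicode name, letters only."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }

spoof = "\u0430\u0440\u0440\u04cf\u0435.com"
print(skeleton(spoof), sorted(scripts(spoof)))
```

A name whose skeleton matches a trusted domain but whose script set is not plain `LATIN` is a strong hint that something is being imitated.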

↔️ Bidirectional Text Attack Demo

Unicode bidirectional (BiDi) controls let text switch direction mid-string. Attackers abuse this to make code look like one thing while actually being something else — the "Trojan Source" vulnerability class.

Type or paste text with BiDi controls:

🔬 Trojan Source: Real-World Example

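Here's how a BiDi attack can hide dangerous code. The RLO character (U+202E) forces the text that follows it to display right to left, and isolates (opened by U+2066, closed by U+2069) keep chosen runs readable inside the reversed text. Compilers ignore display order and read characters in the order they are stored. Placed inside comments, where compilers accept them, these controls can make a comment look like code. Below, [RLO], [LRI], and [PDI] stand in for the invisible characters:

// What is STORED in the file (logical order, the order the compiler reads):
/*[RLO] } [LRI]if (user.isAdmin())[PDI] [LRI] begin admin only */
    grantAccess("everyone");
/* end admin only [RLO] { [LRI]*/

// What you SEE in an editor that applies the BiDi algorithm:
/* begin admin only */ if (user.isAdmin()) {
    grantAccess("everyone");
/* end admin only */ }

// What it ACTUALLY executes (the first and last stored lines are entirely comments):
grantAccess("everyone");

In real attacks, the control characters are invisible. The admin check appears to guard the call, but it sits inside a comment, so access is granted to everyone. This vulnerability (CVE-2021-42574) affected C/C++, Go, Rust, and many other languages.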
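To see where BiDi controls sit in a suspicious line, one option is to replace each control with a bracketed abbreviation so the stored order becomes readable. The helper name `reveal` is hypothetical; the abbreviations are the standard names of the nine embedding, override, and isolate controls.

```python
BIDI_NAMES = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

def reveal(line):
    """Return line with each BiDi control shown as a visible [NAME] tag."""
    return "".join(
        f"[{BIDI_NAMES[ch]}]" if ch in BIDI_NAMES else ch
        for ch in line
    )

stored = "/*\u202e } \u2066if (user.isAdmin())\u2069 \u2066 begin admin only */"
print(reveal(stored))
```

Because the tags are ordinary ASCII, the output displays left to right in any editor, which shows the text in the same order a compiler would read it.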

🛡️ Defense Strategies

Strip BiDi controls from source code during CI/CD pipelines
Use editors that highlight or warn about invisible Unicode characters
Code review tools should flag files containing U+200E–U+200F, U+202A–U+202E, U+2066–U+2069
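Normalize text to NFC/NFKC form before comparing or processing it; NFKC folds compatibility forms such as fullwidth letters, but neither form removes BiDi controls or maps Cyrillic lookalikes to Latin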
Browser dev tools — inspect element to see the actual character sequence, not the rendered result
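A pipeline check along these lines could look like the sketch below, which reports the line and column of every character in the ranges named above. The function names are hypothetical, and failing the build on any hit is an assumption that suits code bases with no legitimate right-to-left text in source files.

```python
FLAGGED = (
    set(range(0x200E, 0x2010))    # U+200E..U+200F (LRM, RLM)
    | set(range(0x202A, 0x202F))  # U+202A..U+202E (embeddings, overrides)
    | set(range(0x2066, 0x206A))  # U+2066..U+2069 (isolates)
)

def find_bidi(source):
    """Return (line, column, 'U+XXXX') for each flagged character, 1-based."""
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) in FLAGGED:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits

def check_files(paths):
    """Print every hit as path:line:col and return 1 if any file has one."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for lineno, col, cp in find_bidi(fh.read()):
                print(f"{path}:{lineno}:{col}: {cp}")
                failed = True
    return 1 if failed else 0
```

A CI job could pass the changed file paths to `check_files` and use the return value as its exit status, so a pull request that introduces a BiDi control fails before review.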
