Emoji Variation Selector Steganography

Hide secret messages within emoji using Unicode variation selectors - based on Paul Butler's research

Unicode Variation Selector Technique

This method uses invisible Unicode variation selectors (U+FE00 to U+FE0F) to encode data after emoji characters. Each variation selector encodes 4 bits of data, making this technique both compact and invisible to human readers.

Technical Note: Each variation selector encodes 4 bits of data, so the message "hello" (5 characters = 5 UTF-8 bytes = 40 bits) requires 10 variation selectors. Variation selectors are invisible Unicode characters intended to select the presentation of the preceding character; appended to an emoji in this way, they typically leave its appearance unchanged.

How Emoji Variation Selector Steganography Works

Research Background

This technique is based on Paul Butler's research into Unicode steganography, published in his article "Smuggling arbitrary data through an emoji" (2025). It exploits Unicode variation selectors - invisible characters originally designed to modify the presentation of other characters - to embed arbitrary data within emoji. Most platforms preserve these variation selectors when text is copied, sent, or stored, and they are not rendered visibly.

What Are Variation Selectors?

Variation selectors are a set of Unicode characters (U+FE00 through U+FE0F) originally intended to specify different visual presentations of the same character. For example, some characters can be displayed in text style or emoji style: U+FE0E requests text presentation and U+FE0F requests emoji presentation. When extra selectors are attached to an emoji, however, they typically have no visible effect while still being preserved in the underlying data.

This tool uses all 16 variation selectors (U+FE00 to U+FE0F), mapping each to a unique 4-bit binary value (0000 to 1111). This allows efficient data encoding.
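One way to see that the selectors are present but not displayed is to inspect a string in Python. The snippet below is only an illustration: the base emoji and the two selectors are arbitrary choices, not output copied from this tool.

```python
import unicodedata

base = "\U0001F60A"             # 😊 (arbitrary base emoji chosen for illustration)
hidden = base + "\uFE06\uFE08"  # the same emoji followed by two variation selectors

# Both strings usually render as a single emoji, but their lengths differ.
print(len(base), len(hidden))

# List each appended code point together with its official Unicode name.
for ch in hidden[1:]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Python's len() counts code points, so the second string reports a length of 3 even though only one glyph is visible.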

Encoding Process - Step by Step

Step 1: UTF-8 Encoding

Your message is first converted to UTF-8 bytes. Each character becomes one to four bytes depending on its Unicode codepoint. Example: "hello" becomes 5 bytes, shown here in hexadecimal: [68, 65, 6C, 6C, 6F].
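As a small sketch of this step, Python's built-in str.encode produces the UTF-8 bytes directly. The variable names are illustrative and are not taken from the tool's source.

```python
message = "hello"
data = message.encode("utf-8")  # bytes object holding the UTF-8 encoding

# Show each byte as two hex digits, matching the notation used above.
hex_bytes = [f"{b:02X}" for b in data]
print(hex_bytes)

# A non-ASCII character occupies more than one byte.
accented = "é".encode("utf-8")
print(len(accented))
```

For "hello" this yields the five bytes listed above, while "é" (U+00E9) encodes to two bytes.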

Step 2: Byte to Nibble Conversion

Each byte is split into two 4-bit chunks called "nibbles". A byte like 0x68 (104 in decimal, 01101000 in binary) becomes two nibbles: 0110 (high) and 1000 (low). This doubling of data units is necessary because each variation selector encodes 4 bits.
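The split can be sketched with a shift and a mask. The function name split_nibbles is hypothetical and is used here only for illustration.

```python
def split_nibbles(data: bytes) -> list:
    """Return the high and low 4-bit halves of every byte, in order."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)    # high nibble: top four bits
        nibbles.append(byte & 0x0F)  # low nibble: bottom four bits
    return nibbles


print(split_nibbles(b"h"))  # the byte 0x68 from the example above
```

For the byte 0x68 the result is [6, 8], which is 0110 and 1000 in binary.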

Step 3: Nibble to Selector Mapping

Each 4-bit nibble maps to one of the 16 variation selectors:
- 0000 → U+FE00
- 0001 → U+FE01
- ...
- 1111 → U+FE0F
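Because the 16 selectors are contiguous, the mapping reduces to adding the nibble value to U+FE00. The following is a minimal sketch under that assumption; the names VS_BASE and nibble_to_selector are hypothetical.

```python
VS_BASE = 0xFE00  # code point of the first variation selector


def nibble_to_selector(nibble: int) -> str:
    """Map a 4-bit value (0 to 15) to the matching variation selector."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("nibble must be in the range 0..15")
    return chr(VS_BASE + nibble)


for n in (0b0000, 0b0001, 0b1111):
    print(f"{n:04b} -> U+{ord(nibble_to_selector(n)):04X}")
```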

Step 4: Selector Appending

All variation selectors are appended to your chosen base emoji in sequence. The result looks like a single emoji but contains your entire hidden message.

Complete Example:

"hello" (5 chars, 5 UTF-8 bytes, 10 nibbles) → 10 variation selectors appended to base emoji
Visual result: 😊 (appears as single emoji)
Actual data: 😊[U+FE06][U+FE08][U+FE06][U+FE05]...[U+FE06][U+FE0F] (base + 10 invisible selectors)
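Steps 1 through 4 can be combined into one short function. This is a sketch of the scheme as described on this page, not the tool's actual source code, and the function name encode is an assumption.

```python
def encode(base_emoji: str, message: str) -> str:
    """Append two variation selectors per UTF-8 byte of message to base_emoji."""
    selectors = []
    for byte in message.encode("utf-8"):              # Step 1: UTF-8 bytes
        high, low = byte >> 4, byte & 0x0F            # Step 2: two nibbles
        selectors.append(chr(0xFE00 + high))          # Step 3: map to selectors
        selectors.append(chr(0xFE00 + low))
    return base_emoji + "".join(selectors)            # Step 4: append to base


result = encode("\U0001F60A", "hello")
# Print the hidden code points so they can be checked by eye.
print(" ".join(f"U+{ord(ch):04X}" for ch in result[1:]))
```

Printing the code points rather than the string itself makes the otherwise invisible payload visible for checking.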

Decoding Process

Decoding reverses the encoding:

  1. Extract Variation Selectors: The decoder scans through the emoji text and extracts all variation selector characters (U+FE00 to U+FE0F).
  2. Convert to Nibbles: Each variation selector is mapped back to its 4-bit value based on its Unicode codepoint.
  3. Nibbles to Bytes: Pairs of nibbles are combined to reconstruct the original bytes. Each pair of 4-bit values creates one 8-bit byte.
  4. Bytes to Text: The byte sequence is interpreted as UTF-8 and converted back to readable text.

An odd number of nibbles indicates malformed or truncated data; in that case the unpaired final nibble is discarded.
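The decoding steps can be sketched as follows. The function name decode and the choice to replace invalid UTF-8 rather than raise an error are assumptions made for illustration.

```python
def decode(text: str) -> str:
    """Recover a hidden message from variation selectors found in text."""
    # Steps 1 and 2: keep only U+FE00..U+FE0F and convert each to its 4-bit value.
    nibbles = [ord(ch) - 0xFE00 for ch in text if 0xFE00 <= ord(ch) <= 0xFE0F]

    # Drop an unpaired trailing nibble (malformed or truncated input).
    if len(nibbles) % 2:
        nibbles.pop()

    # Step 3: combine each (high, low) pair into one byte.
    data = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

    # Step 4: interpret the bytes as UTF-8; invalid sequences become U+FFFD.
    return data.decode("utf-8", errors="replace")


sample = "\U0001F60A" + "".join(
    chr(0xFE00 + n) for n in [6, 8, 6, 5, 6, 0xC, 6, 0xC, 6, 0xF]
)
print(decode(sample))
```

This sketch treats every code point in U+FE00 to U+FE0F as payload, so a base emoji that already contains U+FE0F (such as ❤️) would need special handling. How this tool handles that case is not stated here.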

Capacity and Efficiency

The encoding is quite efficient:

  • Each variation selector encodes 4 bits of data
  • Each ASCII character (1 byte) requires 2 selectors
  • Unicode characters requiring multiple bytes need proportionally more selectors
  • Example: "hello world" (11 ASCII chars) = 22 variation selectors
  • The base emoji itself takes 1-4 bytes depending on the emoji

There's no hard limit on message length - you can embed hundreds of characters in a single emoji, though extremely long messages may cause display issues on some platforms.
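The selector count and the resulting storage size can be estimated with a few lines of Python. The helper names are hypothetical. The 3-byte figure follows from UTF-8 itself, which encodes every code point between U+0800 and U+FFFF in three bytes.

```python
def selector_count(message: str) -> int:
    """Two variation selectors are needed per UTF-8 byte of the message."""
    return 2 * len(message.encode("utf-8"))


def encoded_size_utf8(base_emoji: str, message: str) -> int:
    """Total UTF-8 size in bytes of the base emoji plus all selectors."""
    selector_bytes = len("\uFE00".encode("utf-8"))  # 3 bytes per selector
    return len(base_emoji.encode("utf-8")) + selector_bytes * selector_count(message)


print(selector_count("hello world"))
print(encoded_size_utf8("\U0001F60A", "hello world"))
```

For "hello world" with 😊 as the base this gives 22 selectors and 70 bytes of UTF-8, so the scheme is compact in visible characters but expands the stored data roughly sixfold.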

Advantages Over Other Methods

  • Completely Invisible: Variation selectors typically have no visual effect when appended to emoji, so visual inspection alone will not reveal the hidden data.
  • Platform Preservation: Most platforms preserve variation selectors during copy/paste, messaging, and storage operations.
  • Standards Compliant: Uses valid Unicode sequences that conform to standards.
  • Efficient Encoding: 4 bits per selector is denser than schemes that encode a single bit per invisible character.
  • Tool Blind Spots: Research shows 78% of IDEs don't render variation selectors, and 92% of code review tools ignore them.
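Although the selectors cannot be seen, they are easy to expose programmatically. The sketch below rewrites each one as a visible placeholder; the function name reveal and the bracketed output format are assumptions for illustration.

```python
def reveal(text: str) -> str:
    """Replace each variation selector with a visible [U+XXXX] marker."""
    return "".join(
        f"[U+{ord(ch):04X}]" if 0xFE00 <= ord(ch) <= 0xFE0F else ch
        for ch in text
    )


print(reveal("\U0001F60A\uFE06\uFE08"))
```

Running a check like this over pasted text is one way to spot an emoji that carries more code points than it appears to.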