IDN or Unicode string (one per line)

e.g., 스타벅스코리아.com

ASCII or PunyCode string (one per line)

e.g., xn--oy2b35ckwhba574atvuzkc.com

What is Punycode?

Punycode is a character encoding scheme that represents Unicode strings using a limited subset of ASCII consisting of letters, digits, and hyphens, known as the Letter-Digit-Hyphen (LDH) subset.

Internationalized (Unicode) domain names allow domain names and TLDs to be created in local languages, beyond the ASCII character set. This promotes the use of local and native languages in domain names and helps build cultural and regional identity on the internet. However, due to protocol-specific restrictions, domain names that contain Unicode characters must first be translated into the conventional limited-ASCII character set so that browsers and other software systems can handle them. To achieve this, the Internationalized Domain Names in Applications (IDNA) mechanism was defined in 2003 for handling internationalized domain names containing non-ASCII characters.

The Punycode syntax for encoding strings that contain Unicode characters is specified in IETF RFC 3492.
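As a minimal sketch of the encoding itself, the snippet below uses the "punycode" codec that ships with Python's standard library. It works on a single label and does not add the "xn--" prefix; the sample label "bücher" is an illustrative choice, not one taken from this page.

```python
# Encode and decode one label with Python's built-in "punycode" codec.
# The codec implements the RFC 3492 algorithm only: no "xn--" prefix,
# no normalization, and no splitting of a domain name into labels.
label = "bücher"  # illustrative sample label

encoded = label.encode("punycode")    # bytes containing only ASCII
decoded = encoded.decode("punycode")  # back to the original string

print(encoded)  # b'bcher-kva'
print(decoded)  # bücher
```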

As Unicode represents more than just the international character sets, Punycode can also be used to allow for hostnames that use emojis.

Note: For email addresses, Punycode is only used for internationalized email domains. If the local part (before the @ character) contains non-ASCII characters, it is encoded via UTF-8.
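The note above can be sketched as follows. The helper name `to_ascii_domain` and the sample address are hypothetical, and the sketch assumes Python's built-in "idna" codec, which implements the original 2003 IDNA rules rather than later revisions. Only the domain is converted; the local part is returned untouched.

```python
def to_ascii_domain(address: str) -> str:
    """Convert only the domain of an email address to its ACE form.

    Hypothetical helper for illustration. The local part (before the
    last "@") is left as-is, so any non-ASCII characters in it stay
    Unicode and would be sent as UTF-8.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: missing '@'")
    ace_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ace_domain}"


print(to_ascii_domain("jörg@bücher.example"))  # jörg@xn--bcher-kva.example
```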

What’s in the name (Why Punycode)?

According to the RFC author, Adam Costello, the reason behind the name “Punycode” is as follows:

It rhymes with Unicode and is intended to encode Unicode strings. It is “puny” in three senses: The repertoire of characters used in the encoded strings is small, the encoded strings are short, and the implementation is small.

How does Punycode work?

As stated in RFC 3492, "Punycode is an instance of a more general algorithm called Bootstring, which allows strings composed from a small set of 'basic' code points to uniquely represent any string of code points drawn from a larger set." Punycode defines parameters for the general Bootstring algorithm to match the characteristics of Unicode text.

The Punycode encoding is done by analyzing the string passed for non-ASCII characters. The algorithm then goes through several steps to create a string that is usable on ASCII systems.

First, as part of IDNA preparation, the characters are normalized, which includes converting them to lowercase where applicable. Punycode then checks each character for ASCII compatibility. Characters that exist within the ASCII character set are copied to the output unchanged and in their original order. Non-ASCII characters are removed from the text, and, if at least one ASCII character was copied, a hyphen is placed at the end of the string as a delimiter.

If non-ASCII characters are found, the prefix 'xn--' is added to the string. This signifies that the string is in ASCII Compatible Encoding (ACE) and that the rest of it, including any appended hyphen, should be interpreted as Punycode rather than read literally.

Punycode then analyses the non-ASCII characters and appends, after the hyphen, a sequence of ASCII letters and digits that encodes which characters were removed and where they belong within the string. For DNS use, the resulting label, including the 'xn--' prefix, must not exceed the 63-character limit on a label.

To prevent hyphens in non-international domain names from triggering a Punycode decoding, the string xn-- is prepended to Punycode sequences in internationalized domain names. This is called ASCII Compatible Encoding (ACE). Thus the domain name "스타벅스코리아.com" would be represented in a URL as "xn--oy2b35ckwhba574atvuzkc.com".
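The steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the converter's own code: `to_ace_label` and `ACE_PREFIX` are hypothetical names, the sample labels are illustrative, and the normalization step is deliberately skipped. `str.isascii()` requires Python 3.7 or later.

```python
ACE_PREFIX = "xn--"


def to_ace_label(label: str) -> str:
    """Return the ACE form of one label (hypothetical helper).

    Pure-ASCII labels are returned unchanged. Other labels are
    Punycode-encoded and given the "xn--" prefix. Normalization is
    intentionally left out of this sketch.
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


for label in ("example", "bücher", "münchen"):
    print(f"{label} -> {to_ace_label(label)}")
# example -> example
# bücher -> xn--bcher-kva
# münchen -> xn--mnchen-3ya

# The last hyphen separates the copied ASCII characters from the
# encoded description of the removed non-ASCII characters.
basic, delimiter, encoded_rest = "bcher-kva".rpartition("-")
print(basic, encoded_rest)  # bcher kva
```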

Punycode is designed to work across all scripts, and to be self-optimizing by attempting to adapt to the character set ranges within the string as it operates. It is optimized for the case where the string is composed of zero or more ASCII characters and in addition characters from only one other script system, but will cope with any arbitrary Unicode string. Note that for DNS use, the domain name string is assumed to have been normalized using nameprep and (for top-level domains) filtered against an officially registered language table before being punycoded, and that the DNS protocol sets limits on the acceptable lengths of the output Punycode string.
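To illustrate the normalization and length limits mentioned above, the sketch below relies on Python's built-in "idna" codec, which applies nameprep and rejects labels whose encoded form is longer than 63 characters. The wrapper `to_dns_name` and the sample names are hypothetical, and the codec follows the 2003 IDNA rules, so its behaviour may differ from tools that implement later revisions.

```python
def to_dns_name(name: str) -> str:
    """Return the ACE form of a whole domain name, or raise ValueError.

    Hypothetical wrapper around the standard "idna" codec, which
    normalizes each label and enforces the per-label length limit.
    """
    try:
        return name.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"cannot encode {name!r}: {exc}") from exc


# Uppercase input is folded to lowercase before encoding.
print(to_dns_name("Bücher.example"))  # xn--bcher-kva.example

# A label that would exceed 63 characters once encoded is rejected.
too_long = "ü" * 64 + ".example"
try:
    to_dns_name(too_long)
except ValueError as exc:
    print("rejected:", exc)
```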

How can Punycodeconverter.com help?

Punycodeconverter.com is a simple and easy-to-use tool to convert IDNs into Punycode and vice versa. You can use this tool to convert Unicode (UTF-8) into ASCII and convert ASCII to UTF-8.

When working with IDNs, you need to convert them into an ASCII Compatible Encoding (ACE) before entering them into DNS servers. The online Punycode converter shows you the Punycode form of an IDN, which is its ACE equivalent.

How to use the Punycode Converter?

Our Unicode converter tool is simple to use. Here is a step-by-step flow to help you:

  • Convert Unicode to Punycode

    • Step 1: Enter the Unicode domain name in the input field called “IDN or Unicode string” on the left. If you want to convert multiple Unicode domain names at once, simply enter one domain name per line.

    • Step 2: Click on the Convert to Punycode button.

    • Step 3: Check the results in the “ASCII or Punycode string” field on the right side.

  • Convert Punycode to Unicode

    • Step 1: Enter the Punycode domain name in the input field called “ASCII or Punycode string” on the right. If you want to convert multiple Punycode domain names at once, simply enter one domain name per line.

    • Step 2: Click on the Convert to Unicode button.

    • Step 3: Check the results in the “IDN or Unicode string” field on the left side.
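For readers who want to script the same one-name-per-line conversion, here is a rough sketch. It is not the code behind this tool: `convert_lines` and its `to_punycode` flag are hypothetical, the sample names are illustrative, and it assumes Python's built-in "idna" codec (2003 IDNA rules).

```python
def convert_lines(text: str, to_punycode: bool = True) -> str:
    """Convert domain names given one per line (hypothetical helper).

    Blank lines are kept as blank lines so that input and output rows
    stay aligned.
    """
    results = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            results.append("")
        elif to_punycode:
            results.append(name.encode("idna").decode("ascii"))
        else:
            results.append(name.encode("ascii").decode("idna"))
    return "\n".join(results)


unicode_names = "bücher.example\nmünchen.example"
ascii_names = convert_lines(unicode_names)
print(ascii_names)
# xn--bcher-kva.example
# xn--mnchen-3ya.example

print(convert_lines(ascii_names, to_punycode=False))
# bücher.example
# münchen.example
```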

Example Punycode encodings:

Unicode                 Punycode
스타벅스코리아.com      xn--oy2b35ckwhba574atvuzkc.com
योगा.भारत               xn--31b1c3b9b.xn--h2brj9c

A word of caution:

A Unicode domain phishing attack (also known as an IDN homograph attack) is an exploitation technique in which a phishing website is set up under an internationalized domain name whose Unicode characters look like those of a legitimate domain name.

For example, аррӏе.com and apple.com are not the same. The first is a demonstration website created to show how browsers and users can be tricked by a legitimate-looking URL that uses Unicode characters resembling those of the real website.

If you copy and paste аррӏе.com into the IDN/Unicode field above and hit Convert to Punycode, you may be surprised to see that it returns xn--80ak6aa92e.com as the Punycode version, while apple.com is returned unchanged as apple.com because it contains only ASCII characters.
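One way to see what is hiding behind such a label is to decode it and list the Unicode name of each character. The sketch below assumes Python's built-in "punycode" codec and `unicodedata` module; the helper `describe_label` is a hypothetical name, and this is only an inspection aid, not a complete homograph detector.

```python
import unicodedata


def describe_label(ace_label: str) -> list:
    """List the code point and Unicode name of every character in a label.

    Hypothetical helper: labels starting with "xn--" are Punycode-decoded
    first, and all other labels are inspected as given.
    """
    prefix = "xn--"
    if ace_label.lower().startswith(prefix):
        text = ace_label[len(prefix):].encode("ascii").decode("punycode")
    else:
        text = ace_label
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


for line in describe_label("xn--80ak6aa92e"):
    print(line)
# U+0430 CYRILLIC SMALL LETTER A
# U+0440 CYRILLIC SMALL LETTER ER
# U+0440 CYRILLIC SMALL LETTER ER
# U+04CF CYRILLIC SMALL LETTER PALOCHKA
# U+0435 CYRILLIC SMALL LETTER IE
```

Running the same helper on "apple" lists five Latin letters instead, which makes the difference between the two names visible even though they look alike on screen.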

If you’re a Google Chrome user, you do not need to worry about this particular example, as Google Chrome addressed the issue in version 58.

Firefox users can prevent this from happening by typing “about:config” in the address bar and searching for “punycode”. If the value of the entry titled network.IDN_show_punycode is false, double-click it to change it to true. This makes the browser display the Punycode form of the domain name instead of its Unicode representation.

Firefox Punycode Configuration

Punycode FAQs:

Punycode is an algorithm used to transform a hostname containing Unicode characters into an ASCII Compatible Encoding string with only letters, digits, and hyphens.

Unicode is a text-encoding standard maintained by the Unicode Consortium. It is designed to support all the text in the world’s digitized writing systems.

Not all software and protocols support non-ASCII characters in hostnames. To keep Unicode domain names (IDNs) working everywhere, converting them to Punycode is the first step of the process.

Internationalized Domain Names (IDNs) are domain names containing at least one non-ASCII character. An example of an IDN is 스타벅스코리아.com, whose ACE form is xn--oy2b35ckwhba574atvuzkc.com. IDNs make it easier to use local language names as domain names.

IDNs enable more web users to navigate the internet in their preferred script and more companies to maintain localization of their brand name in multiple scripts. IDNs improve the accessibility and functionality of the internet by enabling domain names in a wide variety of scripts from around the world. This also promotes development of regional languages and cultural identity on the internet.

ACE stands for ASCII Compatible Encoding. The ACE prefix for the Punycode version of internationalized domain names is xn--.
