UTF-8 Multibyte Characters in URLs = ✓

Here’s a fun one for y’all: did you know it’s possible to put multi-byte UTF-8 characters in URLs, and even use them as GET parameters?

Somebody posted a link to Skype’s new Theme Design contest for their Mac app, but I lost interest in the entries as soon as I saw the URL:

[Image: Skype Mac theme design competition]

“Wait,” I said almost out loud, “was that a checkmark in the URL?” (Actually, it was out loud. My cat looked over at me, then went back to sleep, totally uninterested in all this crap.)

I cottoned on right away – of course it’s possible to put UTF-8 characters in the URL parameters. Why wouldn’t it be? In fact, people have probably been doing this for years!

Nonetheless, I had a little play around with it, and in case anybody’s interested, the code is below.

[Image: UTF-8 characters in URL parameters]

I think it looks pretty spiffy.

I have no idea if this would work cross-browser (I’m on Chrome) or cross-platform (I’m on … you guessed it.) All I know is it’s probably a good idea to specify the encoding of your page, like so:

<meta charset="UTF-8">
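<?php // Print each GET parameter, escaped so the values can't inject markup.
foreach ( $_GET as $key => $val )
	echo '<pre><strong>' . htmlspecialchars( $key ) . ':</strong> ' . htmlspecialchars( $val ) . '</pre>'; ?>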

Might be a disaster, in the wild. But heck – try it out and see what happens – Skype did!

# Just a side-note: if you’re testing the values in PHP, it’s probably a good idea to do some encoding-related things beforehand, like decoding the parameters and comparison strings. I’m no PHP expert, but I know it can be a pain in the arse when it comes to encoding.
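
Here's a rough sketch of what that kind of check might look like. The has_check_mark() helper and the utf8 parameter name are made up for illustration. PHP has already percent-decoded everything in $_GET by the time your script sees it, so the part that probably matters most is making sure the comparison string really is UTF-8. parse_str() stands in for a real request here:

```php
<?php
// Hypothetical helper, not from the original post: returns true if any
// parameter value is exactly the check mark character.
function has_check_mark( array $params ) {
	// U+2713 written as its three UTF-8 bytes, so the comparison does not
	// depend on how this source file happens to be saved.
	$check = "\xE2\x9C\x93";

	foreach ( $params as $val ) {
		// Array values (e.g. ?a[]=1) are skipped; only strings are compared.
		if ( is_string( $val ) && $val === $check ) {
			return true;
		}
	}
	return false;
}

// Simulate the query string ?utf8=%E2%9C%93 (the parameter name is an assumption).
// parse_str() percent-decodes values the same way PHP does when it fills $_GET.
parse_str( 'utf8=%E2%9C%93', $params );

var_dump( has_check_mark( $params ) );                // bool(true)
var_dump( has_check_mark( array( 'q' => 'nope' ) ) ); // bool(false)
```

Writing the character as escaped bytes is just one way to sidestep the file-encoding question. Saving the file as UTF-8 and typing the ✓ directly should behave the same.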

## Also interesting to note – this post’s URL is accessible at both /utf-8-multibyte-characters-in-url-parameters-✓/ and /utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/
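
If you're wondering where %E2%9C%93 comes from: the check mark is three bytes in UTF-8, and each byte gets percent-encoded. A quick sketch using PHP's built-in rawurlencode() and rawurldecode() (the variable names are just mine):

```php
<?php
// The check mark (U+2713) spelled out as its three UTF-8 bytes.
$check = "\xE2\x9C\x93";

// rawurlencode() percent-encodes each of those bytes in turn.
$encoded = rawurlencode( $check );

// rawurldecode() reverses it, giving back the original three bytes.
$decoded = rawurldecode( $encoded );

echo strlen( $check ), "\n";                                // 3
echo $encoded, "\n";                                        // %E2%9C%93
echo ( $decoded === $check ? 'same' : 'different' ), "\n";  // same
```

So the two URLs are really the same address written two ways, and the browser just shows whichever one looks nicer.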

6 Comments


  1. Max says:

    Thanks for the discovery. In order to use these symbols in your PHP script, it seems like you gotta save your PHP files in “ANSI as UTF-8” mode.

    Works like a charm:

    • Joss says: (Author)

      Great advice, thanks dude! I think any code you put in your comment may have been stripped out, though. Try running code through Postable before posting it and it should be cool.

  2. Max says:

    Pastebin is down, so yea.

    <?php if (in_array('%u262D', $_GET)) {
        echo '%u042FUSSIA, BODKA, COMMUHIZM!';
    } ?>
  3. Lucas says:

    Nice! Works on Firefox 5.0. I’m using Ubuntu 11.04.

  4. Chris Peckham says:

    I think support for UTF-8 (or, maybe, more generally, Unicode) in the URL is a step towards the long-awaited Internationalized Resource Identifiers (IRIs).

    http://www.w3.org/International/O-URL-and-ident.html

    I’m taking a punt on this in terms of developing and publishing SEO-friendly URLs for some French (and eventually Japanese) web pages. I hope it works :-)
